INTELLIGENCE 13, 349-359 (1989)
Correlations of Mental Tests with Each Other and with Cognitive Variables Are Highest for Low IQ Groups
DOUGLAS K. DETTERMAN
Case Western Reserve University
MARK H. DANIEL
The Psychological Corporation
Two studies showed an inverse relationship between ability level and correlations among IQ measures. Low IQ subjects showed much higher correlations than high IQ subjects. Intercorrelations of IQ subtests, correlations of cognitive ability measures with each other, and correlations of IQ with measures of cognitive abilities all displayed the same effect. In the first study, data from two experiments in which subjects took a battery of basic cognitive tasks and a standard IQ test were analyzed. Measures from the basic tasks correlated more highly in the low IQ group than in the high IQ group. In the second study, data from the WAIS-R and WISC-R standardization samples were divided into five ability groups. Average correlations among subtests were computed for each ability group. For both the WAIS-R and WISC-R, average subtest correlations were highest in the low ability group. Correlations declined systematically with increasing IQ. In both studies, correlations were found to be two times higher in low IQ groups than in high IQ groups.
Spearman (1904) established the importance of positive manifold. Positive manifold refers to the empirical observation that tests of mental ability are positively correlated with each other. Spearman's formulation of 'g', general intelligence, represented the degree of positive manifold among all tests in a battery of tests. Positive manifold among mental tests is one of the most reliable, replicable, and important empirical discoveries about human ability yet found. Attempts to explain positive manifold have, directly or indirectly, occupied the efforts of many researchers in psychometrics and individual differences. During the 85-year history of this work, it was thought that positive manifold was uniformly distributed over the full range of ability. That is, it was assumed that the correlation among mental tests would be about the same in a group of low IQ subjects as it would be in a group of high IQ subjects. (Both groups must represent similar ranges of ability and, so, have equal standard deviations, or differences in correlations could be due to restriction of range.)
Data are reported that show the uniform distribution assumption is incorrect. The correlations among the WAIS-R and WISC-R subtests and the correlation of basic cognitive measures with standardized tests of intelligence and with each other were analyzed. The data to be reported show that correlations among mental tests are higher for low IQ subjects than for high IQ subjects. This relation was first found (Study 1) for the correlation of measures from basic cognitive tasks with intelligence test scores and with each other. It was later confirmed (Study 2) for subtests of the WAIS-R and WISC-R using the national standardization samples for those tests.
Philip Vernon served as action editor for this manuscript. The WISC-R and WAIS-R analyses were conducted on the standardization data for those tests with the permission of The Psychological Corporation and were carried out partly by The Psychological Corporation. Parts of this work were supported by Grants No. HD07176 and HD15516 from the National Institute of Child Health and Human Development, Office of Mental Retardation, the Air Force Office of Scientific Research, and the Brooks Air Force Base Human Resources Laboratory, Project Lamp. Correspondence and requests for reprints may be sent to D.K. Detterman, Department of Psychology, Case Western Reserve University, Cleveland, OH 44106.
STUDY 1
The data used in this study came from preliminary reports of two experiments by Detterman, Caruso, Mayer, Legree, and Conners (1983) and Detterman (1986). The first experiment (MR/College) used an extreme groups design comparing mentally retarded persons with college students. The second experiment (High School) employed a randomly selected sample of high school students. In both experiments, subjects were given a set of computer-administered, basic cognitive tasks and a standardized IQ test.
Method
Subjects. The MR/College experiment compared 20 young adult mentally retarded persons (M IQ = 67.5, SD = 7.56) with 20 college students (M IQ = 115.5, SD = 7.79). The High School experiment included 141 randomly selected high school students (M IQ = 108.0, SD = 18.3). A low IQ group (M IQ = 93.0, SD = 12.3) consisted of the 68 subjects below the mean of the entire group. A high IQ group (M IQ = 122.0, SD = 9.9) consisted of the 73 subjects above the mean.
Tests. All subjects in both experiments were given the WAIS-R.
They also took a set of computer-administered basic cognitive tasks. In the MR/College experiment, the battery consisted of 9 tests, one each for learning, relearning, probe memory, Sternberg memory search, match-to-sample, tachistoscopic identification, tachistoscopic recognition, strategy development, and choice reaction time. The High School experiment included variants of these tasks and a recognition memory task for a total of 10 tasks. All tasks were administered by computer and all responses were made on a touch screen fitted to the computer monitor. All the tasks used the same stimuli. Each task yielded several measures based on latency or errors.
All measures had been extensively pretested and were known to have good reliabilities (average split-half reliability = .79). For the MR/College experiment, 31 measures from the basic tasks and full scale WAIS-R IQ scores were available. For the High School experiment, 36 measures were chosen as best representing all the measures obtained from all the basic cognitive tests. WAIS-R full scale IQ scores were also included. Procedure. Subjects in both experiments were administered the battery of cognitive tasks in a darkened, quiet room. Administration took from 2 to 4 hours for each subject. At a convenient time during computer testing or after completion of the computer battery, each subject was administered the WAIS-R by a trained examiner.
Results and Discussion
The results of interest were the correlations between the measures from the cognitive tasks and IQ for subjects low and high on IQ. The question was whether basic cognitive measures correlated more highly with IQ in low IQ groups than in high IQ groups. In the MR/College experiment, the subjects were initially divided into groups and these divisions were used for analysis. In the High School experiment, two groups were formed by dividing the full group at the mean of IQ.
Statistical analyses of the data were conducted in the same way for both groups. A matrix of the correlations of all cognitive variables from the tasks plus IQ was computed separately for the high and low IQ groups, yielding two matrices for each experiment. For the MR/College experiment there were 32 variables (31 cognitive variables + IQ) in each matrix. For the High School experiment there were 37 variables (36 cognitive variables + IQ) in each matrix.
All correlations in each matrix were corrected for restriction of range. This was essential to correct for different degrees of selection that might have occurred in each of the subgroups. Corrections were based on the Full Scale IQ standard deviation and followed the procedures suggested by Gulliksen (1950) for explicit and incidental selection. Corrections for restriction of range adjust the obtained values to the values that would have been obtained had the range not been restricted. However, the correction treats the sample data as typical of the full range. That is, the corrected correlations are those that would have been obtained from the entire population if correlations in the subgroup were typical of the entire population. Thus, adjustments for restriction of range do not eliminate differences in correlation between high and low IQ groups. They simply adjust both groups to a common standard deviation so that differences in correlations between the groups can be directly compared.
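The logic of the correction can be illustrated with a short sketch. The paper cites Gulliksen (1950) but does not print the equations, so the code below assumes the standard textbook forms for explicit selection (a correlation that involves the selection variable itself) and incidental selection (a correlation between two variables that are each related to the selection variable). The function names are hypothetical, and the example correlations are invented; only the two standard deviations come from the High School sample described above.

```python
import math


def correct_explicit(r, sd_full, sd_group):
    """Correct a correlation that involves the selection variable itself.

    Assumed form (explicit selection):
    R = r*k / sqrt(1 - r^2 + r^2 * k^2), with k = sd_full / sd_group.
    """
    k = sd_full / sd_group
    return r * k / math.sqrt(1.0 - r ** 2 + (r * k) ** 2)


def correct_incidental(r_xy, r_xz, r_yz, sd_full, sd_group):
    """Correct r_xy when the subgroup was selected on a third variable z.

    Assumed form (incidental selection); sd_full and sd_group are the
    standard deviations of z in the full sample and in the subgroup.
    """
    w = (sd_full / sd_group) ** 2 - 1.0
    numerator = r_xy + r_xz * r_yz * w
    denominator = math.sqrt((1.0 + r_xz ** 2 * w) * (1.0 + r_yz ** 2 * w))
    return numerator / denominator


# Illustrative values only: the SDs are those reported for the High School
# sample (full group 18.3, low IQ group 12.3); the correlations are made up.
r_iq_task = correct_explicit(0.30, sd_full=18.3, sd_group=12.3)
r_task_task = correct_incidental(0.20, 0.30, 0.25, sd_full=18.3, sd_group=12.3)
print(round(r_iq_task, 3), round(r_task_task, 3))
```

When the subgroup and full-sample standard deviations are equal, both functions return the input correlation unchanged, which matches the point made in the text: the correction rescales each group to a common standard deviation and does not by itself remove group differences.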
The differences between corresponding correlations in the high and low IQ group matrices were tested using Fisher's z. Only the upper triangular portion of the matrix was used for these tests because the lower half is redundant. The value of z was evaluated by a two-tailed test. The number of correlations that were significantly larger in the low IQ group than the same correlation in the high IQ group was counted. Next, χ² was used to test the difference between the number of correlations significantly higher in the low group compared with the number (5%) expected by chance. Note that even though significance was tested as a two-tailed test, the hypothesis tested by χ² was that the low IQ group correlations were larger than the high IQ group correlations, which is a one-tailed hypothesis. This makes the test a conservative one.

TABLE 1
The Average Correlation of Basic Cognitive Task Measures with IQ Scores on the WAIS-R and with Each Other (Cognitive)

              MR/College            High School
IQ Level      IQ      Cognitive     IQ      Cognitive
Low           .60     .44           .37     .26
High          .26     .23           .24     .18

Table 1 shows the results of both experiments. Clearly, the correlations between cognitive task measures and WAIS-R IQ are up to twice as large in low IQ samples as in high IQ samples. The statistical analyses of the MR/College experiment comparing low and high IQ groups confirmed this observation, yielding a statistically significant effect (χ²(1) = 308.11, p < .001), as did the High School experiment (χ²(1) = 87.79, p < .001). There were larger differences between high and low IQ groups for the correlations of IQ tests with cognitive variables than for cognitive variables with themselves, although the difference was found for both. Similar results were obtained if the same statistical procedure was carried out on the correlation matrices uncorrected for restriction of range.
Another way of tabulating and comparing differences in correlations was suggested by Kaiser (1968). The largest eigenvalue minus one, divided by the number of variables minus one, yields an estimate of the average correlation in the matrix. This method was used to calculate the average correlation. The largest difference between the previous method and Kaiser's method was .06. Evidently the two methods give similar results.
STUDY 2
If correlations between cognitive tasks and IQ scores are higher in low IQ groups, then more complex mental tests should also intercorrelate more highly in low IQ samples.
To test this possibility, the standardization samples for the WAIS-R and WISC-R were divided into subgroups and analyzed in the same way as in Study 1.
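The Study 1 procedure that is reused here (Fisher's z for each pair of corresponding correlations, a count of significant differences tested against the 5% chance rate, and Kaiser's eigenvalue estimate of the average correlation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the χ² is assumed to be a one-degree-of-freedom goodness-of-fit statistic on significant versus nonsignificant counts, and the largest eigenvalue is found by simple power iteration.

```python
import math


def n_unique_correlations(p):
    """Number of correlations in the upper triangle of a p x p matrix."""
    return p * (p - 1) // 2


def fisher_z_difference(r_low, n_low, r_high, n_high):
    """z statistic for the difference between two independent correlations."""
    se = math.sqrt(1.0 / (n_low - 3) + 1.0 / (n_high - 3))
    return (math.atanh(r_low) - math.atanh(r_high)) / se


def count_low_significantly_larger(rs_low, rs_high, n_low, n_high, crit=1.96):
    """Count pairs where the low-group r is larger at the two-tailed .05 level."""
    return sum(
        fisher_z_difference(a, n_low, b, n_high) > crit
        for a, b in zip(rs_low, rs_high)
    )


def chi_square_vs_chance(n_significant, n_total, p_chance=0.05):
    """1-df goodness-of-fit chi-square against the chance rate (assumed form)."""
    expected_sig = n_total * p_chance
    expected_non = n_total - expected_sig
    observed_non = n_total - n_significant
    return ((n_significant - expected_sig) ** 2 / expected_sig
            + (observed_non - expected_non) ** 2 / expected_non)


def kaiser_average_r(matrix, iterations=500):
    """Kaiser's estimate: (largest eigenvalue - 1) / (number of variables - 1).

    The largest eigenvalue is approximated by power iteration, which is
    adequate for a correlation matrix with positive entries.
    """
    p = len(matrix)
    v = [1.0] * p
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # Rayleigh quotient of the normalized vector
        eigenvalue = sum(
            v[i] * sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)
        )
    return (eigenvalue - 1.0) / (p - 1)


# Matrix sizes in Study 1: 32 and 37 variables
print(n_unique_correlations(32), n_unique_correlations(37))  # 496 666
```

For a matrix in which every off-diagonal correlation equals the same value, `kaiser_average_r` returns that value, which is a convenient sanity check on the formula quoted from Kaiser (1968).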
Subgroups were formed by selecting subjects by one of their subtest scaled scores. At first, it might seem best to select subgroups on Full Scale IQ. However, that procedure would introduce spurious negative correlations among the subtests. The problem is that subtests contribute to Full Scale IQ. If subjects are selected by Full Scale IQ, their subscale scores must balance out to equal the Full Scale score. Those with higher scores on one subtest must have a lower score on one or more other subtests to keep their IQ within the range. The negative correlations induced by such selection procedures are not trivial. For example, when the WISC-R standardization sample was divided by Full Scale IQ into subgroups that are 15 IQ points wide, the average subtest intercorrelation ranged from -.03 to .00 for all subgroups except the lowest (IQ < 71), for which the mean intercorrelation was .07.
To avoid this problem, cases must be chosen by a score that is not a sum of some or all the subtest scores. The ideal solution would be to use a score from another test to select cases. But no other test was available, so subjects were divided into subgroups by one of their subtests. The Vocabulary subtest score was used for one analysis and, as a check, the analysis was repeated using the Information subtest for group assignment. The subtest used to choose groups was included in the analysis because, after correction for restriction of range, including it had little effect on the results.
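The selection artifact described above is easy to reproduce in a small simulation. The sketch below is an illustration under invented assumptions, not a reanalysis of the standardization data: it generates 11 hypothetical subtests that share one common factor (population intercorrelation .50), sums them into a total score, and then computes the mean intercorrelation within a band of the total that is one standard deviation wide, which corresponds to a 15-point IQ band. All names and parameter values are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_intercorrelation(rows):
    """Average of all upper-triangle correlations among the columns of rows."""
    cols = list(zip(*rows))
    k = len(cols)
    rs = [pearson(cols[i], cols[j]) for i in range(k) for j in range(i + 1, k)]
    return sum(rs) / len(rs)


def simulate(n=10000, k=11, rho=0.5, seed=1):
    """Return (mean r in full sample, mean r within a 1-SD band of the total)."""
    rng = random.Random(seed)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    rows = []
    for _ in range(n):
        g = rng.gauss(0, 1)  # common factor shared by all subtests
        rows.append([a * g + b * rng.gauss(0, 1) for _ in range(k)])
    totals = [sum(r) for r in rows]
    mean_t = sum(totals) / n
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in totals) / n)
    # Select on the SUM of the subtests: a band half an SD either side of the mean
    band = [r for r, t in zip(rows, totals) if abs(t - mean_t) < 0.5 * sd_t]
    return mean_intercorrelation(rows), mean_intercorrelation(band)


full_r, band_r = simulate()
print(round(full_r, 2), round(band_r, 2))
```

Under these assumptions the mean intercorrelation is close to .50 in the full sample but falls to roughly zero inside the band, because fixing the sum forces a high score on one subtest to be offset by lower scores on others.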
Method
Standardization Samples. The WAIS-R and WISC-R standardization samples consisted of 1,880 and 2,200 subjects, respectively. For both tests, scores standardized within age for each subtest were used to remove age variance. All 11 subtests for the WAIS-R and all 12 subtests for the WISC-R were used for analysis. Full Scale IQ was also included for both tests.
Sample Subdivision. Each standardization sample was divided into five separate ability groups by standard scores on one of the subtests. This was done using the Information and Vocabulary subtests for both the WAIS-R and WISC-R. These two subtests were selected because of their high correlation with Full Scale IQ and because they are stable across age. The standard score range (and its IQ equivalent) and number of cases for each subgroup are shown in Table 2.

TABLE 2
Number of Subjects in WAIS-R and WISC-R Subgroups Formed by Selection on Vocabulary (Voc) and Information (Inf)

                                  WAIS-R           WISC-R
Group    Range     IQ Equiv.      Voc     Inf      Voc     Inf
1        1-5       <78            109     120      156     130
2        6-8       78-92          474     466      518     525
3        9-11      93-107         697     669      837     842
4        12-14     108-122        472     514      525     535
5        15-19     >122           128     111      164     168

Note. Range = scaled score range, M = 10, SD = 3. IQ Equivalent = the equivalent scaled score as an IQ, M = 100, SD = 15.

Results and Discussion
Analyses were conducted in the same manner as for Study 1. Correlation matrices including all subtests and Full Scale IQ were constructed for each of the five ability level groupings. Correlations were then corrected for restriction of range based on the standard deviation of the subtest used for selection for that matrix. This procedure corrected for within-subgroup differences of range. It also allowed average subgroup correlations to reflect correlations that would have been obtained from the entire population if the ability subgroup was typical of all the population.
Figure 1 shows the average correlation at each ability level. Results when ability subgroups are chosen by Vocabulary subtest scores are nearly identical to results when groups are formed using Information subtest scores. The inclusion of Full Scale IQ in these calculations has very little effect. Omission of Full Scale IQ reduced none of the average correlations by more than .04. The most obvious and striking trend apparent in Figure 1 is that low ability groups demonstrate correlations that are two times larger than those of high ability groups. This trend is apparent in both the WAIS-R and WISC-R standardization samples, though the trend is more pronounced in the WAIS-R data. It is also apparent that there is a systematic trend for successively lower ability levels to show successively higher correlations. Each point in Figure 1 is based on 78 averaged correlations for the WISC-R and 66 averaged correlations for the WAIS-R (representing the upper triangular portion of the matrix).
Statistical analyses on these data using the same procedures as in Study 1 showed that the graphical trends were highly statistically significant. Each correlation in a matrix was compared with the corresponding correlation in every other matrix for each subgroup of that test and selection method. For example, the correlations in the matrix for Group 1 of the WAIS-R selected on Vocabulary were compared with the corresponding correlations for all other subgroups of the WAIS-R selected on Vocabulary. As before, the correlations were tested using Fisher's z and the number of statistically significant differences was compared to differences expected by chance using χ². This resulted in a total of 40 χ² statistics, 10 for each test and subtest selection combination.
Of these comparisons, only 4 χ² values were not statistically significant. Three of these were in comparisons among subgroups of the WISC-R with selection by Vocabulary: Group 4 was not less than Group 3; Group 5 was not less than Group 4; and Group 5 was not less than Group 2. The only other χ² which was not statistically significant was the comparison between Group 5 and Group 4 on the WAIS-R with selection by Information. Figure 1 shows that all these differences are consistent with slight deviations of a few data points from the general trend. For the remaining 36 comparisons between groups, the χ² values were very large, mostly larger than 100.

[Figure 1. Two panels, Information and Vocabulary. Horizontal axis: IQ Equivalent (<78, 78-92, 93-107, 108-122, >122). Vertical axis scaled from 0 to 0.9.]
FIG. 1. Average correlation among WAIS-R and WISC-R subtests within ability level group when groups are selected by Vocabulary or Information subtests corrected for restriction of range.

An idea of the size of this effect can be had from the number of statistically significant differences between correlations of the lower and higher groups. The following percentages of Fisher's z comparisons were statistically significant in the predicted direction. WAIS-R: Vocabulary, 78%; Information, 72%. WISC-R: Vocabulary, 40%; Information, 68%. Remember that this is a conservative test because a two-tailed test was used to test a one-tailed hypothesis. Even so, the results obviously show that there is a systematic trend for the correlations among tests to be higher at lower ability levels.
LISREL offers another method of testing whether correlations are different across ability level. Following the method suggested in the LISREL manual (Joreskog & Sorbom, 1986), all five correlation matrices, uncorrected for restriction of range, were simultaneously compared to determine if they were equal. This analysis was repeated for each test and for each method of selection. LISREL gives a χ² showing the degree of fit to the hypothesis or model, which was that all five matrices were equal. The results showed that the matrices could not be regarded as equal for any of the tests or methods of selection. For WAIS-R selected by Vocabulary, χ²(312) = 743.95, p < .001, and selected by Information, χ²(312) = 1015.42, p < .001. For WISC-R selected by Vocabulary,
χ²(364) = 1390.27, p < .001, and selected by Information, χ²(364) = 1720.92, p < .001. LISREL confirmed the results of the previous analysis showing that correlations change across ability level.
Because the correlations in Figure 1 were corrected for restriction of range, comparisons can be made to the full standardization sample average correlations among subtests which, for the WISC-R and WAIS-R, were .39 and .51, respectively. It appears that the correlations in the full sample were more heavily affected by the low end of the distribution. Differences between groups appear larger at the low end of the distribution. This trend was most apparent for the WISC-R. Though correlation differences were most obvious in the low IQ groups, clearly the relationship between ability group and average subtest intercorrelation was a systematic one.
If the average correlations were not corrected for restriction of range, the same relationships reported above held except, of course, that the correlations were smaller. However, it is very important to correct for restriction of range in doing analyses of this kind. Differences resulting from using Vocabulary or Information subtests to form ability groups were nearly entirely eliminated by correction for restriction of range. Average correlations for lowest to highest ability groups assembled by Information subscale scores, uncorrected for restriction of range, were .42, .29, .26, .21, and .22 for the WISC-R and .56, .37, .30, .25, and .26 for the WAIS-R. Even these uncorrected differences were substantial and systematic.
One other effect apparent in Figure 1 is that the WISC-R has consistently lower average correlations among subtests than the WAIS-R. This disparity seems to be largest at the lower ability levels. The difference between the two tests might be developmental in origin.
However, the WISC-R and the WAIS-R are two independent tests and so the differences between them could as easily be because of dissimilarity in the tests themselves.
In summary, data from the WAIS-R and WISC-R provide strong support for the contention that size of correlations among mental tests varies inversely as a function of ability level. Correlations among subtests are higher for lower ability subjects and lower for higher ability subjects.
DISCUSSION
The general finding from both studies is that mental tests, including mental tests measuring basic cognitive ability, have higher intercorrelations in lower ability groups than in higher ability groups. The data analyzed showed three separate findings. First, cognitive tasks correlate more highly among themselves at lower ability levels than at higher ability levels. Second, cognitive tasks correlate more highly with IQ at lower ability levels than at higher ability levels. Third, subtests of IQ tests intercorrelate more highly at lower ability levels than at higher ability levels. The findings from the two studies are consistent and systematic.
There can be little doubt of the reliability of this finding. However, further research will be required to determine if this same relationship can be found for tests other than those used here. Given the regularity and systematic nature of the findings for the tests analyzed here, it would be surprising if it were not a replicable finding with other data sets.
Considering the consistency of these results, it seems surprising that they have not been reported before. Anastasi (1970) reviewed the literature on variables affecting psychological trait formation. She found a few studies that incidentally reported differences in average correlations across ability groups. It seems that the full importance of the finding was not understood or appreciated, since none of the studies were followed up.
The most similar finding to the one reported here was in a study by Maxwell (1972). As a part of a statistical exercise, he discovered accidentally that two groups of children divided by their scores on a reading test showed different correlations among the subtests of the WPPSI. Correlations among subtests were higher for the group that had low reading test scores. Maxwell proposed two ad hoc explanations of the effect based on Thomson's sampling or bond theory. Unfortunately, Maxwell seems not to have attempted to determine if the differences in subtest correlations he found between high and low reading ability subjects extended beyond the sample he analyzed. Maxwell thought the reading test was the important variable in forming groups. Even so, it is very likely that division by the subtests of the WPPSI would give the same results and so validate the findings from this study.
If the finding that correlations between mental tests vary systematically by level of ability is found to be a general one, not specific to certain tests, then the implications of this finding are substantial.
It suggests that much of what we know about IQ would have to be reconsidered in light of this finding. Without further verification and replication of this phenomenon, extensive theoretical speculation is premature. Nevertheless, there are some obvious implications.
For example, Sternberg and Salter (1982) have discussed what they call the .30 barrier, meaning that correlations between basic cognitive tasks and IQ have often been under .30. The reason for this is probably that most researchers investigating the relationship between cognitive abilities and IQ do not include low IQ subjects in their samples. Table 1 shows what happens when only high IQ subjects are used. Average correlations between cognitive tasks and IQ remain below .30. But if the same correlations are computed for low IQ subjects, they are about twice as large.
Studies of individual differences in cognition that have not included a proportional representation of low IQ subjects will be very hard to interpret. Studies done with college students alone or with other highly selected subject groups will not be generalizable to the full spectrum of intellectual ability. Much of experimental psychology assumes that college students represent the mental structures of all people. These results suggest that such an assumption may not be warranted.
Higher correlations among low IQ subjects may also explain why IQ tests seem to find more uses at the low end of the IQ scale than at the high end. The WISC-R and WAIS-R, and perhaps other tests, will be more 'g'-loaded at the low end of the distribution than at the high end. That is, a general factor should account for more of the total variation among low ability subjects than it does among high ability subjects. Conversely, high ability subjects will show more subtest scatter.
The interpretation of many previous findings may be altered significantly if IQ tests have different correlations at different ability levels. For example, to what extent will the interpretation of heritability estimates be affected by this finding? Are heritabilities higher or lower for low IQ subjects? Can subtest scatter be a valid clinical tool for high IQ subjects even if it could be expected to be less useful at low IQ levels? Should different kinds of testing be done for low IQ subjects than are done for high IQ subjects? Can one factor analytic model adequately represent high and low ability subjects simultaneously? (It is possible that average correlations can change across ability level but that factor structure could remain unchanged.)
Finally, what causes these differences in correlation at different ability levels may be the most interesting of all the questions presented by these findings. Although it is not the aim of this paper to consider fully potential explanations, one possibility is suggested by a theory of mental retardation presented by Detterman (1987). Briefly, he suggested that intelligence is a system made up of a small number of independent processes. Mental retardation is caused by deficits in central processes, meaning processes which most heavily affect all other processes in the system. If these central processes are deficient, they limit the efficiency of all other processes in the system.
Because of the deficit in the central process, the entire system is brought to a uniform low level of operation. So all processes in subjects with deficits tend to operate at the same uniform level. However, subjects without deficits show much more variability across processes because they do not have deficits in important central processes. This causes high correlations among mental measures in low IQ subjects and low correlations in high IQ subjects.
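The direction of this argument can be checked with a deliberately simple toy model. The sketch below is not Detterman's (1987) model and makes strong invented assumptions: each simulated subject has several independent processes and one central process, and each observed measure is the lower of the process level and the central level, plus a little noise. Subjects are split at the median of the first measure, and the mean intercorrelation of the remaining measures is computed in each half without any range correction. All names and parameter values are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_intercorrelation(rows):
    """Average of all upper-triangle correlations among the columns of rows."""
    cols = list(zip(*rows))
    k = len(cols)
    rs = [pearson(cols[i], cols[j]) for i in range(k) for j in range(i + 1, k)]
    return sum(rs) / len(rs)


def simulate(n=10000, k=6, noise=0.3, seed=2):
    """Return mean intercorrelations for the (low, high) halves of the sample."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        central = rng.gauss(0, 1)  # assumed central process
        # Assumption: the central process caps every other process
        rows.append([
            min(rng.gauss(0, 1), central) + noise * rng.gauss(0, 1)
            for _ in range(k)
        ])
    # Split on the first measure, then leave it out of the averages
    cut = sorted(r[0] for r in rows)[n // 2]
    low = [r[1:] for r in rows if r[0] < cut]
    high = [r[1:] for r in rows if r[0] >= cut]
    return mean_intercorrelation(low), mean_intercorrelation(high)


low_r, high_r = simulate()
print(round(low_r, 2), round(high_r, 2))
```

In this toy setup the low half shows the higher mean intercorrelation, because a low central level pulls every measure toward the same value, while a high central level leaves the independent processes free to vary. The example shows only that a capping mechanism of this kind produces the pattern; it says nothing about whether the mechanism operates in real data.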
REFERENCES
Anastasi, A. (1970). On the formation of psychological traits. American Psychologist, 25, 899-910.
Detterman, D.K. (1986, November). Basic cognitive processes predict IQ. Paper presented at the Psychonomic Society, New Orleans, LA.
Detterman, D.K. (1987). Theoretical notions of intelligence and mental retardation. American Journal of Mental Deficiency, 92, 2-11.
Detterman, D.K., Caruso, D.R., Mayer, J.D., Legree, P.J., & Conners, F.A. (1983, March). Assessing cognitive deficits in the mentally retarded: Findings from overall analyses of tasks. Paper presented at the Gatlinburg Conference on Mental Retardation, Gatlinburg, TN.
Gulliksen, H. (1950). Theory of mental tests. New York: Wiley.
Joreskog, K.G., & Sorbom, D. (1986). LISREL VI. Mooresville, IN: Scientific Software Inc.
Kaiser, H.F. (1968). A measure of the average intercorrelation. Educational and Psychological Measurement, 28, 245-257.
Maxwell, A.E. (1972). The WPPSI: A marked discrepancy in the correlations of the subtests for good and poor readers. The British Journal of Mathematical and Statistical Psychology, 25, 283-291.
Spearman, C.E. (1904). "General intelligence" objectively determined and measured. American Journal of Psychology, 15, 201-293.
Sternberg, R.J., & Salter, W. (1982). Conceptions of intelligence. In R.J. Sternberg (Ed.), Handbook of human intelligence (pp. 3-28). New York: Cambridge University Press.