INGREDIENTS OF COMPLEXITY IN FLUID INTELLIGENCE

LAZAR STANKOV
The University of Sydney

JOHN D. CRAWFORD
The University of Technology, Sydney
ABSTRACT: Two studies were carried out to investigate the relationship between performances on two experimental tasks (Triplet Numbers and Swaps) and measures of fluid intelligence (Gf) and short-term memory (SAR). For each of these tasks four different versions of increasing complexity were constructed by varying the number of rules or mental steps required for their solution. Both tasks in the first study, and Swaps in the second study, were given twice, once with the instruction to work as accurately as possible and once with the instruction to work as quickly as possible. Changes in speed/accuracy emphasis produced reliable differences in performance on the Triplet Numbers test in the first study only. As expected, increasing task complexity resulted in lower accuracy and slower responding. Regarding the relationship between the experimental tasks and the intelligence factors Gf and SAR, the most consistent and strongest association was between the accuracy measures of these tasks and Gf, although in all cases the relationships between both factors and the speed scores were statistically significant. Importantly, the most consistent and strongest interaction between task complexity and Gf was obtained using accuracy measures from the experimental tasks. Other interactions with task complexity were weak. In general, as the task becomes more complex, its correlation with measures of intelligence tends to increase.
INGREDIENTS OF COMPLEXITY IN FLUID INTELLIGENCE

The concept of complexity has traditionally been linked to the construct of intelligence. For example, Cattell (1950) defines ability traits and intelligence as consistent variations in behaviour accompanying changes in the complexity of stimulus patterns. Godfrey Thomson (1939, p. 51) says that intelligence is associated with the number and complexity of the patterns that the brain can (or could) make. Within the Thurstonian tradition, Zimmerman (1954) used spatial
visualization tasks to study the effect of changes in task complexity on the factorial composition of those tasks. The notion of complexity can also be found in Guttman's (1954) Radex model of mental abilities, and later workers using similar models have adopted the term in a parallel manner (e.g. Marshalek, Lohman, & Snow 1983). A number of other authors (see Eysenck 1979; Jensen 1977; Snow 1989; Larson & Saccuzzo 1989) have expressed views on the nature of intelligence in which the concept of task complexity plays a prominent role.

Although it is often difficult to classify the writings of particular authors, there are two main ways in which the concept of complexity has been linked to that of intelligence. First is what might be termed the psychometric approach (e.g. Jensen 1977), in which the complexity of a task is operationally defined in terms of the size of its correlations with measures of intelligence, or its loading on the general factor. A similar psychometrically based definition is derived from Guttman's Radex model and other similar ones (e.g. Marshalek, Lohman, & Snow 1983), in which complexity is identified with the radial dimension in the Radex or multidimensional scaling models. Marshalek et al. found that, for their data, these two psychometrically based definitions of complexity yield, in practice, virtually identical orderings of tasks on the complexity dimension.

Given the above definition of task complexity, it should be noted that complexity is different from task difficulty. Factors which give rise to changes in the level of performance may not necessarily be associated with changes in correlations with intelligence. Empirical studies have shown that there is no reason to expect that more difficult tasks will always have higher correlations amongst themselves or with measures of intelligence (see Stanton & Keats 1986; Stankov 1988a; Spilsbury, Stankov, & Roberts 1990). In the language of experimental design, manipulations of task difficulty show the presence of a within-subjects main effect but may or may not show an interaction between this factor and the between-subjects ability factor.

The second way in which the term task complexity has been used in the context of mental abilities is one which focuses more on some particular characteristic, or set of characteristics, of the task. This comes closer to typical dictionary definitions of "complex", which usually emphasise the need to deal with a multitude of elements and the (intricate, complicated, etc.) relationships among these elements. In this approach, although complexity is not defined in terms of correlations with outside measures, it is invariably the case that, as a part of the relevant theory or model of intelligence, more complex tasks are those which are hypothesized to be more strongly correlated with intelligence.

Tasks or stimulus patterns can be described without reference to the mental processes involved in their solution. However, since cognitive tasks may be complex from a purely logical or computational point of view, yet psychologically simple (as is the case, for example, with the problem of computer vision), it is reasonable to ask what properties of a task make it more complex for the human organism. The identification of these properties may be seen as a vital goal enabling us to better understand the concept of intelligence. Some of the candidate properties which have been suggested as leading to a task being more highly correlated with intelligence include a greater number of
elements which need to be held simultaneously in mind in order to successfully complete the task (Bachelder & Denny 1977), the involvement of certain aspects of Working Memory (Crawford & Stankov 1983) or of effortful rather than automatic mental processes (Crawford 1991a), and the use of strategic or "executive control", rather than "mechanistic", processes (e.g. Campione, Brown, & Bryant 1985; Hunt 1980). The presentation of tests as competing tasks could also be considered a means of increasing the complexity of the tasks in certain situations (Stankov 1983a,b, 1988a,b).

Many other task properties or processes have been proposed. Thus, Salthouse (1985) lists more than forty different types of cognitive processes which have been suggested as the basis of the ability to deal with complexity. Some of these processes are broad in scope whereas others are rather narrow. A more pluralistic approach would be to accept that all such processes are important for intelligence and, consequently, that it may be unreasonable to expect to find a single critical or "basic" process underlying intelligence. Any measure of intelligence would contain these processes in varying degrees and mixtures. Such a pluralistic view of complexity, or of intelligence, is similar to that of Thomson (1939), and to the more recent ones expressed by Humphreys (1979) and Horn (1988). We can think of all these processes as the ingredients of intelligence.

According to Snow (1989, p. 37), simple tasks "... require relatively few component processes and relatively little reassembly of processing from item to item... More complex tasks... not only require more components but also more flexible and adaptive assembly and reassembly of processing from item to item." Although our position regarding complexity is in agreement with Snow's, we may note that his conclusion was based on the analysis of a battery of clearly distinct tests of the kind used in typical factor analytic studies. Our present work, however, involves systematic decompositions of a complex task into its ingredients, with each simpler version being a part of the more complex version.

Since every cognitive process assumes the presence of a certain number of elements (or 'fundaments' in Spearman's terminology) and relations among these elements, our interest in this article is to study whether varying either one or both of these within a couple of carefully chosen tasks will lead to changes in performance. There are a large number of elementary cognitive processes (Carroll 1980) or narrow abilities (Horn 1968, 1988) representing the building blocks or atoms of cognition. Some of those listed above are simple sensory processes, but whatever additional elements and processes make perception qualitatively different from sensation also belong in the category of elementary abilities. Indeed, every cognitive task calls upon a number of these elementary abilities; the components of the analogical reasoning processes (Sternberg 1977) are perhaps the best known.
INTELLIGENCE AND THE ACCURACY AND SPEED OF PERFORMANCE
A hotly debated issue is the relative importance of speed and accuracy of performance to intelligence. As a broad generalization, researchers who focus
mainly on the relationship of reaction-time and other elementary cognitive tasks with intelligence tend to emphasize the importance of mental speed (e.g. see the chapters by Jensen, Eysenck, and Vernon in the book edited by Vernon 1988). However, those workers who have sought an answer to this question by observing response latencies on more complex tasks, such as those typically used in measures of intelligence, have reached the opposite conclusion (e.g. see the chapters by Carroll, Horn, and Sternberg in Vernon's 1988 book). For such tasks, accuracy measures are more strongly correlated with traditional measures of intelligence than are speed measures.

The above issue is complicated by the possibility of systematic differences between individuals, and possibly between different types of tasks, in the tradeoff between the speed and accuracy of performance. As argued by Lohman (1989), variation in speed-accuracy tradeoff can seriously affect the sizes of correlations of speed measures, and to a lesser extent accuracy measures, with other variables. For example, with a task of moderate complexity (a mental rotation task), correlations between response latencies and standard tests of spatial ability varied widely from one study to another, ranging from significantly negative, through near-zero, to significantly positive.
RATIONALE AND AIMS OF STUDIES

Empirical results which have suggested to some writers the usefulness of the notion of task complexity (e.g. Guttman 1954; Zimmerman 1954; Snow 1989) have typically involved the demonstration that such a complexity dimension can be seen to underlie individual differences in performances on a variety of traditional mental tests. In our present work, however, we focus on only two types of tasks (the "experimental tasks") which are then varied in a systematic manner to produce versions of progressively higher complexity. We will refer to this as a manipulation of task complexity, on the basis of the accounts of the nature of complexity mentioned earlier and our expectation that such manipulations should lead to higher correlations with measures of intelligence.

Of particular interest is how the accuracy and speed of performance on the experimental tasks relate to traditional measures of intelligence, and whether these relationships are influenced by task complexity. We will refer to these traditional measures as the "psychometric tasks". Although it is expected that performances should be more strongly related to intelligence as task complexity increases, it is not clear whether this should be manifest primarily in the accuracy scores, in the speed scores, or in both. It is also possible that the accuracy and speed of responding might be differently related to measures of intelligence at different levels of task complexity. A tentative hypothesis might be that speed measures will be more strongly related to intelligence for the low complexity tasks, but that it is the accuracy measures which are more strongly related to intelligence for tasks of higher complexity. This hypothesis follows from the simple observation that it is the accuracy of performance on highly complex items which is typically used in measures of intelligence, whereas for the more
elementary reaction-time tasks, it is the relationship between the speed of performance and intelligence which is usually the focus of investigation.

The psychometric tests chosen for the two studies measure abilities which have been linked with the notion of task complexity. Two broad factors of the theory of fluid and crystallized intelligence (see Horn 1988) are of particular significance in this respect: fluid intelligence (Gf) and the short-term acquisition and retrieval function (SAR). Measures of fluid intelligence, such as the well-known Raven's Progressive Matrices test, are commonly regarded as representing the "essence" of general intelligence and as tasks of high complexity (e.g. Eysenck 1979; Marshalek et al. 1983). Although early formulations of the theory of fluid and crystallized intelligence did not distinguish between Gf and SAR, it is now generally accepted that SAR (sometimes referred to as short-term memory ability) is only moderately related to intelligence. Marker tests of SAR were nevertheless included in our studies because they have particular significance in certain theories of complexity (e.g. Bachelder & Denny 1977).

In light of Lohman's (1989) warning that speed-accuracy tradeoff can influence correlations with speed and accuracy scores, it was decided to investigate this to a limited extent. This was done by giving the experimental tasks twice, once with instructions emphasizing the importance of accuracy and once with instructions emphasizing the importance of speed in subjects' performances. Carroll (1976) has pointed out that instructions regarding the requirements of the tasks may be of crucial importance (see also Hunt 1978). These instructions may be particularly important in the selection of what have become known as "strategies" in problem solving. Our interest in this study is in a general class of strategies which depend on the perceived goal of testing: namely, depending on what is stressed in the instructions, either accuracy or speed may be sacrificed, or traded off, in order to maximize the gain. It is reasonable to assume that under conditions of different emphasis, individual differences due to an effort to optimize overall performance would emerge, perhaps showing more efficient performance among people who score high on intelligence tests. Although, as concluded by Lohman (1989), such changes in instructions are generally not an effective way of systematically varying speed-accuracy tradeoff, any changes in correlations with other variables produced by this manipulation would be of significance. The generality of findings on the relationship between the speed and accuracy of performance and intelligence may be limited if such results are found to be sensitive to such instructional manipulations.
EXPERIMENT 1
The main focus in this study is on the Triplet Numbers test. This test was first employed by Wittenborn (1943) in psychometric studies of attention. In the
original administration of this test, subjects heard a series of three-digit numbers (or "triplets") presented at a constant rate. The task was to state whether or not each of the triplets satisfied a particular set of rules which had been carefully explained prior to the commencement of the test. Subjects are required to keep the appropriate set of rules in mind, to perform comparisons among the elements of the triplet, and to compare the outcomes of these element-comparisons. Since triplets of numbers come one after another over a period of several minutes, subjects have to invest a considerable amount of mental effort and concentration in order to perform the task successfully.

The present study used a computer-presented version of the test in which the three stimuli are displayed on a computer screen simultaneously and responses are typed on a keyboard. This modification was introduced in order to reduce the memory component and thus make the comparison process, uncontaminated by memory demands, the most salient feature of the task. Different versions of this test, of varying complexity, were generated by using the same number of stimuli and systematically changing the complexity of the response rules. The effects of a similar complexity manipulation on the Swaps test are studied in detail in the second study. In order to investigate the possible influence of individual differences in speed-accuracy tradeoff, both the Triplet Numbers and Swaps tests were given twice, once with the instruction to work as accurately as possible and once with the instruction to work as quickly as possible.
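In outline, the subject-paced administration is a simple timed loop: display a triplet, wait for a 'Yes'/'No' keypress, record the latency, and move on. The following Python sketch is ours, not the original Commodore 64 software; show_triplet and get_keypress are hypothetical stand-ins for the display and keyboard routines.

```python
import random
import time

def run_triplet_block(rule, time_limit_s, show_triplet, get_keypress):
    """Subject-paced item loop: items are generated and judged until the
    block's time limit expires. `rule` maps a triplet to the correct
    True/False answer for the version being administered."""
    responses = []
    start = time.monotonic()
    while time.monotonic() - start < time_limit_s:
        triplet = [random.randint(0, 9) for _ in range(3)]
        show_triplet(triplet)            # stimulus stays up until a key is pressed
        t0 = time.monotonic()
        answer = get_keypress()          # True for the 'Yes' key, False for 'No'
        latency = time.monotonic() - t0
        responses.append((answer == rule(triplet), latency))
    return responses
```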
METHOD

SUBJECTS

Participants in this study were 128 First Year students (73 females) at the University of Sydney, who took part in the experiment in order to fulfill course requirements.1 They are therefore a sample from a population of our students which numbers about 1,300 people each year. The mean age of our sample is 19 years and 3 months. The youngest member of the sample was 18 years and 5 months old at the time of testing; the eldest was 23 years old.

During the past decade, the student population at the University of Sydney has included between 15% and 18% children from migrant families. All those included in our sample completed their high school education in Australia, which means that their command of English was at the very least satisfactory. Post-war immigrants to Australia come largely from European countries, with Italy, Greece, Yugoslavia, Turkey, and Germany providing slightly more members than the rest of Europe. Although a large number of Vietnamese immigrants gain entry to the University, they seem to gravitate towards the natural sciences and computing, not psychology. From our previous work using the WAIS-R with the same population of students, we know that the average Total Scale IQ is
about 115. We have repeatedly found a systematic difference between Performance and Verbal IQs: Verbal WAIS-R IQ tends to be around 120 and Performance IQ around 110. The lowest WAIS-R Total IQ score which we have obtained in our work with this population has been 99.
EXPERIMENTAL TASKS

The Triplet Numbers Test. Four versions of this test were used in this study. For all versions, the stimulus for each item was a set of three randomly selected digits (a "triplet") displayed simultaneously on the computer screen. Subjects were required to indicate by pressing either a 'Yes' or 'No' key whether the set of three numbers satisfied a particular rule which had been explained to them earlier in the instructions for that version of the test. As soon as either key is pressed, the stimulus for the next item (another randomly chosen three-digit number) appears on the screen. The four versions of the test were constructed by changing the response rule. The four rules used, in order of increasing complexity, are listed below. These instructions make the 'Two-rules' Triplets test similar to the original Number Triplets test designed by Wittenborn (1943) and later shown by Stankov (1988) and Crawford (1988, 1991b) to be a measure of fluid intelligence:

1. The 'Search' Triplets. Press the 'Yes' key if a particular number (e.g. 3) is present within the triplet. Otherwise, the 'No' key is to be pressed. Time: 2 min.
2. The 'Half-rule' Triplets. The task here is to press the 'Yes' key if the second digit is the largest within the triplet. Otherwise, the 'No' key is to be pressed. Time: 3 min.
3. The 'One-rule' Triplets. In this version, only one rule of the type used in the 'Two-rules' Triplets is employed, i.e. press the 'Yes' key if the second digit is the largest and the third digit is the smallest. Otherwise, the 'No' key is to be pressed. Time: 6 min.
4. The 'Two-rules' Triplets. The subject has to press the 'Yes' key if the first digit is the largest and the second digit is the smallest, or the third digit is the largest and the first digit is the smallest. Otherwise, the 'No' key is to be pressed. Time: 6 min.

Each of the four versions of the test was presented in the same manner. Firstly, instructions were given in which the appropriate rule was explained and practice items given. Then the test stimuli were displayed in the subject-paced manner described above, until the time limit for that particular version had expired. The presentation of the actual test items did not commence until the subjects indicated that they fully understood the instructions, including the response rule. As part of the instructions, subjects were told of the time limit for that version of the task.

In the preliminary work for this study, we administered all four versions with the same time limit of 6 minutes. Our finding was that reliable data can be
obtained with the shortened versions of the easier tasks and, in order to save experimental time, we employed different time limits for the four complexity levels.2 This means that the number of items attempted by a subject will vary across versions, and across subjects for a given version. This is why we employ the percentage of correctly answered items as the accuracy score for all versions of the Triplet Numbers test. In addition to the accuracy scores, we also calculate each subject's average speed at providing the correct answers. These two scores, accuracy and speed, are the two dependent variables in the present study.
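The four response rules translate directly into predicates over a triplet, and the two dependent variables follow from the recorded responses. Below is a minimal Python sketch under our reading of the rules; how ties between repeated digits are to be treated is not specified in the text, so the strict max/min comparisons are an assumption.

```python
def search_rule(t, target=3):
    """'Search': is the target digit present in the triplet?"""
    return target in t

def half_rule(t):
    """'Half-rule': the second digit is the largest."""
    return t[1] == max(t)

def one_rule(t):
    """'One-rule': the second digit is the largest and the third the smallest."""
    return t[1] == max(t) and t[2] == min(t)

def two_rules(t):
    """'Two-rules': first largest and second smallest, or third largest and first smallest."""
    return (t[0] == max(t) and t[1] == min(t)) or \
           (t[2] == max(t) and t[0] == min(t))

def score_block(responses):
    """Accuracy = percentage of items answered correctly; speed = average
    time taken over the correctly answered items, as described above."""
    correct_latencies = [rt for ok, rt in responses if ok]
    accuracy = 100.0 * len(correct_latencies) / len(responses)
    speed = sum(correct_latencies) / len(correct_latencies)
    return accuracy, speed
```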
The Swaps Test. This is a 48-item test developed by Crawford (1988) in which three letters (J, K, and L) are given on the screen. For each item, the subject is given one or more instructions to mentally swap the positions of pairs of letters. The subject's task is to type in the final order of the letters after completing all the required swaps. The structure of the items is described in more detail in the Method section for Experiment 2. In the present experiment a total number-correct score was derived from this test. Unlike in the Triplet Numbers test, this score represents a subject's accuracy of performance, since in this test all subjects complete the same number of items.3 This experimental task is used to study the emphasis manipulation only. Since Crawford (1988) has shown that this test measures fluid intelligence, we also use it to supplement correlational findings with the Gf composite scores.
PSYCHOMETRIC TESTS

In order to investigate the relationship between measures of broad intellective abilities and the complexity manipulations of the Triplet Numbers task, we employed four psychometric tests. Although these tests have been used in our previous research on intelligence, they are not standardized measures. Our choice of tests and items was guided by a desire to have: a. accepted marker tests of Gf and SAR; b. tests that would provide adequate difficulty and variability for our sample of subjects; and c. a sufficient number of items to ensure satisfactory reliability while remaining economical of total experimental time.

Fluid Ability (Gf). a. Raven's Progressive Matrices test. In this study a collection of 40 items from the Standard (10 items from the two most difficult sections) and Advanced (30 items) versions of the Raven's Progressive Matrices test was used as an estimate of the Cognition of Figural Relations primary ability of the Gf/Gc theory (see Horn 1976). This test was given with a 20-minute time limit. b. Letter Series test. This is a 38-item (open-ended, not multiple-choice) version of the typical Thurstonian series completion test. This test is a well-known marker of the Induction primary factor and of fluid intelligence at the second order. The test was also administered with a 20-minute time limit (see Horn 1976).

For both the Raven's Progressive Matrices test and the Letter Series test, separate scores were obtained for the number of correct, number of incorrect, and number of
abandoned items. We also obtained the average time (or speed) scores for correct, incorrect, and abandoned items. Preliminary analyses with all these measures showed that, since the number of abandoned items was rather small, the main information regarding performances on the psychometric tests was contained in the accuracy and speed scores of the number-correct measures. We therefore focus on the number-correct scores in this article. On the basis of preliminary analyses of our data and previous experience with these tests, we obtained a composite fluid intelligence (Gf) score as a simple sum of the number-correct scores for these two psychometric tests of our battery.
Short-term Acquisition and Retrieval Function (SAR). This ability (also known in the literature as the Memory Span factor) was assessed with two tests: Number Span Forward and Number Span Backward. The instructions for both of these tests were similar to those of typical Digit Span subscales such as those used in the WAIS. However, rather than using the analog of the method of limits in the administration of the tests (as is the case with the WAIS), we used the analog of the staircase method. Basically, this involved increasing or decreasing by one the number of digits to be recalled, depending on whether the response was correct or incorrect, respectively. This algorithm was continued until the time limit of 15 minutes for the test had expired. The score used was the average of the 'peak' lengths of correctly recalled lists, that is, the average length of correctly recalled lists just prior to an error occurring. We use the sum of the Number Span Forward and Number Span Backward scores as the SAR composite in this study.
PROCEDURE

The Triplet Numbers test and the Swaps test were each presented twice: once with the instruction to work as carefully as possible (referred to as the 'Accuracy Emphasis'), and once with the instruction to work as quickly as possible ('Speed Emphasis'). During the first session of testing, 71 subjects were given 'Accuracy Emphasis' instructions and the remaining 57 subjects were given 'Speed Emphasis' instructions. On average, the second session was held about one week after the first session. The psychometric tests were given at times convenient to the subjects, either as parts of one of the experimental task sessions or at separate times. The two sessions of testing typically lasted about 3 hours in total.

The Triplet Numbers, Swaps, Letter Series, and both Digit Span tests were all presented by computer. The Raven's Progressive Matrices test was presented in the usual way (i.e. as pictures on paper), but the answers were typed into the computer. For these experiments, we employed a network of four Commodore 64s, and subjects were tested in groups of 2 to 4 at a time. An experimenter was present in the room throughout the testing in order to provide further instructions if needed, and to run the computer system.
STATISTICAL ANALYSES

Two main methods of data analysis are employed in this paper: ANOVA and correlational analysis. In order to examine the effect of the complexity manipulation we employ two-way ANOVA with one between-subjects factor (either Gf or SAR) and one within-subjects factor (four levels), using accuracy and speed of test-taking as dependent variables. In ANOVA, it is assumed that complexity is an independent variable with levels corresponding to the four 'rules'. The presence of a fanning-out effect in a plot of the two-way interaction would indicate that, as complexity increases, the difference between high-scoring and low-scoring Gf (or SAR) subjects also increases. A similar analysis was used to examine the effect of the accuracy/speed emphasis manipulation, in which this within-subjects factor (two levels) replaced the complexity factor.

Although latent dimensions which can be derived from a set of correlations can be used as a basis for formulating hypotheses regarding the relationship between changes in complexity and intelligence, we limit ourselves in the main body of this article to the examination of raw correlations only. These correlations are between Triplet Numbers scores at the particular levels of complexity and Gf (and SAR). Each correlation coefficient corresponds to a difference between subjects scoring high and low on Gf (and SAR) in their performance on a given version of the Triplet Numbers task; a large difference would imply a high correlation. It should be kept in mind that individual correlation coefficients stand on their own and there is no statistical assumption regarding the complexity variable as found in ANOVA. The presence of increasing correlations, however, would suggest an underlying complexity dimension. It is to be expected, therefore, that there will be some discrepancies between ANOVA and trends in raw correlations. The expected ceiling level on performance, with the consequent relatively small variance under the easier, lower complexity conditions, is one reason why it is useful to employ both techniques in this article.

The definitional formula for Pearson's product-moment correlation coefficient is based on standardized variables. This means that variables entering into that formula are transformed to have the same mean and variance. The problem of unequal variances is effectively removed through this transformation provided, of course, that low variances due to floor or ceiling effects do not lead to significant differences in test reliabilities for different levels of complexity.* Factor-analytic definitions of complexity depend on such correlational data.

Severe discrepancies in the sizes of variances under different treatment conditions would render the use of univariate ANOVA inappropriate, since the F-test is based on the assumption of homogeneity of variances. In a sense, the 'pooled' Sum of Squares in ANOVA serves a similar equalization function as does the standardization in correlations. Even though it is generally acknowledged that the F-test is robust when the homogeneity assumption is violated, the presence of severe violation poses an interpretational problem. The use of multivariate analysis of
variance (MANOVA, or perhaps a better name would be profile analysis; see Morrison 1967) does not depend on the assumption of homogeneity of variance although, of course, results are affected by the relative sizes of variances.5 This is one of the main reasons for employing MANOVA in repeated measures designs. However, when applied to the same set of data with heterogeneous variances, MANOVA will generally produce a more conservative test statistic than ANOVA. Most of the relevant computer packages nowadays provide outputs containing both ANOVA and MANOVA procedures for the repeated measures design. Since a psychological audience may be more familiar with ANOVA, we report univariate F-tests for the within-subjects factors whenever the size of the multivariate statistic is not substantially lower than ANOVA's F-test. If the discrepancy between univariate and multivariate test statistics is pronounced, we report Pillai's F-statistic.

Two additional aspects of our use of the analysis of variance in this article need to be brought forward. First, we follow the counsel of those who advocate the employment of the full range of values of a continuous variable rather than a dichotomization procedure (see, for example, Cohen & Cohen 1983). Thus, the between-subjects independent variables of interest to us here, the psychometric ability scores of Gf and SAR, are kept in their original form and no attempt is made to generate groups of "High" and "Low" Gf or SAR subjects. This type of analysis involving the uncategorized variables can be accomplished easily with common statistical packages. The complexity (four levels) and emphasis (two levels) independent variables are the within-subjects repeated measures variables.

Second, in order to display our results in a way that can be easily understood by those familiar with traditional ANOVAs, i.e. those accustomed to categorical independent variables, we obtain standard deviations for the fitted or ESTIMATE values. These are the scores obtained by the model part (i.e., without the error or RESIDUAL part) of MANOVA. Values corresponding to one standard deviation above the mean and one standard deviation below the mean of the ESTIMATE scores are plotted in Figures 1 to 5 of this article. These plots correspond to the traditional ANOVA displays illustrating interaction effects when dichotomization is employed. Thus, the plot of values for the "mean plus one standard deviation of the ESTIMATE scores" corresponds to the "High", say, Gf group. Similarly, the plot of values for the "mean minus one standard deviation of the ESTIMATE scores" corresponds to the "Low" Gf group.6
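In present-day notation, this display can be reproduced by regressing the dependent variable on the continuous composite at each complexity level and evaluating the fitted line at the mean plus and minus one standard deviation. The sketch below is ours (the per-level simple regression is our reading of the ESTIMATE procedure) and assumes numpy.

```python
import numpy as np

def interaction_lines(gf, scores):
    """Fitted values at Gf = mean +/- 1 SD for each within-subjects level.

    gf: (n,) array of composite scores; scores: (n, k) array with one
    column per complexity level. Plotting the two returned rows against
    level reproduces the 'High'/'Low' group display described above."""
    hi, lo = gf.mean() + gf.std(), gf.mean() - gf.std()
    lines = []
    for level in range(scores.shape[1]):
        slope, intercept = np.polyfit(gf, scores[:, level], 1)
        lines.append((slope * hi + intercept, slope * lo + intercept))
    return np.array(lines).T        # row 0: 'High' line, row 1: 'Low' line
```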
RESULTS

Table 1 presents descriptive statistics for the psychometric measures and the Gf and SAR composites. The data were analysed using a series of two-factor ANOVA procedures.
TABLE 1
Descriptive Statistics for the Psychometric Tests and the Gf and SAR Composites.

                      Raven's    Letter    Forward   Backward   Gf          SAR
                      Matrices   Series    Span      Span       Composite   Composite
Exp. 1 (N = 128)
  Minimum             11         9         5.400     5.000      23          11.800
  Maximum             38         33        11.000    12.000     68          22.665
  Mean                24.414     23.438    8.420     8.387      47.852      16.807
  SD                  5.980      5.468     1.056     1.285      10.238      2.071
Exp. 2 (N = 70)
  Minimum             12         9         6.250     5.200      28          12.083
  Maximum             35         33        11.000    11.000     61          20.500
  Mean                24.814     23.770    8.441     8.223      48.586      16.663
  SD                  5.591      5.122     1.037     1.296      9.134       2.044
In each case there was one between-subjects factor (either the Gf or the SAR composite) and one within-subjects factor (either Emphasis or Complexity). The analyses examining the effects of Emphasis (work as accurately as possible vs. work as quickly as possible) were carried out for both experimental tasks. The analysis examining complexity effects (four levels corresponding to the number-of-rules manipulation) was carried out with the Triplet Numbers task only.
THE ROLE OF INSTRUCTIONAL EMPHASIS

Table 2 presents means and standard deviations for the accuracy scores and the speed of test-taking under the two different emphasis conditions for both the Triplet Numbers and Swaps tests. This Table also summarizes the results of the various ANOVA analyses. It is apparent that both experimental tasks of this study are rather easy. On average, subjects can solve correctly about 96% of the items of the Triplet Numbers test, and they can provide their answers at an average speed of slightly more than 1 second per item. On average, they can solve about 42 out of 48 items (88%) of the Swaps test, and they spend about 12 seconds per item.

Inspection of the F-tests for the main effects and interactions in Table 2 shows that a relatively small number of comparisons involving different treatment conditions produced significant differences. These are restricted to the Triplet Numbers test. Thus, instructing subjects to work as accurately as possible leads to a larger number of correctly solved items and to a slower speed of working through the test. On the other hand, instructing them to work quickly leads to a faster speed of providing answers and to a larger number of errors. This is evidenced in the significance of the main effect of Emphasis for both the Percentage
TABLE 2
Means and Standard Deviations for the Percentage Correct and Speed of Test-Taking Scores on the Triplet Numbers Test, and Number-Correct Scores for the Swaps Test, under Different Emphasis Conditions in Experiment 1.

Triplet Numbers Test
                        Percentage Correct     Average Time to Answer an Item (s)
                        Mean      SD           Mean      SD
  Accuracy Emphasis     96.612    2.195        1.453     .454
  Speed Emphasis        95.613    2.695        1.203     .310

ANOVA Summaries (percentage correct; time per item):
  Emphasis Main Effect, F(1,126) =          11.969, p = .001;   8.822, p = .004
  Emphasis by Gf Interaction, F(1,126) =     5.757, p = .018;   0.732, p = .394
  Emphasis by SAR Interaction, F(1,126) =    2.356, p = .127;   3.017, p = .085

Swaps Test
                        Number Correct
                        Mean      SD
  Accuracy Emphasis     43.032    4.024
  Speed Emphasis        41.616    4.004

ANOVA Summaries:
  Emphasis Main Effect, F(1,126) = 1.377, p = .243
  Emphasis by Gf Interaction, F(1,126) = 0.028, p = .868
  Emphasis by SAR Interaction, F(1,126) = 0.271, p = .603
Correct (F(1,126) = 11.969; p < .01) and the Average Time to Answer an Item (F(1,126) = 8.822; p < .05). The only significant interaction effect in Table 2 also involves the Percentage Correct scores of the Triplet Numbers test: the interaction between fluid intelligence and Emphasis is significant at the .05 level (F(1,126) = 5.757; p < .05). The difference between high-Gf-scoring individuals and low-Gf-scoring individuals is greater under the Speed Emphasis condition than under the Accuracy Emphasis condition. For the Swaps test, neither the main effect of Emphasis nor the interactions between Emphasis and the psychometric factors reach significance.

Overall, the experimental manipulation of Emphasis is more effective with the Triplet Numbers test than it is with the Swaps test. However, only one interaction with a psychometric composite is significant at the .05 level, and it does not reach the .01 level. It is possible that the effects would be more pronounced if the difficulty of the experimental tasks were to be increased.
MANIPULATIONS OF TASK COMPLEXITY

The Effect of Task Complexity: ANOVA Results. Table 3 presents means and standard deviations for the percentage correct and speed of test-taking scores over the different levels of the complexity manipulation. Even though the tasks are obviously easy, we can see a clear trend towards poorer performance under the more complex conditions. As can be seen in Tables 4 and 5, this trend is significant for both percentage correct (Pillai's F-statistic (3,124) = 21.372; p < .05) and the average time per item measure (Pillai's F-statistic (3,124) = 35.887; p < .05). In both cases the linear trend accounts for most of the explained variance.

The relationship between the Gf and SAR factors and performance on the Triplet Numbers test is reflected in the main effects of these two between-subjects factors, and also in their interactions with complexity. First, let us consider the main effects for the percentage correct scores. Our analysis shows that the main effect for Gf is significant (F(1,126) = 17.177; p < .05 in Table 4) whereas that for SAR is not (F(1,126) = 1.999; p > .05).7 This is in accordance with our previous findings with the Triplet Numbers test, which indicate that this test is a measure of the fluid intelligence factor. ANOVA's main effect tells us about that correlation: people with high Gf scores, but not high SAR scores, perform better on the Triplet Numbers test. Using our second dependent variable, the average speed in answering an item, we find that both Gf (F(1,126) = 16.179; p < .05) and SAR (F(1,126) = 13.619; p < .05) correlate with it.
TABLE 3
Means and Standard Deviations for the Percentage Correct and Speed of Test-Taking Scores on the Triplet Numbers Test under Different Complexity Conditions.

Experiment 1 (N = 128)
                         Percentage Correct    Average Time to Answer an Item (s)
  'Search' Triplets      97.831 (1.007)        .686  (.146)
  'Half-rule' Triplets   97.318 (1.765)        1.003 (.281)
  'One-rule' Triplets    96.006 (2.626)        1.305 (.307)
  'Two-rules' Triplets   93.294 (5.303)        2.316 (.708)

Experiment 2 (N = 70)
  'Search' Triplets      97.945 (3.201)        1.033 (.822)
  'Half-rule' Triplets   97.635 (1.856)        1.132 (.393)
  'One-rule' Triplets    96.062 (3.939)        1.432 (.412)
  'Two-rules' Triplets   91.343 (9.595)        3.157 (2.054)
TABLE 4
Analysis of Variance of the Percentage Correct Scores of the Triplet Numbers Test. Two-Factor Design with Fluid Intelligence as a Between-Subjects Factor and Levels of Complexity as a Repeated Measures Factor.

Between-Subjects Effect: Fluid Intelligence (Gf)
  SOURCE    SS         DF    MS        F        p
  Gf        291.395    1     291.395   17.177   0.000
  ERROR     2181.504   126   17.314

Within-Subjects Effects
  SOURCE       SS         DF    MS        Univ. F   p
  Complexity   569.313    3     189.771   32.047    0.000
  ERROR        2238.391   378   5.922
  Pillai Trace = 0.341, F-statistic = 21.372, DF = 3,124, p = 0.000

Single Degree-of-Freedom Polynomial Contrasts
  SOURCE      SS         DF    MS        Univ. F   p
  Linear      235.470    1     235.470   23.455    0.000
  ERROR       1264.960   126   10.039
  Quadratic   18.063     1     18.063    3.490     0.064
  ERROR       652.083    126   5.175
  Cubic       0.193      1     0.193     0.076     0.784
  ERROR       321.348    126   2.550

Interaction: Gf × Complexity
  SOURCE            SS         DF    MS       Univ. F   p
  Gf × Complexity   253.726    3     84.575   14.282    0.000
  ERROR             2238.391   378   5.922
  Pillai Trace = 0.192, F-statistic = 9.833, DF = 3,124, p = 0.000
In both cases, people who have higher scores on Gf and SAR tend to perform faster than people with low scores.

Of major importance to us in this paper are the interaction effects between the intelligence factors (Gf and SAR) and complexity. Interaction effects for the accuracy scores are displayed in Figure 1. Our finding is that the interaction between Gf and complexity is significant (Pillai's F-statistic (3,124) = 9.833; p < .05 in Table 4). The interaction between SAR and complexity is not significant (Pillai's F-statistic (3,124) = 2.685; p > .05). It is apparent from Figure 1a that the difference between high- and low-Gf-scoring individuals increases as the complexity of the Triplet Numbers task increases. The same trend, albeit not significant, is also evident with the SAR composite (see Figure 1b).

Figure 2 displays the interaction effects using the average speed of answering an item as the dependent variable. Our analyses show that the interaction between complexity and both Gf (Pillai's F-statistic (3,124) = 7.493; p < .05 in Table 5) and SAR (Pillai's F-statistic (3,124) = 3.624; p < .05) is significant. We may note, however, that the SAR interaction is not significant at the .01 level.
TABLE 5
Analysis of Variance of the Speed Scores of the Triplet Numbers Test. Two-Factor Design with Fluid Intelligence as a Between-Subjects Factor and Levels of Complexity as a Repeated Measures Factor.

Between-Subjects Effect: Fluid Intelligence (Gf)
  SOURCE    SS       DF    MS      F        p
  Gf        5.811    1     5.811   16.179   0.000
  ERROR     45.253   126   0.359

Within-Subjects Effects
  SOURCE       SS       DF    MS      Univ. F   p
  Complexity   23.305   3     7.768   88.532    0.000
  ERROR        33.168   378   0.088
  Pillai Trace = 0.465, F-statistic = 35.887, DF = 3,124, p = 0.000

Single Degree-of-Freedom Polynomial Contrasts
  SOURCE      SS       DF    MS      Univ. F   p
  Linear      2.825    1     2.825   15.966    0.000
  ERROR       22.298   126   0.177
  Quadratic   0.736    1     0.736   15.237    0.000
  ERROR       6.085    126   0.048
  Cubic       0.645    1     0.645   16.981    0.000
  ERROR       4.784    126   0.038

Interaction: Gf × Complexity
  SOURCE            SS       DF    MS      Univ. F   p
  Gf × Complexity   4.206    3     1.402   15.978    0.000
  ERROR             33.168   378   0.088
  Pillai Trace = 0.153, F-statistic = 7.493, DF = 3,124, p = 0.000
The interpretation of the interaction effects with respect to the speed measure is not as clear-cut as it was with the accuracy measure. Inspection of the graphs (Figures 2a and 2b) shows that the significance of the interaction effect arises largely from the fact that the difference between low- and high-scoring individuals on the psychometric composites remains about the same at the first three complexity levels and increases at the most complex level. With the accuracy scores of Figure 1, a systematically increasing trend is clearly visible. Without additional data it is hard to say whether the trend in Figure 2 would continue with further increases in task complexity.
The Effect of Task Complexity: Correlations. Table 6 displays correlations between Gf, SAR, and Triplet Numbers scores at different levels of complexity. The top panel of Table 6 presents the data from Experiment 1, and the bottom panel provides the data from Experiment 2. These correlations, by and large, display the same trends as those described in the ANOVA section. Thus we can see that, when accuracy scores are used as the dependent variable, the size of the correlations between the Triplet Numbers test and SAR is small over all complexity levels.
FIGURE 1
Mean accuracy (percentage correct) scores for those having high and low scores on fluid intelligence (Gf, Fig. 1a) and short-term acquisition and retrieval function (SAR, Fig. 1b) over four levels of complexity of the Triplet Numbers test in Experiment 1.
[Figures 1a and 1b: percentage correct plotted against complexity level (1 to 4).]
This corresponds to the non-significant SAR main effect. All other between-subjects main effects are significant and therefore the sets of correlations within given columns are obviously different from zero. We can also see that trends in correlation coefficients over the complexity levels by and large follow the trends present in Figures 1 and 2. Thus, when the difference between high and low scoring subjects for a particular level of complexity is relatively high, the correlations tend to be high, and vice versa.
FIGURE 2
Mean speed of test-taking (average time per item in seconds) scores for those having high and low scores on fluid intelligence (Gf, Fig. 2a) and short-term acquisition and retrieval function (SAR, Fig. 2b) over four levels of complexity of the Triplet Numbers test in Experiment 1.
[Figures 2a and 2b: average time per item plotted against complexity level (1 to 4).]
It should also be noted that, from inspection of the correlations for the first study, the significant interaction in the ANOVA involving speed as a dependent measure is not due to a systematic increase in complexity.
TABLE 6
Correlations of Gf and SAR with Triplet Numbers Scores over Different Levels of the Complexity Manipulation.

Experiment 1
                         Percentage Correct                 Speed
  Complexity Level       Gf     SAR    Swaps  Gf+Swaps      Gf     SAR    Swaps  Gf+Swaps
  'Search' Triplets      .031   -.077  .048   .041          -.199  -.225  -.035  -.178
  'Half-rule' Triplets   .169   .006   .195   .201          -.296  -.335  -.089  -.276
  'One-rule' Triplets    .326   .164   .340   .376          -.177  -.236  -.055  -.165
  'Two-rules' Triplets   .354   .140   .444   .431          -.369  -.278  -.095  -.388

Experiment 2
                         Percentage Correct    Speed
  Complexity Level       Gf     SAR            Gf     SAR
  'Search' Triplets      .231   .219           -.210  -.212
  'Half-rule' Triplets   .263   .090           -.274  -.025
  'One-rule' Triplets    .301   .087           -.360  -.110
  'Two-rules' Triplets   .463   .246           -.404  -.265
The top panel of Table 6 also contains correlations between the Triplet Numbers scores at each level of complexity and the accuracy scores of the Swaps test. As mentioned earlier, this test is a measure of fluid intelligence and its correlations with the complexity levels should be in agreement with the Gf correlations. This is indeed the case for the accuracy scores. Furthermore, the increasing trend in correlations is even more pronounced if the Gf score is added to the Swaps score.
DISCUSSION

The outcome of Experiment 1 can be summarized succinctly with respect to both manipulations. The complexity manipulation produces a noteworthy interaction with Gf, but not with the SAR measures, when we employ accuracy scores from the Triplet Numbers test. Correlations between these scores and measures of fluid intelligence tend to increase with complexity. The interactions involving the speed measures are not due to a systematic increase in complexity within the levels investigated here. It is possible, however, that these interactions would be in accordance with expectations at more complex levels than those employed in this study. Correlations between the psychometric composites and the speed measures of the Triplet Numbers test do not follow a systematic trend.

Our manipulation of instructional emphasis produced significant main effects
with both the accuracy and speed scores of the Triplet Numbers test. However, only the interaction between the accuracy scores and Gf was significant at the .05 level: the accuracy of high- and low-Gf subjects differed more under speed emphasis instructions than under accuracy emphasis instructions. The Swaps test showed no sensitivity to the Emphasis manipulation.

Closer inspection of the conditions of the Experiment also showed that the results involving the Emphasis manipulation may have been affected by one aspect of our procedure. Data gathering started during the first half of the school year and finished close to the end of the school year. We observed that there was a tendency for those subjects who participated earlier to be more conscientious and academically motivated than those who participated later in the year. Those who came first were given the accuracy emphasis condition first and the speed emphasis condition second. It is therefore possible that the Emphasis factor in Experiment 1 might have been confounded with this self-selection process, although it is not clear how the results which were obtained could easily be explained in this way. The possibility of such a confounding, however, is one reason why we decided to run another experiment. Another reason was to see if the results with respect to complexity could be reproduced with the Triplet Numbers task and also with the Swaps test.
EXPERIMENT 2

The main interest in this experiment is in the Swaps test. This test was designed by Crawford (1988) as a measure of Working Memory performance, but was found to be a measure of fluid intelligence. This test could also be likened to the markers of the Temporal Tracking primary factor (see Stankov & Horn 1980) which define fluid intelligence at the second order (Horn & Stankov 1982). The design of the experiment is similar to that of Experiment 1: there are two emphasis conditions (speed vs. accuracy) and four complexity conditions. In our administration of the test, subjects are required to read the swap instruction, permute a pair of letters, and keep the permuted order in mind while giving the answer or while reading the next instruction and carrying out another permutation. Since the number of elements remains the same (three letters) at all four levels, the complexity manipulation can be seen as an increase in the number of relations that have to be dealt with during the solution process. The aim is to study the across-task generality of the previous findings.

We also employ the Triplet Numbers test in this experiment. With this test, however, we focus on the complexity manipulation only, in an attempt to replicate the more salient findings of Experiment 1. This test was presented only once, with the commonly used instructions to work as quickly and as accurately as possible.
METHOD

SUBJECTS

Subjects in this study were 70 First Year students (42 females) at the University of Sydney. They were part of the student intake at this University one year following those who took part in Experiment 1. We have no indication that this sample differs in any important way from the sample of Experiment 1. Performance on the psychometric tests of Gf and SAR did not show significant differences between the subject samples of our two experiments (see Table 1).
EXPERIMENTAL TASKS

There are two experimental tasks in this study:

a. The Triplet Numbers test. This test was described in the Method section of Experiment 1. In the present experiment, however, the test was given only once and therefore no attempt is made to investigate the role of instructional emphasis.

b. The Swaps test. The stimulus material for all versions of the Swaps test consists of a set of three letters (J, K, and L) presented simultaneously on the computer screen (though not necessarily in that order), together with a number of instructions to interchange, or "swap", the positions of pairs of letters. The four versions of the task differ in the number of such instructions. There were four lots of twelve items, each lot requiring an equal number of swaps. These items were randomly intermixed to form a 48-item test. The subjects did not know how many swaps would be required on any given item prior to the appearance of the stimulus for that item. This aspect of administration is different from our approach with the Triplet Numbers test, where items were blocked according to difficulty. Again, as with the Triplet Numbers test in these experiments, we kept all three letters on the screen until the answer was typed in. The required swap instructions were also kept visible throughout the subjects' work. The answer consisted of typing the three letters in the order resulting from all the swaps. An example of the four levels of complexity is as follows:
Stimuli: J K L

1. 'One Swap.' 'Swap 2 and 3.' Ans.: J L K.
2. 'Two Swaps.' 'Swap 2 and 3,' 'Swap 1 and 3.' Ans.: K L J.
3. 'Three Swaps.' 'Swap 2 and 3,' 'Swap 1 and 3,' 'Swap 1 and 2.' Ans.: L K J.
4. 'Four Swaps.' 'Swap 2 and 3,' 'Swap 1 and 3,' 'Swap 1 and 2,' 'Swap 1 and 3.' Ans.: J K L.
The numbers following the word 'Swap' refer to the positions of the letters within the set. Each swap instruction was presented on a separate line in the middle of the computer screen. We collected two performance measures from this test: the number of correct answers (out of twelve items at each level) and the average speed of answering an individual item. These two scores, accuracy and speed, are the two dependent variables in the present study.*
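The bookkeeping behind an item is a short sequence of transpositions. The helper below is ours, written only to make the examples above concrete; positions are 1-based as in the test instructions.

```python
def apply_swaps(letters, swaps):
    """Apply (i, j) position swaps in order and return the final order."""
    order = list(letters)
    for i, j in swaps:
        order[i - 1], order[j - 1] = order[j - 1], order[i - 1]
    return "".join(order)

# The four worked examples above:
assert apply_swaps("JKL", [(2, 3)]) == "JLK"
assert apply_swaps("JKL", [(2, 3), (1, 3)]) == "KLJ"
assert apply_swaps("JKL", [(2, 3), (1, 3), (1, 2)]) == "LKJ"
assert apply_swaps("JKL", [(2, 3), (1, 3), (1, 2), (1, 3)]) == "JKL"
```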
PSYCHOMETRIC TESTS

In this experiment we use the same measures of Gf and SAR as those used in Experiment 1.
PROCEDURE

The Swaps test was presented twice: once with the instruction to work as carefully as possible (referred to as the 'Accuracy Emphasis') and once with the instruction to work as quickly as possible ('Speed Emphasis'). In order to avoid the problem of having keener subjects at the beginning of testing and poorer subjects at the end, 'Accuracy Emphasis' and 'Speed Emphasis' instructions during the first session of testing were given alternately to subjects according to their order of signing up for the experiment. This ensured that 35 subjects performed under one order of instructions and the other half under the other order. The whole testing typically lasted about two and a half hours and was carried out in two sessions. All other conditions were identical to those of Experiment 1. The statistical analyses, too, were the same as those of the previous experiment.
RESULTS

TRIPLET NUMBERS TEST

As expected, the complexity manipulation in this experiment produced results which show that as the task becomes more involved, performance declines. This is apparent in Figures 3a and 3b, which can be compared to Figures 1a and 1b respectively. Again, significant trends obtain for both the percentage correct scores (Pillai's F = 5.632; p < .05) and the average time per item measure (Pillai's F(3,76) = 9.384; p < .05). As before, linear trends are significant for both dependent variables.

Turning now to the main effects of Gf and SAR, with both accuracy and speed as dependent measures, we find that all four F-tests are significant. Thus, for the accuracy scores, the Gf main effect gives F(1,68) = 9.753, p < .05, and the SAR main effect gives F(1,68) = 4.511, p < .05. For the speed scores, the Gf main effect is F(1,68) = 17.430, p < .05, and the SAR main effect is F(1,68) = 5.679, p < .05.
FIGURE 3
Mean accuracy (percentage correct) scores for those having high and low scores on fluid intelligence (Gf, Fig. 3a) over four levels of complexity of the Triplet Numbers test in Experiment 2. Mean speed of test-taking (average time per item in seconds) scores for those having high and low scores on fluid intelligence (Gf, Fig. 3b) over four levels of complexity of the Triplet Numbers test in Experiment 2.
[Figures 3a and 3b: percentage correct and average time per item plotted against complexity level (1 to 4).]
These results are similar to those obtained in the first study, except that previously the positive association between the accuracy scores and SAR did not achieve statistical significance.

Finally, for the interactions between the intelligence factors and complexity, we find that for the accuracy scores the outcome is the same as that of
Experiment 1: Gf shows a significant interaction (Pillai's F(3,66) = 3.266, p < .05) and SAR does not (Pillai's F(3,66) = 1.496, p > .05). Also, as previously, the speed measure shows a significant interaction with Gf (Pillai's F(3,66) = 3.855, p < .05). However, unlike the first study, the interaction with SAR is not statistically significant (Pillai's F(3,66) = 2.744; p > .05). The nature of the significant Gf by complexity interaction effects can be seen in Figure 3. (Fig. 3b corresponds to Figure 2a from Experiment 1.) Correlations of the Triplets test with Gf and SAR under different levels of complexity are presented in the lower panel of Table 6. The pattern of correlations is similar to that obtained in the first study, which can be seen in the top panel of Table 6. In the second study there does appear to be a more definite trend for correlations between the speed scores and Gf to increase with increasing task complexity. It should be pointed out, however, that this difference between the two studies lies in the correlation between Gf and the 'One-rule' Triplets speed score, which was somewhat higher in the second study (r = -.36) than in the first (r = -.18).
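The interaction tests reported here can be mirrored as a profile analysis: reduce the repeated measures to successive difference scores across complexity levels and test the multivariate effect of the group factor on those differences, for which Pillai's trace is reported. A minimal sketch in Python using statsmodels (ours, not the software used in this study; the column names c1-c4 and gf_group are hypothetical):

    import pandas as pd
    from statsmodels.multivariate.manova import MANOVA

    def complexity_interaction_test(df):
        # Successive differences across the four complexity levels carry the
        # within-subject (complexity) information.
        diffs = pd.DataFrame({
            "d1": df["c2"] - df["c1"],
            "d2": df["c3"] - df["c2"],
            "d3": df["c4"] - df["c3"],
            "gf_group": df["gf_group"],   # high/low Gf split coded 0/1
        })
        # The multivariate test of gf_group on the difference scores is the
        # group-by-complexity interaction; mv_test() reports Pillai's trace.
        fit = MANOVA.from_formula("d1 + d2 + d3 ~ gf_group", data=diffs)
        return fit.mv_test()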
SWAPS TEST: EMPHASIZING SPEED VS. ACCURACY
As in the previous study, we administered the Swaps test twice: once under accuracy emphasis instructions and once under speed emphasis instructions. The results of this manipulation for the Swaps test are the same as before: there were no statistically significant main effects, or interactions with Gf or SAR, for either accuracy or speed performance measures.
THE EFFECT OF TASK COMPLEXITY
Table 7 presents means and standard deviations for the two dependent variables in this study: number correct scores and the average speed in providing answers to the Swaps test items. It is apparent from this Table that the increase in complexity leads to poorer performance and to a slower speed of responding. Indeed, the main effects of complexity for both accuracy (Pillai's F(3,66) = 22.353, p < .05) and speed (F(3,66) = 3.408, p < .05) scores are significant.
TABLE 7
Means and Standard Deviations for the Number Correct and Average Speed of Test-Taking Scores on the Swaps Test under Different Complexity Conditions

Complexity     Number Correct   Average Speed (s/item)
One Swap       11.793 (.665)     3.892 (1.311)
Two Swaps      11.492 (.899)     5.708 (1.641)
Three Swaps    10.285 (1.569)   10.714 (3.527)
Four Swaps      9.429 (2.262)   14.949 (6.000)

Note: Standard deviations are given in parentheses.
The combined effects of the complexity manipulation and the Gf and SAR factors can be seen in the plots of Figures 4 and 5. In this Experiment, using the number correct score as a dependent measure, both Gf (Pillai's F(3,66) = 11.490, p < .05) and SAR (Pillai's F(3,66) = 4.448, p < .05) have significant interactions with the complexity manipulations. From Figure 4 it can be seen that as complexity increases, the difference between high and low scorers on Gf and SAR in the accuracy of performance on the Swaps test also increases.
FIGURE 4. Mean accuracy (percentage correct) scores for those having high and low scores on fluid intelligence (Gf, Fig. 4a) and short-term acquisition and retrieval function (SAR, Fig. 4b) over four levels of complexity of the Swaps test in Experiment 2. [Both panels plot percentage correct against complexity level; graphs not reproduced.]
FIGURE 5. Mean speed of test-taking (average time per item in seconds) scores for those having high and low scores on fluid intelligence (Gf, Fig. 5a) and short-term acquisition and retrieval function (SAR, Fig. 5b) over four levels of complexity of the Swaps test in Experiment 2. [Both panels plot average time per item against complexity level; graphs not reproduced.]
Using the speed measure as the dependent variable, however, there is no significant interaction between complexity and either Gf (Pillai's F(3,66) = .472; p > .05) or SAR (Pillai's F(3,66) = .227; p > .05). Thus, even though the students, on average, tended to work more slowly through the more complex tasks, this tendency was equally strong for those with higher and lower ability scores. Correlations between the Gf and SAR scores and performance on the Swaps test under each of the complexity conditions are presented in Table 8. It is
TABLE 8
Correlations between Gf, SAR and the Swaps Test at Four Levels of Complexity

                      Accuracy          Average Time to Answer an Item
Complexity Level      Gf      SAR       Gf       SAR
'One Swap'            .000    -.020     -.033    -.046
'Two Swaps'           .203     .065     -.005    -.116
'Three Swaps'         .464     .255     -.018    -.064
'Four Swaps'          .463     .345     -.015    -.022
apparent that an increase in complexity leads to an increase in the size of the correlations between the Swaps test and both Gf and SAR for the accuracy scores but not for the speed scores. Again, the trend in correlations is in general agreement with the results of the ANOVA interactions.
PRESENT DATA IN RELATION TO SOME METHODOLOGICAL ISSUES THAT MAY ARISE IN STUDIES OF COMPLEXITY
Whenever we try to use experimental manipulations in order to study the ingredients of complexity, methodological issues, both of a general nature and specific to the experiment in question, may emerge. In this section we briefly consider the problem of making inferences in the presence of ceiling or floor effects, the implications of the present data for Guttman's notion of a simple order of complexity (the simplex), and the relative sensitivity of speed and accuracy measures to complexity manipulations.
The issue of Ceiling Levels of Performance. Are the lowest levels of complexity lacking in dispersion because our measurement procedures are too crude to detect individual differences; or should we see the lack of variability as a genuine feature of the least-complex tasks? In other words, what is the status of tasks that are low in complexity? First, let us observe that the easiest versions in both tests of this study are so easy that virtually errorless performance takes place. It is quite plausible to say that, under these circumstances, the variable is not a true “individual differences” variable. This requires a commitment as to the reason for the lack of variability. One option is to deny its genuineness and say that an improvement in procedure is needed. For example, it is possible to argue that prolonged work on this task would have generated greater variance. This argument is not appealing to us since prolonged testing could lead to the appearance of variance due to fatigue. Another option is to say that a lack of variability at the least complex level is a genuine finding and that an increase in complexity is accompanied by a variable showing greater individual differences and increased common variance.
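The two readings can be made concrete with a small simulation (ours, purely illustrative; all constants are arbitrary). Accuracy is generated as a noisy function of a latent ability and then truncated at a 100% ceiling; as task complexity grows, low-ability subjects fall below the ceiling first, and a correlation with ability emerges:

    import numpy as np

    rng = np.random.default_rng(0)
    gf = rng.normal(50, 10, 80)   # latent ability scores on an arbitrary scale

    for complexity in (1, 2, 3, 4):
        # More complex versions pull scores further below the ceiling,
        # and do so more strongly for low-ability subjects.
        raw = 104 - 0.15 * complexity * (60 - gf) + rng.normal(0, 2, gf.size)
        accuracy = np.minimum(raw, 100.0)   # hard ceiling at 100% correct
        print(complexity, round(np.corrcoef(gf, accuracy)[0, 1], 2))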
In this section we wish to explore whether the existence of several levels of complexity may be helpful in bringing us closer to an understanding of the source of low correlation. To illustrate, consider scatterplots of two variables with appropriate difficulty levels. Zero correlation is represented by the familiar circular arrangement of individual points, and positive correlation is depicted by an ellipse that rises towards the right. We can think of the change from positive to zero correlation as the emergence of points within the lower right-hand and upper left-hand quadrants of the scatterplot: the ellipse becomes fatter. In the presence of ceiling effects, however, changes from zero to positive correlation are characterized by the disappearance of individuals' points from only one of the quadrants, the upper left-hand quadrant.

Figure 6 shows scatterplots of the relationship between Gf and the Triplet Numbers test under the four levels of complexity in Experiment 1 (the corresponding correlation coefficients are presented in the first column of Table 6). It can be seen in this Figure that the increases in correlations that follow changes in complexity derive from the gradual increase in the number of people with low Gf scores who make an increasing number of errors. The situation is similar to that reported by Hunt (1980) involving the primary-secondary task paradigm: people with low Gf scores tend to make errors earlier, i.e. even on the simple tasks. We feel that, when viewed from this perspective, the argument that low correlation is an artefact of the 'restriction in range at the ceiling level', rather than a genuine psychological phenomenon, is not very appealing.

In order to gain further insight into performance at ceiling levels, one can vary complexity independently of difficulty. We could, for example, add a memory component to both our tasks by, say, removing the instructions about the requirements of the tasks from the screen during the solution time. In that case, the tasks would be more difficult at all levels of complexity and it would be important to find out whether the same increases in correlations take place.

Guttman's Simplex and Complexity. As discussed earlier, several psychologists have proposed a psychometric, or operational, definition of complexity based on the relative size of a task's correlations with measures of intelligence, or its loading on the general factor. There is, however, another psychometrically based concept of complexity which has affected psychometric thinking on the topic for the past thirty-five years. Guttman (1954, 1955) coined the term "simplex" to refer to the "simple order of complexity." A set of variables shows a simplex pattern if their intercorrelations are high close to the main diagonal of the correlation matrix and become gradually lower as one moves towards the lower left-hand and upper right-hand corners. Guttman showed that one latent dimension is sufficient to describe such a pattern, whereas factor analysis requires more than one factor. Simplex matrices have aroused a lot of interest during the past decades because of their challenging mathematical properties and because they embody a very simple and appealing substantive idea about the nature of change from one level to the next.
FIGURE 6. Scatterplots of the relationship between Gf and different levels of the Triplet Numbers test from Experiment 1. (Table 6 presents correlation coefficients that summarise these relationships.) Note: These graphs do not indicate the existence of subjects with identical pairs of scores. [Four panels plot Triplet Numbers accuracy against Gf (Fluid Intelligence) scores, one panel per complexity level; graphs not reproduced.]
Originally, correlational matrices displaying a simplex pattern were assumed to arise from a process whereby an accretion of the following kind takes place:

c2 = a1 + a2
c3 = a1 + a2 + a3
etc.
In this representation, the c's refer to levels of complexity and the a's are the (elementary) processes available at each level. Under these conditions, two adjacent levels would show a higher correlation than more widely separated levels because they have more processes in common. If we were to correlate tasks at different levels of complexity with an intelligence test, correlations would gradually increase in accordance with our definition of complexity. This way of thinking may provide an adequate account of the theoretical process involving variations in complexity. For example, we may say that a1 belongs to a sensory process requiring a minimum number of elements and relations, that a2 belongs to a perceptual process requiring more elements and relations, and so on. However, this type of accretion model is implausible for statistical reasons. Corballis (1965) has shown that a simplex-like matrix is consistent with three statistically equally plausible but conceptually contradictory models implying either accretion or decretion or both of these processes simultaneously. We wish to add that this model is also implausible in view of the fact that psychological tasks are seldom constructed in a way that would make them pure measures of any of the processes.

The 'continuum of psychological complexity' is a useful abstraction. In tasks showing a gradual increase in correlation with intelligence it is possible, as in the two tests of the present study, that a higher-level process may call upon a particular lower-level process several times during the execution and upon another process fewer times than previously. Hence:

c3 = a1 + a2 + 5a1
c4 = a1 + a2 + 3a3

In this case, c1 may have a higher correlation with c3 than it would have with c2. A correlational matrix implying simplex structure within the c's would not thereby be produced; yet if the highest c level displayed correlation with intelligence, all lower-level c's would show a decreasing trend. Different weights may also arise from the relative salience of the ingredients rather than from repeated calls upon the ingredient process. Moreover, the additive function may not be appropriate for at least some tasks or for parts of the tasks; a multiplicative function may be better suited. It follows that an increase in correlation with an outside measure does not necessarily imply a simplex correlational matrix within a set of variables.

Is it feasible to view complexity in terms of correlations with outside measures of intelligence and pay less attention to structure within the levels of complexity? We believe that the answer to this question is affirmative. In this article, complexity is defined with respect to the correlations between a set of 'internal' variables (i.e. the set containing increasing levels of complexity) and an 'external' set (i.e. measures of intelligence). External variables may be represented by a single score or by a number of measures. In the analyses of this article we employed single scores which are composites based on several measures. If
there were a multitude of external measures, one might choose to perform a factor analysis and employ the method of extension analysis (see Dwyer 1937), or calculate factor scores. These factor scores could then be treated in the same way as we treated simple composites in this article. Factor analysis, preferably, would be hierarchical in nature so that broad factors of the theory of fluid and crystallized intelligence could be identified. All three techniques would produce correlations with external variables, which could be interpreted in a similar fashion.

Correlations within the internal set of variables may or may not behave in a simplex-like fashion. Given the possibility of greater salience of lower-complexity processes in some higher-complexity tasks, and the chance of reduced between-subjects variability for some (say, easier) tasks, the appearance of a simplex pattern is likely to be the exception rather than the rule within the 'internal' set of variables.

It should be noted that there is one situation in which Guttman's definition of the simplex and the factorial definition would converge. This is where the set of variables loads on only two ability factors: intelligence and some other group factor. Given conditions such as comparable reliabilities, increasing loadings on intelligence would be accompanied by decreasing loadings on the group factor. Thus a simplex pattern of correlations would be expected as a result of closer variables having a more similar mix of the two abilities. McDonald (1980) provides an example that illustrates this situation.

The correlations within each set of experimental tasks, from both studies, are shown in Table 9. Of the three correlational matrices only the one in Table 9a approaches a simplex structure, for the first four variables. The other two matrices are clearly not simplex-like. Comparing the correlations among the Triplet Numbers tasks from the first and second studies (Tables 9a and 9b), we can see a marked difference in the correlations involving the easiest ('Search' Triplets) task: its correlations with the other Triplets tasks are considerably higher in the first study than in the second. No explanation can be offered for this difference. The remaining correlations are similar for the two studies.

The results of factor analysis are also useful. Using the maximum likelihood chi-square statistic as a criterion of goodness-of-fit, clear support for the existence of one factor is found for the Swaps test only (Table 9c). We may note that in all three solutions the Gf variable is at least partly responsible for the poor fit; this variable is clearly 'external' in all three sets of data. If these matrices were to be included in larger batteries of tests sampled representatively to cover the whole domain of cognition, different levels of complexity would define different factors: the least complex processes would perhaps load on sensory or perceptual factors and the most complex processes would load on fluid intelligence.
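Both the pure accretion scheme and the weighted-ingredients argument above are easy to check numerically. In this sketch (ours; the sample size and the weight of five extra calls are arbitrary), cumulative sums of independent ingredient processes reproduce a simplex-like correlation matrix, while giving c3 extra calls on a1 makes r(c1, c3) exceed r(c1, c2), breaking the simplex ordering:

    import numpy as np

    rng = np.random.default_rng(1)
    a = rng.normal(size=(4, 10_000))   # independent ingredient processes a1..a4

    # Accretion model: c1 = a1, c2 = a1 + a2, c3 = a1 + a2 + a3, ...
    c = np.cumsum(a, axis=0)
    print(np.round(np.corrcoef(c), 2))   # high near the diagonal, lower away from it

    # Weighted variant: c3 calls on a1 five extra times, i.e. c3 = 6a1 + a2 + a3.
    c3_weighted = c[2] + 5 * a[0]
    print(round(np.corrcoef(c[0], c[1])[0, 1], 2))          # r(c1, c2), about .71
    print(round(np.corrcoef(c[0], c3_weighted)[0, 1], 2))   # r(c1, c3), about .97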
TABLE 9
Within-Complexity Correlations (Including Correlations with the Gf Composite) and One-Factor Solutions

a. Triplet Numbers, Experiment 1
Variables                  1      2      3      4      5    Loadings
1. 'Search' Triplets       1                                 .70
2. 'Half-rule' Triplets    .693   1                          .89
3. 'One-rule' Triplets     .462   .638   1                   .72
4. 'Two-rule' Triplets     .425   .534   .493   1            .63
5. Gf composite            .031   .169   .326   .354   1     .25
Chi-square = 23.772, df = 5, p = .0002

b. Triplet Numbers, Experiment 2
Variables                  1      2      3      4      5    Loadings
1. 'Search' Triplets       1                                 .36
2. 'Half-rule' Triplets    .106   1                          .69
3. 'One-rule' Triplets     .322   .610   1                   .81
4. 'Two-rule' Triplets     .298   .425   .498   1            .66
5. Gf composite            .231   .263   .301   .463   1     .45
Chi-square = 20.595, df = 5, p = .001

c. Swaps Test, Experiment 2
Variables                  1      2      3      4      5    Loadings
1. One Swap                1                                 .04
2. Two Swaps              -.002   1                          .40
3. Three Swaps             .028   .301   1                   .75
4. Four Swaps              .060   .321   .567   1            .76
5. Gf composite            .000   .203   .464   .463   1     .61
Chi-square = 1.1197, df = 5, p = .95

The Role of Speed and Accuracy in the Measurement of Complexity and Intelligence. Two issues regarding speed have been addressed in this article. Firstly, there is a question regarding the role of emphasis. We wanted to find out if instructions to
work as quickly as possible would produce a different outcome with cognitive tasks than instructions to work as accurately as possible. Such a difference could have been reflected in either of the two dependent variables, i.e. the measures of speed of test-taking or the measures of accuracy. The results are different for the two tasks employed in this study: overall performance on the Triplet Numbers test is affected by the instructions, while performance on the Swaps test is not. On the other hand, the emphasis variable shows only a very weak interaction with Gf and no significant interaction with SAR. The only interaction, significant at the .05 but not at the .01 level, is again with the
Triplet Numbers test and Gf in Experiment 1. The paucity of significant interactions indicates that emphasis itself does not act as a "complexity" manipulation in this study. Overall, the emphasis manipulation proved to be rather weak in both experiments of this article.

Secondly, there is a question about the utility of a measure of speed of test-taking. Our results show that this measure may be a useful adjunct to the typical number-correct accuracy scores. This is in part because correlations between accuracy and speed scores are not high, indicating that these two dependent measures tap different things (see Table 10). Also, speed measures show sensitivity to the complexity manipulation (speed decreases as tasks become more complex) and, to a lesser extent, to the emphasis manipulation. The speed measure also shows some correlation with the between-subjects variables of Gf and SAR, but only for the Triplet Numbers test; the same measure from the Swaps test shows no correlation with Gf and SAR. However, even with the Triplet Numbers test, we may note some important inconsistencies. For example, in Experiment 1, SAR has about as high correlations with the speed measure as does Gf but, in the replication of Experiment 2, Gf shows higher correlations than SAR. Also, Gf shows a gradual increase in correlation as complexity increases in the replication of the Triplet Numbers test, but a similar trend is not apparent in the first experiment. As mentioned in the introductory part of this article, the main reason for including both speed and accuracy scores in this study derived from the hope that it might be possible, along the lines of the work of Spilsbury et al. (1990), to calculate a composite score representing some kind of "efficiency" in achieving a trade-off between speed and accuracy. Given these inconsistent results regarding speed, we do not proceed with this calculation here.

We may conclude that, for designing tests of intelligence, having a speed measure in addition to the accuracy score may be useful with some cognitive tasks. It would be interesting to explore the conditions under which the usefulness of this measure tends to emerge. Our experience has been that correlations between accuracy and speed scores vary depending on the difficulty of the task.
TABLE 10
Correlations between the Accuracy and Speed Scores of the Triplet Numbers Test in Experiment 1

                              Speed Scores
Percentage Correct Scores   Search   Half-rule   One-rule   Two-rules
Search                       .230     .300        .299       .290
Half-rule                    .168     .095        .191       .160
One-rule                     .081     .165        .157       .093
Two-rules                   -.110    -.041        .024      -.066
This is borne out in Table 10: speed and accuracy scores show higher correlations if the task is easy (e.g. 'Search' Triplets) and lower correlations if the task is difficult (e.g. 'Two-rules' Triplets).
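Although we do not compute an efficiency composite here, the kind of score we had in mind is easy to state. One crude possibility, given purely as an illustration (the index used by Spilsbury et al. (1990) may well differ), is the rate of correct responses per second, which rewards accuracy and penalizes slow responding in a single number:

    import numpy as np

    def efficiency(pct_correct, sec_per_item):
        """Correct responses per second: a crude speed-accuracy composite."""
        return (np.asarray(pct_correct) / 100.0) / np.asarray(sec_per_item)

    # e.g. 90% correct at 4 s/item versus 98% correct at 9 s/item:
    print(efficiency([90, 98], [4.0, 9.0]))   # [0.225, 0.109]: faster wins here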
DISCUSSION
Our findings involving this complexity manipulation show that as the task becomes more complex, performance, as measured by both accuracy and speed scores, deteriorates, and therefore the task becomes more difficult. In itself, this outcome is interesting, but if one is seeking an improved understanding of the construct of intelligence, it is not particularly illuminating. However, a consistent finding for both the Triplet Numbers test and the Swaps test, an increase in the accuracy scores' correlation with fluid intelligence (Gf) paralleling the increase in complexity, is of the utmost importance. SAR shows the same trend in only one of the three data sets. Finally, Gf's correlation with the speed scores obtained in the replication study of the Triplet Numbers test also shows this increase. Complexity manipulations with both tasks have worked as planned: we have developed what appear to be reliable procedures for increasing the general factor loadings of psychological tests.

In the language of cognitive psychology used in the introductory sections of this article, the main ingredients of complexity appear to be the number of elements and relations that have to be dealt with while a person tries to solve a problem. At a general level, this is a fair description of both experimental tasks used in this study.

As the title of this article suggests, our approach has some similarities with the componential approach to the study of intelligence (see Sternberg 1977) in that we take two measures of intelligence and break them into smaller components. Our approach, however, requires that the ingredients of the complex task show a systematic decrease in processing requirements, similar to the conceptions of complexity proposed by Guttman (1954, 1955). The instructions for these tasks can be comprehended with relative ease, they do not depend on esoteric knowledge, and they are not novel in the sense of being non-entrenched. They are, however, effortful and controlled in the sense that they require assembly and reassembly from item to item. Given that some tasks are complex from the computational point of view but easy from the point of view of human information processing, an approach starting from a well-established measure of Gf and breaking it down into its ingredients appears promising. It provides an empirical test of the concept of complexity.

In our tasks, it is not required to find new relations or new elements as Spearman stipulated in his noegenetic laws. The increase in the number of relations to be considered in the solution process is the only relevant aspect. This means that, in principle, we can easily increase complexity beyond the levels included in our study. Three methods can be used to accomplish that goal: (1) increase the number of swaps in the Swaps test or relevant rules in the Triplet
Numbers test; (2) modify the tasks so that additional controlled processes become more critical; (3) use another test together with our present tasks in the competing-task paradigm. All three should lead to increases in correlation.

The sensitivity of accuracy measures in the present study indicates that the probability of committing an error may be a critical aspect of performance. People with higher fluid intelligence scores seem to commit a smaller number of errors as the number of relevant relations increases. Since both tasks consist of an increasing series of steps (permutations in the Swaps test; comparisons and checks to establish whether the rule has been satisfied in the Triplet Numbers test), the propensity to make errors may be a critical feature. This may be due to a failure to keep track of the sequence of steps because of memory loss, or to interference and a breakdown of attention. Speed, by comparison, does not behave in this manner.

In this study we have focused on the processes of Gf and SAR. They are the abilities most directly relevant to the concept of complexity. It is important to keep in mind, however, that the domain of cognitive abilities includes several other broad factors (TSR, Gc, Gs, C) which do not have such a direct link with the notion of complexity as do these two. In any future work with the present tasks, it would be particularly important to include the perceptual factors of Gv (broad visualization) or Ga (broad auditory function), since these would allow for a link with the lower levels of the complexity manipulation.

Our results also contain information of a methodological nature. Thus, it appears that the cognitive operations involved in these tasks do not generate Guttman's simplex. Since it is likely that attempts to construct tasks with several levels of complexity will lead to either floor or ceiling effects, it is inevitable that problems of interpretation will arise. If the least complex task shows a ceiling effect, is that task's low correlation with measures of intelligence due to a restriction in range or to its genuine simplicity? We do not have a definite answer to that question. However, if a priori psychological considerations were applied in choosing that task as a point on a complexity dimension, and statistical results do not contradict such an interpretation, the approach that attributes low correlation to a restriction in range appears to us overly conservative.

Finally, our results speak to the issue of the relative importance of accuracy and speed measures of cognitive performance. As expected, the general finding was that increasing task complexity was accompanied by increasing correlations with intelligence. However, this trend was more consistently found for the accuracy scores on these tasks than for the speed measures. Without exception, correlations between accuracy scores and intelligence monotonically increased with increasing task complexity, although in many cases the increase in the size of the correlation from one level of complexity to the next was small and not statistically significant. Although in both studies the correlations between the Triplet Numbers speed score and intelligence were of comparable size to those obtained with accuracy scores, the trend for higher correlations with intelligence with increasing task complexity was found for speed scores in the second study, but
not in the first. Speed measures for the Swaps test, despite showing systematically lower performance with increasing task complexity, showed no such trend in their correlations with intelligence: correlations between these scores and intelligence were near zero for all four levels of task complexity. This contrasts with the significant correlations between the speed scores and intelligence for the Triplet Numbers task in both studies. No explanation can be offered at this stage for this difference between the Triplet Numbers and Swaps tasks. As stated in the Introduction, a plausible hypothesis might be that speed scores are more highly related to intelligence for low-complexity tasks, and accuracy scores for more complex tasks. Despite the lack of a consistent pattern for speed scores in the correlations with intelligence, it is clear that our results are not compatible with the hypothesis of such a reciprocal relation between accuracy and speed scores in their correlations with intelligence. For the Triplet Numbers speed scores, correlations with intelligence seem to increase, or at least be maintained, with increasing task complexity. It is possible, though, that with a further increase in task complexity, which would bring the difficulty level of the test to be comparable to that of typical measures of intelligence, a decrease in correlations between speed and intelligence would be observed. This would be consistent with the observation by Horn (1988) and others of a low association between the speed of performance on typical intelligence test items and traditional measures of intelligence.

Lohman (1989) pointed to the possibility that systematic differences in speed-accuracy trade-off could significantly affect correlations with speed and accuracy scores. To see if such an effect could be demonstrated in our studies, tasks were given twice, once with instructions which emphasized the importance of accuracy, and once with instructions which emphasized the importance of speed. This emphasis manipulation produced significant changes, in the expected direction, only for the Triplet Numbers task in the first study. A significant interaction was also observed between this instructional variation and subjects' fluid intelligence scores. An emphasis on speed in the instructions led to a stronger association between fluid intelligence and the accuracy of performance than did an emphasis on accuracy. In other words, the accuracy of lower-ability subjects was particularly low when they were encouraged to work more quickly. Although our data do not suggest an explanation for this observed interaction, they do demonstrate the importance of taking into account the possible influence of the speed-accuracy trade-off when drawing general conclusions about the relationship between the speed and accuracy of performance and intelligence.
ACKNOWLEDGMENTS: We are grateful to M. Dobson for his help in data collection, to N. Karadimas for his help with programming test administration, and to D. Grayson for statistical advice. Helpful comments were provided by C. MacLeod, K. Kennelly, R. Snow and members of the Individual Differences Seminar at the University of Sydney.
NOTES
1. All First Year students at the University of Sydney are required to spend a number of hours participating as subjects in psychological experiments. We were allocated three hours of subjects' time for this study.
2. The accuracy test-retest reliability estimates for the four levels of the Triplet Numbers test are as follows: .83 for the 'Two-rules' Triplets, .85 for the 'One-rule' Triplets, .86 for the 'Half-rule' Triplets, and .87 for the 'Search' Triplets. These values were obtained with a sample of 47 University students subsequent to the collection of the data reported in the present article.
3. The original plan was to study complexity effects with the Swaps test in this Experiment. However, due to an unfortunate omission in programming data storage, only total scores over all four conditions described in the Method section of Experiment 2 were available at the completion of the data-gathering stages of this study. One of the reasons for running two experiments with the same tasks derives from our desire to rectify this omission.
4. It is true, of course, that restriction in range affects correlations. However, this is reflected in different ways in ANOVAs and in correlations.
5. This simply means that even though the variances may be different, the presence of these differences does not affect the validity of the MANOVA model.
6. R. Snow suggested yet another way to examine our data: comparison of the regression slopes of our experimental tasks on the psychometric factors. Differences in slopes between the four levels of complexity would convey similar information to that provided by the MANOVAs and correlations. We do not employ this procedure here because, even though the test of differences between slopes is routine in independent-groups designs, it is not routine in the repeated measures case. See, however, the linear fits to the data in Figure 6.
7. In order to save space, we do not present analogous ANOVA tables for the SAR factor.
8. Accuracy scores of the Swaps test produced the following test-retest correlations with a sample of 43 University of Sydney students: .68 for the One-Swap, .75 for the Two-Swaps, .66 for the Three-Swaps, and .69 for the Four-Swaps conditions.
REFERENCES
Bachelder, B.L. & M.R. Denny. (1977). "A theory of intelligence: 1. Span and the complexity of stimulus control." Intelligence, 1, 127-150.
Campione, J.C., A.L. Brown, & N.R. Bryant. (1985). "Individual differences in learning and memory." In Human abilities: An information processing approach, edited by R.J. Sternberg. New York: W.H. Freeman.
Carroll, J.B. (1976). "Psychometric tests as cognitive tasks: A new 'Structure of Intellect'." In The nature of intelligence, edited by L. Resnick. Hillsdale, NJ: Lawrence Erlbaum.
Carroll, J.B. (1980). Individual differences in psychometric and experimental cognitive tasks. Chapel Hill, NC: The L.L. Thurstone Psychometric Laboratory, University of North Carolina. (Report No. 163; Document AD-A086 057, National Technical Information Service.)
Cattell, R.B. (1950). Personality, a systematic and factual study. New York: McGraw-Hill.
Cohen, J. & P. Cohen. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Corballis, M.C. (1965). "Practice and the simplex." Psychological Review, 72, 399-406.
Crawford, J.D. (1988). Intelligence, task complexity and tests of sustained attention. Unpublished Ph.D. thesis, The University of New South Wales.
Crawford, J.D. (1991a). "Intelligence, task complexity and the distinction between automatic and effortful mental processing." In Intelligence: Reconceptualization and measurement, edited by H. Rowe. Hillsdale, NJ: Lawrence Erlbaum.
Crawford, J.D. (1991b). "The relationship between tests of sustained attention and fluid intelligence." Personality and Individual Differences, 12, (6), 599-611.
Crawford, J. & L. Stankov. (1983). "Fluid and crystallized intelligence and primacy/recency measures of short-term memory." Intelligence, 7, 227-252.
Dwyer, P.S. (1937). "The determination of the factor loadings of a given test from the known factor loadings of other tests." Psychometrika, 1, 212-218.
Eysenck, H.J. (1979). The structure and measurement of intelligence. New York: Springer-Verlag.
Guttman, L. (1954). "A new approach to factor analysis: The radex." Pp. 258-348 in Mathematical thinking in the social sciences, edited by P.F. Lazarsfeld. Glencoe, IL: Free Press.
Guttman, L. (1955). "A generalized simplex for factor analysis and a facetted definition of intelligence." Psychometrika, 20, 173-192.
Horn, J.L. (1968). "Organization of abilities and the development of intelligence." Psychological Review, 75, 242-259.
Horn, J.L. (1976). "Human abilities: A review of research and theory in the early 1970's." Annual Review of Psychology, 27, 437-485.
Horn, J.L. (1988). "Thinking about human abilities." Pp. 645-685 in Handbook of multivariate psychology, edited by J.R. Nesselroade. New York: Academic Press.
Horn, J.L. & L. Stankov. (1982). "Auditory and visual factors of intelligence." Intelligence, 6, 165-185.
Humphreys, L. (1979). "The construct of general intelligence." Intelligence, 3, 105-120.
Hunt, E. (1978). "Mechanics of verbal ability." Psychological Review, 85, 109-130.
Hunt, E. (1980). "Intelligence as an information-processing concept." British Journal of Psychology, 71, 449-474.
Jensen, A.R. (1977). The nature of intelligence and its relation to learning. Paper delivered at the T.A. Fink Memorial Lecture, University of Melbourne, Sept. 17, 1977.
Larson, G.E. & D.P. Saccuzzo. (1989). "Cognitive correlates of intelligence: Toward a process theory of g." Intelligence, 13, (1), 5-32.
Lohman, D.F. (1989). "Individual differences in errors and latencies on cognitive tasks." Learning and Individual Differences, 1, (2), 203-226.
Marshalek, B., D.F. Lohman, & R.E. Snow. (1983). "The complexity continuum in the radex and hierarchical models of intelligence." Intelligence, 7, 107-128.
McDonald, R. (1980). "A simple comprehensive model for the analysis of covariance structures: Some remarks on applications." British Journal of Mathematical and Statistical Psychology, 33, 161-183.
Morrison, D.F. (1967). Multivariate statistical methods. New York: McGraw-Hill.
Salthouse, T.A. (1985). A theory of cognitive aging. New York: North Holland.
Snow, R.E. (1989). "Aptitude-treatment interaction as a framework for research on individual differences in learning." In Learning and individual differences: Advances in theory
and research, edited by P.L. Ackerman, R.J. Sternberg, & R. Glaser. New York: W.H. Freeman.
Spilsbury, G., L. Stankov, & R. Roberts. (1990). "The effect of a test's difficulty on its correlation with intelligence." Personality and Individual Differences, 11, (10), 1069-1077.
Stankov, L. (1983a). "The role of competition in human abilities revealed through auditory tests." Multivariate Behavioral Research Monographs, No. 83-1, pp. 63 & VII.
Stankov, L. (1983b). "Attention and intelligence." Journal of Educational Psychology, 75, (4), 471-490.
Stankov, L. (1988a). "Single tests, competing tasks, and their relationship to the broad factors of intelligence." Personality and Individual Differences, 9, (1), 25-33.
Stankov, L. (1988b). "Aging, intelligence and attention." Psychology and Aging, 3, (2), 59-74.
Stankov, L. & J.L. Horn. (1980). "Human abilities revealed through auditory tests." Journal of Educational Psychology, 72, (1), 19-42.
Stanton, W.R. & J.A. Keats. (1986). "Intelligence and ordered task complexity." Australian Journal of Psychology, 38, (2), 125-131.
Sternberg, R.J. (1977). Intelligence, information processing, and analogical reasoning: The componential analysis of human abilities. Hillsdale, NJ: Lawrence Erlbaum.
Thomson, G.H. (1939). The factorial analysis of human ability. London: The University of London Press.
Vernon, P. (1988). Speed of information processing and intelligence. Hillsdale, NJ: Ablex Publishing.
Wittenborn, J.R. (1943). "Factorial equations for tests of attention." Psychometrika, 8, 19-35.
Zimmerman, W.S. (1954). "The influence of item complexity upon the factor composition of a spatial visualization test." Educational and Psychological Measurement, 14, 106-119.