Multidimensional item analysis of ability factors in spatial test items


Personality and Individual Differences 37 (2004) 1003–1012 www.elsevier.com/locate/paid

Eva Ullstadius a,*, Berit Carlstedt b, Jan-Eric Gustafsson a

a Department of Education, Göteborg University, Box 300, 405 30 Gothenburg, Sweden
b National Defence College, Karlstad, Sweden

* Corresponding author. Tel.: +46-31-773-2495; fax: +46-31-773-2070. E-mail address: [email protected] (E. Ullstadius).

Received 13 February 2003; received in revised form 16 October 2003; accepted 17 November 2003. Available online 17 January 2004.

doi:10.1016/j.paid.2003.11.009

Abstract

Two strategies for solving spatial test items, drawing differently on general ability and visualization, have been demonstrated. These strategies may not only be a matter of individual predilection; item features may also lend themselves more to one strategy than to the other. Newly developed techniques for factor analysis, implemented in the Mplus program, allow items to be analysed as categorical variables together with summarized test results as continuous variables, which enables analyses of the dimensionality of single test items. Five spatial tests in the computerized Swedish Enlistment Battery were analysed for a representative sample of 18-year-old male conscripts (n = 14,925). The items were fitted one by one for each of the five spatial tests, together with the rest of the tests, into a hierarchical model of intellectual abilities with general (G), verbal (Gc′), spatial ability (Gv′) and test specificity (Tspec′) as latent variables. All models showed good fit, and the items were generally found to load higher on G than on Gv′, except for some of the items on the test with limited response time. No systematic increase in G loadings with increasing item difficulty, which would indicate a shift to an analytical strategy, was revealed.
© 2003 Elsevier Ltd. All rights reserved.

Keywords: Spatial test; Item analysis; Multidimensionality; Confirmatory factor analysis; Visualization

1. Introduction

Spatial intelligence (gv) or visualization ability is, in addition to fluid (gf) and crystallized (gc) intelligence, considered an important dimension of most models of mental abilities. Tests aiming


at measuring spatial ability are therefore regularly included in test batteries of general ability (Carroll, 1993; Guilford & Lacey, 1947; Lohman, 1988, 1996). Factor analyses of such test batteries have, however, consistently shown spatial tests to be primarily influenced by a general factor, G, often found to be equal to gf; secondly by a test specific factor; and only thirdly by a factor common to the spatial tests (e.g. Guilford & Lacey, 1947; Gustafsson, 1984; Gustafsson & Balke, 1993; Gustafsson & Undheim, 1996; Lohman, 1988, 1996, 2000; Mårdberg & Carlstedt, 1998).

The finding of high G loadings on spatial tests can, according to Lohman (1996), be interpreted both as a statistical artefact and as reflecting a psychologically meaningful dimension of general ability that is measured by the spatial tests. The ability factors in test batteries are usually factor analysed in a hierarchical model, in which most of the systematic variance is first extracted as a general factor. Only the residual variance is left for the lower order factors, which may result in decreased Gv′ loadings on the spatial tests. On the other hand, there are reasons to believe that spatial ability may be an important aspect of general ability. Reviewing and integrating modern theory and research on cognition, Lohman (1996) presents a model of general ability in which g is considered to be an ability to create and coordinate different kinds of mental models in working memory. While the solution of cognitive and verbal tasks depends on the creation of verbal, sequentially ordered mental representations, spatial tasks require analogue visualizations to be correctly answered. Spatial-analogue visual representations, Lohman claims, are important not only in creative art but also in abstract problem solving and model building, as witnessed by many famous researchers from diverse scientific fields. Accordingly, spatial thinking might contribute components of imagination that are essential for g.

Complicating the problem of measuring gv separately from g, several studies have shown that different strategies exist for solving spatial test items (French, 1965; Kyllonen, Lohman, & Woltz, 1984; Lohman, 2000). Lohman and Kyllonen (1983) demonstrated that rather simple and speeded spatial items tend to be solved by visualization. In contrast, analytic, non-spatial strategies appear to become more frequent with increasing item complexity. On the other hand, complex spatial tasks involving folding, reflection and rotation of a mental image have been shown to be the best measures of Gv′ (Lohman, 1988). Furthermore, Lohman and Kyllonen (1983) have shown that subjects tend to solve spatial tasks differently, as 'verbalizers' or 'visualizers', according to their profile on the latent factors G, Gc′ and Gv′. Some of them may even change strategy within a test as the items become more difficult.

Although difficult to account for, spatial ability has nevertheless traditionally been considered crucial in many professions. The existence of 'verbalizers' and 'visualizers' further stresses this point. For selection practices, improving the differential reliability and validity of spatial tests is therefore an important task.
Despite this complexity, longitudinal studies have demonstrated that gifted high school students' spatial test results can be good long-term predictors of the choice of both training courses and future occupations requiring visualization (Humphreys & Lubinski, 1996; Humphreys, Lubinski, & Yao, 1993; Shea, Lubinski, & Benbow, 2001).

In factor analytic models, sums of correctly solved items of different types are typically entered as manifest variables. However, this procedure conceals differences between items that might be built into their construction. Sizeable changes in factor loadings have thus been observed as a consequence of relatively small changes to the items (Kyllonen, Lohman, & Snow, 1984). Furthermore, not only tests but also items can be considered multidimensional (Ullstadius, Carlstedt, & Gustafsson, 2002), and one way of improving spatial tests would consequently be to


address the question of dimensionality at the item level. With newly developed techniques for confirmatory factor analysis of categorical and continuous variables within the same latent variable model, the influence of general and verbal abilities on the performance on each separate item in vocabulary tests has been demonstrated (Ullstadius et al., 2002). This technique for item analysis can be used for studying the dimensionality of spatial test items as well. New possibilities may thereby be offered for examining item features that lend themselves to an analytical or a visual approach. Such knowledge may, in turn, be used for purposes of test construction.

Spatial ability is defined by Lohman (1988, 1996, 2000) as the ability to generate, retain, retrieve and transform well-structured visual images, and these aspects of visualization put demands on working memory. Spatial tests like Surface development, Cube comparison and Block rotation, which require mental folding and rotation of complex figures, are therefore considered good measures of visualization. Four such spatial tests are included in the Swedish Enlistment Battery (CAT-SEB) to give a measure of a general visualization factor (Gv′). Factor analyses of the structure of CAT-SEB on the basis of the hierarchical model suggested by Gustafsson and Balke (1993) have shown that, in addition to moderate loadings on Gv′ (range = 0.15–0.47), the tests have much higher loadings on G (range = 0.55–0.67) and test specific factor loadings of intermediate size (range = 0.36–0.52) (Mårdberg & Carlstedt, 1998).

The aim of the present study is to apply the new technique for item analysis to the spatial test items of the Swedish Enlistment Battery (CAT-SEB) in order to analyse item dimensionality on G, Gv′ and test specific factors, Tspec′. Specifically, the relationship between the G, Gv′ and Tspec′ loadings of each item and item difficulty will be examined.
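In outline, each dichotomous item j can be thought of as a categorical indicator of three orthogonal latent variables, with a latent response variable underlying the observed 0/1 score. The following display is our schematic rendering of the model just described; the notation is added here for clarity and does not appear in the original article:

    y_j^{*} = \lambda_{G,j}\, G + \lambda_{Gv',j}\, Gv' + \lambda_{T,j}\, \mathrm{Tspec}' + \varepsilon_j,
    \qquad
    x_j =
    \begin{cases}
    1, & \text{if } y_j^{*} > \tau_j,\\
    0, & \text{otherwise},
    \end{cases}

where G, Gv′ and Tspec′ are standardized and mutually uncorrelated, so that the squared loadings partition the item's explained variance, and the threshold τ_j corresponds to item difficulty.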

2. Method

2.1. Materials

The computerized Swedish Enlistment Battery (CAT-SEB) consists of 10 tests: two non-verbal reasoning tests (Figure series, 'Se', and Groups, 'Gr'), three verbal tests (Synonyms 1, 'Sy1', Synonyms 2, 'Sy2', and Opposites, 'Opp') and five spatial tests (Block rotation, 'Bl', Metal folding, 'Mf', Dice 1, 'Di1', Dice 2, 'Di2', and Technical comprehension, 'TC').

The items in the Block rotation test (20 items) present a three-dimensional target object, and the task is to select the identical, rotated three-dimensional object out of five. The Metal folding test (16 items) is a kind of Surface development test in which a drawing of an unfolded piece of metal is presented, and the task is to find the three-dimensional object, out of four, that corresponds to the two-dimensional drawing. The response time for each item is limited to the 80th percentile of the response times calculated from the results of previously tested groups of recruits. Dice 1 and Dice 2 (20 items each) are two parallel cube comparison tests. Two cubes, on which three surfaces are visible, are presented. According to the instructions there is a unique symbol on each side of a cube, and on identical cubes the symbols are placed in the same relation to each other. The task is to decide whether the two cubes, if turned, could be identical, or whether they are different. Dice 2 differs from Dice 1 in that, as in Metal folding, a time limit is imposed for each item. The Technical comprehension test, finally, includes 16 items with problems that require knowledge of technical and physical principles; one out of three suggested solutions should be selected as the correct


one. These five tests have been shown to load on General visualization, Gv′. The Technical comprehension test has a loading on Gc′ as well, and there is overlap also between the test specific components of Dice 1 and Dice 2 (Mårdberg & Carlstedt, 1998).

2.2. Participants

A representative sample of 18-year-old male conscripts (n = 14,925) was tested. The sample is thus homogeneous with respect to sex and age.

2.3. Model

The analyses aiming at examining the dimensionality of the spatial items are based on the hierarchical nested factor model applied in earlier analyses of the CAT-SEB by Mårdberg and Carlstedt (1998). First, a general factor (G) is allowed to capture variance in all the tests. In the next step two broad factors, Gc′ and Gv′, are introduced, directly influencing tests of verbal and spatial content respectively. These two factors are nested under the general factor. They are identified by the prime (′) to emphasize that they influence the residual variance remaining after extraction of the G variance. Nested within the Gc′ and Gv′ factors, narrow test specific factors may influence what subsequently remains of the variance of the test results.

In order to allow estimation of the test specific variances, the 10 subtests in CAT-SEB were divided into halves, one consisting of the odd items, the other of the even ones, resulting in a total of 20 variables. The two reasoning tests Figure series and Groups are assumed to measure only G. For Technical comprehension, a loading on Gc′ is assumed in addition to G and Gv′. In order to examine the multidimensionality, each spatial item is set to be directly influenced by G and Gv′. Test specific factors are assumed for each of the five spatial tests and for Groups. The model is presented in Fig. 1.

2.4. Analyses

The modeling environment STREAMS 2.1 (Gustafsson & Ståhl, 1999) was used for model specification, with Mplus 2.0 as the model fitting program. In addition to an ordinary analysis of covariance matrices, which assumes variables to be continuous and regressions of observed variables on latent variables to be linear, the item variables were treated as categorical variables in the Mplus program (Muthén & Muthén, 1998). The model, which thus included both continuous and categorical variables, was estimated with the Weighted Least Squares estimator.

A preliminary analysis revealed that when all the separate item scores were included as variables, rather than the summed scores for the odd and even subscales, other factor loadings in the model were strongly affected. The items in each scale were therefore analysed one by one, according to the following procedure: each item score was subtracted from its odd or even subscale sum and entered separately, with loadings on G and Gv′. The item also formed a test specific factor (Tspec′) together with the odd and even subscales. This procedure thus allows the test specificity to be estimated at the item level. Considering the large influence of test specificity in spatial tests, this method might be advantageous.
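To make the procedure concrete, the sketch below shows what one such item model might look like if specified directly in Mplus syntax. It is an illustrative reconstruction under stated assumptions, not the authors' actual input: the paper specified its models through STREAMS, and all file and variable names here are hypothetical. The example treats a single Block rotation item (here called bl07) as a categorical indicator, assuming its score has already been subtracted from the odd-numbered Block rotation subscale.

    TITLE:    Nested factor model for one Block rotation item (illustrative sketch);
    DATA:     FILE IS catseb.dat;               ! hypothetical data file
    VARIABLE:
      NAMES ARE sy1o sy1e sy2o sy2e oppo oppe   ! verbal half-tests
                seo see gro gre                 ! reasoning half-tests
                blo ble mfo mfe d1o d1e         ! spatial half-tests
                d2o d2e tco tce
                bl07;                           ! the single item, scored 0/1
      CATEGORICAL ARE bl07;                     ! treat the item as categorical
    ANALYSIS:
      ESTIMATOR = WLS;                          ! weighted least squares, as in the paper
    MODEL:
      ! G influences all 20 half-test scores and the item; factor variances fixed at 1
      g   BY sy1o* sy1e-tce bl07;                          g@1;
      ! Gc' nested under G: verbal half-tests plus Technical comprehension
      gc  BY sy1o* sy1e sy2o sy2e oppo oppe tco tce;       gc@1;
      ! Gv' nested under G: spatial half-tests, Technical comprehension and the item
      gv  BY blo* ble mfo mfe d1o d1e d2o d2e tco tce bl07; gv@1;
      ! Test specific factor for Block rotation: its two halves and the item
      tbl BY blo* ble bl07;                                tbl@1;
      ! All factors mutually orthogonal
      g  WITH gc@0 gv@0 tbl@0;
      gc WITH gv@0 tbl@0;
      gv WITH tbl@0;

For brevity only the Block rotation test specific factor is shown; in the reported models, test specific factors were also specified for the other spatial tests and for Groups, and each of the 92 items was run through its own model of this form.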



Fig. 1. A hierarchical factor model with a G factor influencing all the tests and two nested factors, Gc′ and Gv′, influencing verbal and spatial tests, respectively. Technical comprehension is assumed to load on Gc′ in addition to G and Gv′. Both G and Gv′ directly influence each spatial item. The test specific factors are not shown.


3. Results

The results for the 92 models (one for each item in each of the five spatial tests; item 06 was later excluded) all showed a good fit (RMSEA = 0.026–0.046) with the proposed hierarchical model.


The ranges of the loadings on G and Gv′ at the scale level in the present study (G = 0.53–0.68; Gv′ = 0.17–0.41) were practically identical to those obtained in the study of Mårdberg and Carlstedt (1998) (G = 0.55–0.67; Gv′ = 0.15–0.47). The loadings on the test specific factors, however, varied between 0.34 and 0.69, to be compared with 0.36–0.52 in the Mårdberg and Carlstedt study. The higher loadings in the present study were reached on the odd and even subscales of Dice 1 and Dice 2.

Item difficulty (p, the proportion of correct answers) and the item loadings on G, Gv′ and Tspec′ were plotted for each of the five spatial tests, with the items ordered according to increasing difficulty. In addition, the intercorrelations between item difficulty and the G, Gv′ and Tspec′ loadings were calculated. The results were examined for each test separately.

Metal folding (Fig. 2) appeared to be a difficult test, with a mean p of only 0.40 and no really easy items (range = 0.13–0.71). The limited response time most likely contributed to the relatively high level of difficulty. With one exception, where the loadings were equal, all items loaded higher on G (range = 0.14–0.61, mean = 0.46) than on Gv′ (range = 0.09–0.35, mean = 0.24). Easy items in particular showed substantial loadings on Tspec′ (range = 0.05–0.49, mean = 0.29), all of which, however, were lower than those on G, while most exceeded those on Gv′. Increasing item difficulty generally tended to result in lower loadings, as revealed by positive correlations between p and the loadings on G (r = 0.57, p < 0.05), Gv′ (r = 0.58, p < 0.05) and Tspec′ (r = 0.68, p < 0.05). Moreover, the G and Gv′ loadings were highly intercorrelated (r = 0.85, p < 0.05), as were the G and Tspec′ loadings (r = 0.84, p < 0.05). No shift towards measuring more G relative to Gv′ with increasing item difficulty was revealed. Instead, one rather difficult item and five items of intermediate difficulty had the highest loadings on Gv′. One item had negligible, but significant, loadings on G, Gv′ and Tspec′ and should therefore be excluded from the scale in future testing.

The Block rotation test (Fig. 3) turned out to be of medium difficulty (mean p = 0.56) with a restricted range (p = 0.35–0.84). Most of the items were thus of intermediate difficulty, and all of them loaded higher on G (range = 0.40–0.58, mean = 0.51) than on Gv′ (range = 0.19–0.31, mean = 0.24) and on Tspec′ (range = 0.21–0.36, mean = 0.27). No shifts to relatively higher G loadings with increasing item difficulty appeared, and only one significant correlation was found, between p and Tspec′ (r = 0.63, p < 0.05). The curves showing the Gv′ and Tspec′ loadings followed each other rather closely, with a tendency towards higher Tspec′ for easy items.

Fig. 2. Metal folding: G, Gv′ and Tspec′ loadings and item difficulty (p), ordered in increasing difficulty.



Fig. 3. Block rotation: G, Gv′ and Tspec′ loadings and item difficulty (p), ordered in increasing difficulty.


Fig. 4. Dice 1: G, Gv′ and Tspec′ loadings and item difficulty (p), ordered in increasing difficulty.

Dice 1 (Fig. 4) was apparently easy (range = 0.27–0.92, mean = 0.68). However, considering the yes/no response alternatives and the consequent guessing rate of 50%, the level of difficulty was reasonable (under a simple correction for guessing, an observed proportion correct p corresponds to only 2p - 1 of the examinees actually solving the item). As in the preceding tests, all loadings on G (range = 0.11–0.61, mean = 0.47) exceeded those on Gv′ (range = 0.03–0.27, mean = 0.14) and Tspec′ (range = 0.11–0.51, mean = 0.25), with the exception of one item with generally low factor loadings. Except for two items with Gv′ loadings above 0.20, the Gv′ loadings were low to negligible, although significant. In contrast, the loadings on Tspec′ varied considerably, and while the G and Gv′ loadings were unrelated to item difficulty, the Tspec′ loadings were negatively correlated with p (r = -0.59, p < 0.05). More difficult Dice 1 items therefore seemingly required more of the test specific factor in order to be solved. One item with low loadings on all ability factors should be excluded from future testing.

The time limit in the parallel test Dice 2 (Fig. 5) resulted in a somewhat narrower range of p values (range = 0.36–0.80), a slightly increased level of difficulty (mean = 0.64) and more items of intermediate difficulty compared to Dice 1. The item loadings on G for Dice 2 varied considerably (range = 0.16–0.55, mean = 0.34), as they did on Gv′ (range = 0.04–0.31, mean = 0.19), with a mean value on G that was lower than the corresponding value for Dice 1 and a somewhat higher Gv′ mean. Nine of the 19 items in Dice 2 had G loadings lower than 0.30, compared to only one in Dice 1. Moreover, six of these nine items had relatively high Gv′ loadings (0.20 or higher). Among the ten items with G loadings above 0.30, only three had correspondingly high Gv′ loadings. High G loadings thus tended to correspond to low Gv′ loadings and vice versa. The loadings on Tspec′, in contrast, varied less than in Dice 1 (range = 0.10–0.37, mean = 0.22), and the G and Tspec′ curves followed each other rather closely (r = 0.81, p < 0.05). As in Dice 1, Tspec′ was negatively correlated with p (r = -0.52, p < 0.05). In addition to item 06, which was excluded from the analyses, one more item appeared to have little impact on the scale. In future testing these two items should be substituted.

Fig. 5. Dice 2: G, Gv′ and Tspec′ loadings and item difficulty (p), ordered in increasing difficulty. (Item 06 had a negative loading on Gv′ and was excluded; the negative loading was probably the result of an unintended distortion of the item when it was transformed from a paper-and-pencil test to a computerized one.)

The Technical comprehension test, finally, was also rather easy (range = 0.23–0.92, mean = 0.62), with only four of the 16 items being solved by less than 50% of the participants (Fig. 6). Again, all items loaded higher on G (range = 0.26–0.54, mean = 0.43) than on Gv′ (range = 0.03–0.24, mean = 0.11), with the Tspec′ loadings in between (range = 0.09–0.32, mean = 0.20). A positive correlation between the G loadings and item difficulty was found (r = 0.72, p < 0.001). Low loadings on the Gc′ factor (range = 0.01–0.26, mean = 0.10) also appeared; for clarity these are not shown in Fig. 6.

Fig. 6. Technical comprehension: G, Gv′ and Tspec′ loadings and item difficulty (p), ordered in increasing difficulty.

4. Discussion

Consistent with the results obtained with scale-level scores in earlier research, substantial G loadings, low to moderate Gv′ loadings, and Tspec′ loadings in between were found for the majority of the items.


The analyses at the item level did, however, reveal considerable variation in the amount of G, Gv′ and Tspec′ variance between tasks, particularly in Dice 2 but also in Metal folding and Dice 1: differences that would otherwise be concealed in a summed scale score.

While complex tasks requiring folding and rotation are generally considered the best measures of visualization (Lohman, 1988, 1996, 2000; Mårdberg & Carlstedt, 1998), items of higher complexity have also been found to be more susceptible to an analytical strategy than simpler ones, which to a greater degree are solved visually (Lohman & Kyllonen, 1983). The tasks in the spatial tests in CAT-SEB involve items of considerable complexity which, unlike those in the study of Lohman and Kyllonen (1983), are not systematically varied. However, if it is loosely assumed that item difficulty approximately reflects complexity in the CAT-SEB tests, an increase in the G loadings with increasing item difficulty, and correspondingly lower Gv′ loadings, would indicate a strategy shift. Such a strategy shift should appear as approaching or crossing curves for the G and Gv′ loadings with increasing item difficulty (Figs. 2–6). No tendencies of this kind were seen among the curves. Moreover, easy items or items of intermediate difficulty loaded high on G in Metal folding and Technical comprehension, as revealed by the correlations between the G loadings and p. In other words, increasing complexity in terms of item difficulty did not seem specifically to promote an analytical approach.

To increase the amount of Gv′ variance, limiting the response time seems to be more effective, as shown by the changed proportions of G and Gv′ for some items in Dice 2 compared with the un-speeded Dice 1. The G loadings dropped considerably on nine of the 19 items, with a corresponding increase in the Gv′ loadings for six of them. No such effect was seen in the parallel Dice 1 test. Apparently, the time limit in Dice 2 favoured a visual analogue solution of the items. A similar time limit was also applied to the Metal folding test, but here the Gv′ and G loadings were positively correlated. There were five items with loadings above 0.30 on Gv′ that also had high loadings on G (range = 0.54–0.61); G thus seemed to be required for solving the items loading high on Gv′.

Several explanations of these conflicting results can be suggested. Both Dice 2 and Metal folding contain complex spatial tasks, but while Metal folding requires a choice between four different alternatives, Dice 2 items are solved by yes/no answers. The time limit, arbitrarily set at the 80th percentile, might favour a rapid visual analogue strategy for solving items in Dice 2, whereas examining the four response alternatives in Metal folding might require comparisons involving not only a well-formed visual image but also a greater amount of reasoning.

A test specific factor is the joint effect of features specific to a particular test, such as test instruction, task and response format. The test specific factor may also contain specific components of visualization that are not captured by the general visualization factor common to all the tests: Metal folding thus requires folding, while Block rotation and the Dice tests require rotation. As long as these miscellaneous components cannot be disentangled, detailed interpretation of the Tspec′ results at the item level will remain unattainable.
Although there are good reasons to consider the ability to create abstract mental images an essential aspect of general ability (Lohman, 1988, 1996, 2000), when it comes to personnel selection and finding good visualizers there are practical reasons for trying to separate gv from g. For this purpose, improvement of the measurement and testing of spatial abilities is important. The new technique presented here, which enables analysis of the amount of variance due to G, Gv′ and test specificity in each item, offers a method for item analysis that can guide further test construction. Items with high loadings on Gv′ can be selected and examined for distinguishing


features. Another approach to test improvement is suggested by the results of the analyses of the CAT-SEB, which indicate that the response time may play a crucial role in the relation between the G, Gv′ and Tspec′ components of an item. By examining different response latencies for the items in the spatial tests, an optimal response time limit might be obtained. While the new technique allows potentially useful analyses, the robustness and stability of the particular results certainly need to be confirmed by further studies.

References

Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge: Cambridge University Press.

French, J. W. (1965). The relationship of problem-solving styles to the factor composition of tests. Educational and Psychological Measurement, 19, 469–496.

Guilford, J. P., & Lacey, J. I. (1947). Printed classification tests. Army Air Force Aviation Psychology Program Research Reports, No. 5. Washington, DC: U.S. Government Printing Office.

Gustafsson, J.-E. (1984). A unifying model for the structure of intellectual abilities. Intelligence, 8, 179–203.

Gustafsson, J.-E., & Balke, G. (1993). General and specific abilities as predictors of school achievement. Multivariate Behavioral Research, 28, 407–434.

Gustafsson, J.-E., & Ståhl, P.-A. (1999). STREAMS user's guide. Version 2.1 for Windows. Mölndal, Sweden: MultivariateWare.

Gustafsson, J.-E., & Undheim, J. O. (1996). Individual differences in cognitive functions. In D. C. Berliner & R. C. Calfee (Eds.), Handbook of educational psychology (pp. 186–242). New York: Simon & Schuster Macmillan.

Humphreys, L. G., & Lubinski, D. (1996). Assessing spatial visualization: An underappreciated ability for many schools and work settings. In C. P. Benbow & D. J. Lubinski (Eds.), Intellectual talent: Psychometric and social issues. Baltimore: Johns Hopkins.

Humphreys, L. G., Lubinski, D., & Yao, G. (1993). Utility of predicting group membership and the role of spatial visualization in becoming an engineer, physical scientist, or artist. Journal of Applied Psychology, 78(2), 250–261.

Kyllonen, P. C., Lohman, D. F., & Snow, R. E. (1984). Effects of aptitudes, strategy training, and task facets on spatial task performance. Journal of Educational Psychology, 74, 130–145.

Kyllonen, P. C., Lohman, D. F., & Woltz, D. J. (1984). Componential modelling of alternative strategies for performing spatial tasks. Journal of Educational Psychology, 76, 1325–1345.

Lohman, D. F. (1988). Spatial abilities as traits, processes, and knowledge. In R. J. Sternberg (Ed.), Advances in the psychology of human intelligence (Vol. 4, pp. 181–248). Hillsdale, NJ: Lawrence Erlbaum Associates.

Lohman, D. F. (1996). Spatial ability and g. In I. Dennis & P. Tapsfield (Eds.), Human abilities: Their nature and measurement (pp. 97–116). Mahwah, NJ: Lawrence Erlbaum Associates.

Lohman, D. F. (2000). Complex information processing and intelligence. In R. J. Sternberg (Ed.), Handbook of intelligence (pp. 285–339). USA: Cambridge University Press.

Lohman, D. F., & Kyllonen, P. C. (1983). Individual differences in solution strategy on spatial tasks. In R. F. Dillon & R. R. Schmeck (Eds.), Individual differences in cognition (pp. 105–135). New York: Academic Press.

Mårdberg, B., & Carlstedt, B. (1998). Swedish Enlistment Battery (SEB): Construct validity and latent variable estimation of cognitive abilities by the CAT-SEB. International Journal of Selection and Assessment, 6, 107–114.

Muthén, L. K., & Muthén, B. O. (1998). Mplus user's guide. Los Angeles, CA: Muthén & Muthén.

Shea, D. L., Lubinski, D., & Benbow, C. P. (2001). Importance of assessing spatial ability in intellectually talented young adolescents: A 20-year longitudinal study. Journal of Educational Psychology, 93, 604–614.

Ullstadius, E., Carlstedt, B., & Gustafsson, J.-E. (2002). Influence of general and crystallized intelligence on vocabulary test items. European Journal of Psychological Assessment, 18, 78–84.