Journal of School Psychology 50 (2012) 113–128
The effects of Wechsler Intelligence Scale for Children—Fourth Edition cognitive abilities on math achievement

Jason R. Parkin a,⁎, A. Alexander Beaujean b

a Special School District of St. Louis County, 12110 Clayton Rd, Town & Country, MO 63131, USA
b Baylor Psychometric Laboratory, Baylor University, One Bear Place #97301, Waco, TX 76798-7301, USA
Article history: Received 2 March 2009; Received in revised form 24 August 2011; Accepted 26 August 2011.

Keywords: WISC-IV; WIAT-II; Math achievement; CHC; SEM; Cognitive testing
Abstract

This study used structural equation modeling to examine the effect of Stratum III (i.e., general intelligence) and Stratum II (i.e., Comprehension-Knowledge, Fluid Reasoning, Short-Term Memory, Processing Speed, and Visual Processing) factors of the Cattell–Horn–Carroll (CHC) cognitive abilities, as operationalized by the Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV; Wechsler, 2003a) subtests, on Quantitative Knowledge, as operationalized by the Wechsler Individual Achievement Test, Second Edition (WIAT-II; Wechsler, 2002) subtests. Participants came from the WISC-IV/WIAT-II linking sample (n = 550). We compared models that predicted Quantitative Knowledge using only Stratum III factors, only Stratum II factors, and both Stratum III and Stratum II factors. Results indicated that the model with only the Stratum III factor predicting Quantitative Knowledge best fit the data.

© 2011 Society for the Study of School Psychology. Published by Elsevier Ltd. All rights reserved.
1. Introduction

The Cattell–Horn–Carroll (CHC; McGrew, 2005; Newton & McGrew, 2010) theory of cognitive abilities has become an increasingly influential framework for psychometric test development and interpretation. As a result, there is a need to understand how the CHC-defined cognitive abilities relate to academic achievement. To date, much of the research in this area has used the Woodcock–Johnson (WJ) instruments (Woodcock & Johnson, 1989; Woodcock, McGrew, & Mather, 2001) to operationalize CHC abilities; McGrew and Wendling (2010) estimated that WJ instruments have been used in approximately 94% of these studies.
Given that many cognitive ability instruments purportedly measure CHC abilities, but that these abilities are not necessarily as exchangeable across instruments as they may appear (Floyd, Bergeron, McCormack, Anderson, & Hargrove-Owens, 2005), more research using diverse cognitive batteries is needed to better understand the relations between CHC cognitive abilities and academic achievement. In their meta-analysis of CHC factors and academic achievement, McGrew and Wendling (2010) go so far as to state the following:

  Until additional CHC COG-ACH research is completed with other (non-WJ) intelligence batteries, users of these other batteries must proceed with caution when forming COG-ACH relations-based diagnostic, interpretative, and intervention hypotheses. (p. 668)

Consequently, this paper adds to the CHC knowledge base by examining the effects of CHC cognitive abilities, operationalized by the Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV; Wechsler, 2003b), on Quantitative Knowledge, operationalized by the Wechsler Individual Achievement Test, Second Edition (WIAT-II; Wechsler, 2002).

1.1. Cattell–Horn–Carroll theory of cognitive abilities

CHC theory has become a major influence in modern-day test development, starting with the Woodcock–Johnson III (WJ III; Woodcock et al., 2001) and extending to the fifth edition of the Stanford–Binet Intelligence Scales (Roid, 2003), the second edition of the Kaufman Assessment Battery for Children (Kaufman & Kaufman, 2004), and the second edition of the Differential Ability Scales (Elliot, 2007). Besides influencing test battery development, CHC theory has also influenced the interpretation of cognitive ability instruments, most notably through the practice of Cross-Battery Assessment (Flanagan, Ortiz, & Alfonso, 2007).

CHC theory (McGrew, 1997, 2005) conceptualizes cognitive abilities as a higher-order taxonomy, reflecting an integration of Gf–Gc theory (Horn & Cattell, 1966) and Carroll's (1993) three-stratum theory. Stratum III in CHC's taxonomy represents general cognitive ability (g; Jensen, 1998). Traditionally, Stratum II comprised 10 broad abilities (McGrew, 2005), although Newton and McGrew (2010) have expanded it to 16 "human abilities": Fluid Reasoning, Comprehension-Knowledge, General (domain-specific) Knowledge, Quantitative Knowledge, Reading and Writing, Short-Term Memory, Visual Processing, Auditory Processing, Long-Term Storage and Retrieval, Processing Speed, Reaction and Decision Speed, Psychomotor Speed, Psychomotor Abilities, Olfactory Abilities, Tactile Abilities, and Kinesthetic Abilities. Stratum I includes over 100 narrow abilities, subsumed under the 16 broader ones.1

1.1.1. Effects of Cattell–Horn–Carroll cognitive constructs on Quantitative Knowledge

One CHC Stratum II ability that represents a major curriculum area in schooling is Quantitative Knowledge, the "breadth and depth of a person's acquired store of declarative and procedural quantitative or numerical knowledge" (Newton & McGrew, 2010, p. 628). Multiple studies have investigated the CHC cognitive abilities that predict scores or abilities in this domain (for a review, see McGrew & Wendling, 2010), using a variety of data analysis techniques. For example, McGrew and Hessler (1995) and Floyd, Evans, and McGrew (2003) both used multiple regression with WJ cognitive and achievement instruments to investigate how well Stratum II scores explain Quantitative Knowledge.
They found that Processing Speed, Comprehension-Knowledge, and Fluid Reasoning scores demonstrate moderate-to-strong effects on Quantitative Knowledge scores from childhood to adulthood, although Floyd and colleagues also reported a moderate effect for Short-Term Memory scores. Proctor, Floyd, and Shaver (2005) used profile analysis to compare the cognitive ability scores of children with low math reasoning or calculation scores to those of their peers without such low scores. Although they observed no major differences in cognitive ability scores between those with low and average math calculation scores, they did find that children with problem-solving deficits demonstrated lower scores on measures of Fluid Reasoning and Comprehension-Knowledge, as well as lower scores on an aggregated measure of cognitive ability. Structural equation modeling (SEM; Bollen, 1989) is a third way researchers have studied the effects of Strata II and III factors on Quantitative Knowledge.

1 There is much variety in the nomenclature used to describe the CHC abilities, especially those in Stratum II. Consistent with Newton and McGrew's (2010) goal of standardizing this nomenclature, we have used their terminology to label the CHC abilities.
SEM has a number of advantages over other data analysis methods, most notably its use of latent variables to remove measurement error from the variables' variance, and the multiple indices available to assess how well a model fits a data set (Keith, 2006). In addition, it allows one to compare the effects of Stratum II and Stratum III abilities in the same model rather than in separate analyses, which is what is required when making such comparisons using multiple regression (McGrew, Keith, Flanagan, & Vanderwood, 1997). Keith (1999), McGrew et al. (1997), and Taub, Floyd, Keith, and McGrew (2008) have all used SEM to operationalize CHC abilities and analyze their effects on Quantitative Knowledge. Although g has typically been shown to account for the largest portion of variance in Quantitative Knowledge, select Stratum II factors were still able to account for additional variance, most notably factors representing Processing Speed, Fluid Reasoning, and, in some instances, Short-Term Memory. Interestingly, both McGrew et al. and Taub et al. found that the effect of g on Quantitative Knowledge was indirect, working through Stratum II factors.

The present study seeks to expand research in this area by testing the findings from analyses using WJ instruments against data collected with the WISC-IV and WIAT-II. More specifically, it tests whether latent Stratum II and III factors, constructed from WISC-IV subtests, demonstrate the same effects on a Quantitative Knowledge factor, constructed from WIAT-II subtests, as have been described in research using the Woodcock–Johnson instruments.

1.1.2. Applying Cattell–Horn–Carroll theory to the Wechsler Intelligence Scale for Children

Although the Wechsler intelligence tests were never developed using a CHC framework (Kaufman, Flanagan, Alfonso, & Mascolo, 2006), researchers have subsequently conceptualized the instruments from a CHC standpoint (Flanagan, McGrew, & Ortiz, 2000; Keith & Reynolds, 2010). For example, Flanagan (2000) combined subtests from the revised edition of the WISC (WISC-R; Wechsler, 1974) and the WJ-R (Woodcock & Johnson, 1989) to fit the CHC model. Using an SEM model, she found that supplementing the WISC-R with WJ-R measures of Fluid Reasoning, Long-Term Retrieval, Auditory Processing, and Short-Term Memory resulted in a 25% increase in the amount of variance explained in a reading achievement factor, compared to using only the original WISC-R subtests and factors. Likewise, Hale, Fiorello, Kavanagh, Hoeppner, and Gaither (2001) analyzed the Wechsler Intelligence Scale for Children—Third Edition (WISC-III; Wechsler, 1991) from a CHC perspective and found that the WISC-III subtests measuring Comprehension-Knowledge, Short-Term Memory, and, to a lesser extent, Visual Processing were related to math achievement, although they did so using commonality analysis, which some have argued is a flawed data analysis technique in this context (Schneider, 2008).

More recently, researchers have examined the factor structure of the WISC-IV (Wechsler, 2003a) using the CHC hierarchy. Using the scale's normative data, Keith, Fine, Taub, Reynolds, and Kranzler (2006) compared a CHC factor structure to the traditional four-factor structure prescribed in the WISC-IV test manual (Wechsler, 2003b), and they found that the CHC factor structure fit the data better than the traditional model. In operationalizing CHC theory with the WISC-IV, Keith et al.
made considerable changes to the Perceptual Reasoning Index by dividing it into two separate CHC domains: Visual Processing (formed from the Block Design and Picture Completion subtests) and Fluid Reasoning (formed from the Matrix Reasoning and Picture Concepts subtests). Although the use of a single Perceptual Reasoning factor versus two CHC factors (i.e., Visual Processing and Fluid Reasoning) might not make much difference if the achievement area of interest were reading or oral language, it could make a difference when the area of interest is mathematics. Studies have found a strong effect for Fluid Reasoning on Quantitative Knowledge (e.g., Floyd, Shaver, & McGrew, 2003), and Carroll (1993), upon whose work much of CHC theory is built, considered math ability, or at least quantitative reasoning, to be a subset of the Fluid Reasoning domain. The effect of Visual Processing on Quantitative Knowledge is more ambiguous, though. While some argue that a relationship exists between the two constructs (e.g., Geary, 1994), others argue that the relationship is negligible, except in gifted/high-achieving populations (Friedman, 1995).

1.2. Current study

The purpose of the present study is twofold. First, it is to examine the effects of CHC cognitive abilities on Quantitative Knowledge ability using the WISC-IV (Wechsler, 2003b) and WIAT-II (Wechsler, 2002). Second, it is to compare how our results fit into the broader CHC–academic achievement research agenda,
which predominantly uses WJ instruments. Analyses using the WJ instruments have consistently highlighted that g is a strong predictor of Quantitative Knowledge, but so are Stratum II abilities such as Processing Speed, Fluid Reasoning, Comprehension-Knowledge, and, sometimes, Short-Term Memory. Because the WISC-IV measures all of these Stratum II abilities, we hypothesize that the WISC-IV/WIAT-II operationalized CHC abilities should demonstrate relations similar to those found in research using the WJ batteries. More specifically, we hypothesize the following:

1. Fluid Reasoning and Comprehension-Knowledge should strongly and positively predict Quantitative Knowledge; Processing Speed and Short-Term Memory should positively predict Quantitative Knowledge, too, but at a lesser magnitude.
2. Fluid Reasoning and Comprehension-Knowledge should predict Quantitative Knowledge even after including g in the model, and their contribution should be greater than that of Processing Speed and Short-Term Memory.
3. The influence of g on Quantitative Knowledge is either only indirect or mostly indirect, funneling most of its predictive effects through Fluid Reasoning and Comprehension-Knowledge.

1.2.1. Specific hypotheses concerning the Arithmetic and Math Reasoning subtests

One of the WISC-IV subtests is Arithmetic, a subtest carried over from previous editions of the WISC (Kaufman et al., 2006). Keith et al. (2006) classified the subtest as a measure of Fluid Reasoning, and the WISC-IV interpretive guidelines (Wechsler, 2003a) include it as a measure of Working Memory (a narrow ability associated with the CHC ability Short-Term Memory; Newton & McGrew, 2010). Others, however, have found that when the Arithmetic subtest is factor analyzed with measures of Quantitative Knowledge, it has larger factor loadings on a Quantitative Knowledge factor than on a Fluid Reasoning factor (Phelps, McGrew, Knopik, & Ford, 2005; Woodcock, 1990). The WIAT-II (Wechsler, 2002) Math Reasoning subtest presents similar concerns. While it was designed to measure math ability, some experts in cognitive assessment (Flanagan, Ortiz, Alfonso, & Mascolo, 2006) consider it more a measure of Fluid Reasoning than of Quantitative Knowledge. Due to the complications associated with these subtests, it is critical to investigate where these two tasks are located in the CHC taxonomy to understand how cognitive abilities may explain math achievement. Based on the reviewed research, we hypothesize that when analyzing the WISC-IV and WIAT-II math-related subtests, the model that best fits the data will have both the Arithmetic and Math Reasoning subtests specified as measures of both the Quantitative Knowledge factor and the Fluid Reasoning factor.

2. Method

2.1. Participants

The participants for this study (n = 550) came from the standardization linking sample for the WISC-IV (Wechsler, 2003a) and WIAT-II (Wechsler, 2002). The participants' ages ranged from 6 to 16 years (M = 11.58, SD = 3.22), and the sample is nationally representative within ±5% of the 2000 U.S. Census on the variables of age, gender, race/ethnicity, region of country, and parent education level (Wechsler, 2003b). There were approximately equal numbers of males (n = 282) and females (n = 268), with 334 participants being Caucasian (60.72%), 101 Hispanic (18.36%), 86 African American (15.36%), and the other 29 Asian, Native American, or "Other" (5.27%).
Parent education levels included 8 to 11 years (n = 102), 12 years (n = 145), 13 to 15 years (n = 172), and 16 years or more (n = 131) of education. Most of the participants took the WISC-IV first and subsequently took the WIAT-II between 0 and 39 days later (M = 12 days; Wechsler, 2003b).

Some data were missing for 280 of the participants, ranging from a single score (80%) to scores on an entire instrument (n = 1), but 98% of the participants with missing data had fewer than five scores missing. No discernible pattern could be found among those with missing responses, but it is doubtful that the data were missing completely at random (MCAR; Little & Rubin, 2002). Consequently, the study's statistics were calculated in Mplus using full information maximum likelihood (FIML; Enders & Bandalos, 2001). Unlike more traditional methods for treating missing data, such as listwise or pairwise deletion, FIML uses all the information from a given respondent, even if the respondent only has scores on a subset of variables. It estimates each parameter in the model with all the information it can use from the original set of respondents instead of discarding those respondents who do not have values for all used variables. When the data are not MCAR, FIML produces more accurate and precise parameter estimates than pairwise or listwise deletion (Allison, 2001).
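For concreteness, the fragment below is a minimal sketch of how such a run might be declared in Mplus; the file name, variable names, and missing-value code are illustrative assumptions, not details taken from the study's materials. With maximum-likelihood-based estimators, Mplus retains incomplete cases via FIML by default.

    ! Hypothetical Mplus input fragment (all names are illustrative)
    DATA:     FILE = wisc_wiat_linking.dat;
    VARIABLE: NAMES = info vocab sim wordr compr matreas pconc arith
                      blockd piccomp digspan lnseq symsrch coding
                      cancel mathr numops;
              MISSING = ALL (-999);    ! flag for missing scores
    ANALYSIS: ESTIMATOR = MLR;         ! robust ML; incomplete cases kept via FIML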
2.2. Instruments

This study initially employed scores from all 15 subtests of the WISC-IV and the Math Reasoning and Numerical Operations subtests of the WIAT-II as indicators for the SEM models. Descriptive statistics for the WISC-IV subtest scaled scores and the WIAT-II subtest standard scores are provided in Table 1. Correlations among subtests, internal consistency reliability coefficients for most subtests, and test–retest reliability coefficients for the WISC-IV Coding, Symbol Search, and Cancellation subtests are listed in Table 2 (Williams, Weiss, & Rolfhus, 2003a, 2003b).

2.3. Analysis

All analyses were conducted in Mplus (Muthén & Muthén, 2008) using the subtests' age-based standardized scores as input.

2.3.1. Models

First, we reproduced the traditional four-factor model (model A in Table 3) and the five-factor model (model B, see Fig. 1) offered by Keith et al. (2006). The traditional four-factor model (model A) used scores from the Similarities, Vocabulary, Comprehension, Information, and Word Reasoning subtests as indicators of the Verbal Comprehension factor. Scores from the Block Design, Picture Concepts, Matrix Reasoning, and Picture Completion subtests were used as indicators of the Perceptual Reasoning factor. Scores from the Digit Span, Letter–Number Sequencing, and Arithmetic subtests were used as indicators of the Working Memory factor, and scores from the Coding, Symbol Search, and Cancellation subtests were used as indicators of the Processing Speed factor. The covariance among the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed factors was then used to model a higher-order g factor (a sketch of this specification follows Table 1).

Table 1
Descriptive statistics for Wechsler Intelligence Scale for Children—Fourth Edition and Wechsler Individual Achievement Test—Second Edition subtests.

Instrument  Manifest variable      Manual  CHC       Mean   SD     Kurtosis  Skewness
WISC-IV     Information            VC      Gc        10.13   3.01   −0.2     −0.0
WISC-IV     Vocabulary             VC      Gc        10.02   3.02   −0.2     −0.2
WISC-IV     Similarities           VC      Gc         9.80   3.03   −0.3     −0.1
WISC-IV     Word Reasoning         VC      Gc        10.17   2.99    0.2     −0.2
WISC-IV     Comprehension          VC      Gc        10.22   2.88    0.1     −0.2
WISC-IV     Matrix Reasoning       PR      Gf, Gv     9.83   2.89   −0.1      0.2
WISC-IV     Picture Concepts       PR      Gf         9.95   2.98    0.4     −0.5
WISC-IV     Arithmetic             WM      Gf        10.19   2.85   −0.6      0.1
WISC-IV     Block Design           PR      Gv         9.68   2.90    0.1     −0.2
WISC-IV     Picture Completion     PR      Gv, Gc     9.69   2.92    0.2     −0.1
WISC-IV     Digit Span             WM      Gsm        9.92   2.98    0.2     −0.0
WISC-IV     Letter–Number Seq.     WM      Gsm       10.04   3.00    0.9     −0.7
WISC-IV     Symbol Search          PS      Gs, Gv    10.05   3.12    0.7     −0.4
WISC-IV     Coding                 PS      Gs        10.18   2.96    0.0      0.2
WISC-IV     Cancellation           PS      Gs        10.03   3.12   −0.1      0.0
WIAT-II     Numerical Operations   –       Gq       101.75  15.85    0.0     −0.2
WIAT-II     Math Reasoning         –       Gq       100.62  16.13    0.1     −0.3

Note. WISC-IV = Wechsler Intelligence Scale for Children—Fourth Edition; WIAT-II = Wechsler Individual Achievement Test—Second Edition; VC = Verbal Comprehension; PR = Perceptual Reasoning; WM = Working Memory; PS = Processing Speed; Gf = Fluid Reasoning; Gc = Comprehension-Knowledge; Gsm = Short-Term Memory; Gv = Visual Processing; Gs = Processing Speed; Gq = Quantitative Knowledge. CHC = subtest classification by Keith et al. (2006); Manual = subtest classification by Wechsler (2003b).
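To make model A's specification concrete, the fragment below sketches its measurement and higher-order structure in Mplus, reusing the illustrative variable names introduced in the earlier fragment. It is a sketch of the modeling approach described above, not the authors' actual input file.

    MODEL:
      vc BY sim vocab compr info wordr;      ! Verbal Comprehension
      pr BY blockd pconc matreas piccomp;    ! Perceptual Reasoning
      wm BY digspan lnseq arith;             ! Working Memory
      ps BY coding symsrch cancel;           ! Processing Speed
      g  BY vc pr wm ps;                     ! higher-order general factor
    OUTPUT: STANDARDIZED;                    ! request standardized estimates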
Table 2
Correlations and reliability estimates for Wechsler Intelligence Scale for Children—Fourth Edition and Wechsler Individual Achievement Test—Second Edition subtests.

                        IN   VC   SI   WR   CO   MR   PS   AR   BD   PC   DS   LN   SS   CD   CA   Mre  NO
Information             .86
Vocabulary              .74  .86
Similarities            .72  .74  .86
Word Reasoning          .64  .66  .63  .80
Comprehension           .62  .70  .63  .59  .81
Matrix Reasoning        .51  .46  .48  .37  .34  .89
Picture Concepts        .37  .36  .38  .38  .33  .38  .82
Arithmetic              .67  .63  .56  .49  .51  .48  .43  .88
Block Design            .47  .42  .49  .38  .31  .51  .42  .52  .86
Picture Completion      .51  .49  .51  .46  .43  .40  .35  .41  .48  .84
Digit Span              .42  .44  .43  .36  .41  .39  .31  .47  .41  .30  .87
Letter–Number Seq.      .50  .51  .50  .39  .43  .42  .37  .53  .42  .35  .51  .90
Symbol Search           .42  .39  .40  .32  .34  .40  .28  .42  .47  .37  .36  .43  .79
Coding                  .31  .30  .24  .21  .28  .29  .24  .34  .29  .24  .31  .32  .48  .85
Cancellation            .10  .09  .10  .08* .05* .11  .20  .11* .17  .11  .18  .10  .28  .36  .79
Math Reasoning          .65  .63  .62  .52  .51  .57  .41  .73  .56  .46  .49  .58  .48  .34  .09  .92
Numerical Operations    .60  .58  .55  .46  .47  .52  .38  .64  .48  .40  .42  .50  .45  .38  .08* .77  .91

Note. IN = Information; VC = Vocabulary; SI = Similarities; WR = Word Reasoning; CO = Comprehension; MR = Matrix Reasoning; PS = Picture Concepts; AR = Arithmetic; BD = Block Design; PC = Picture Completion; DS = Digit Span; LN = Letter–Number Sequencing; SS = Symbol Search; CD = Coding; CA = Cancellation; Mre = Math Reasoning; NO = Numerical Operations. Reliability estimates are in the principal diagonal. WISC-IV reliability estimates taken from Table 2 of Williams et al. (2003a, 2003b); WIAT-II internal consistency estimates taken from Table 6.1 of The Psychological Corporation (2002). Unless noted by a *, all correlations are statistically significant using α = .05.
Keith et al. (2006) outlined a five-factor model consistent with CHC theory (model B, see Fig. 1). In this model, scores from the Similarities, Vocabulary, Comprehension, Information, Word Reasoning, and Picture Completion subtests were used as indicators of the Comprehension-Knowledge factor. Scores from the Block Design, Matrix Reasoning, Picture Completion, and Symbol Search subtests were used as indicators of the Visual Processing factor, whereas scores from the Picture Concepts, Matrix Reasoning, and Arithmetic subtests were used as indicators of the Fluid Reasoning factor. Scores from the Digit Span and Letter–Number Sequencing subtests were used as indicators of the Short-Term Memory factor, and scores from the Coding, Symbol Search, and Cancellation subtests were used as indicators of the Processing Speed factor. Note that Picture Completion, Matrix Reasoning, and Symbol Search were used as indicators of two CHC ability factors. The covariance among the five factors was used to model a higher-order g factor.
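In Mplus terms, model B's measurement structure might be sketched as follows, again with the illustrative variable names. The residual-variance constraint shown here mirrors the constraint on the Fluid Reasoning factor reported in the Results, but the exact input is an assumption, not the authors' file.

    MODEL:
      gc  BY sim vocab compr info wordr piccomp;   ! Comprehension-Knowledge
      gf  BY pconc matreas arith;                  ! Fluid Reasoning
      gv  BY blockd matreas piccomp symsrch;       ! Visual Processing (cross-loadings)
      gsm BY digspan lnseq;                        ! Short-Term Memory
      gs  BY coding symsrch cancel;                ! Processing Speed
      g   BY gc gf gv gsm gs;                      ! higher-order g
      gf (vgf);                                    ! label Gf's residual variance
    MODEL CONSTRAINT:
      vgf > 0;                                     ! keep the residual variance positive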
These analyses served two purposes. First, they would indicate which structure best fits the current sample's data. Second, they would act as a test of previous research findings on the structure of the WISC-IV (Keith et al., 2006; Wechsler, 2003b).

After determining which of the two WISC-IV models fit the data better, we included the WIAT-II Math Reasoning and Numerical Operations subtests in the models as indicators of a Quantitative Knowledge factor (model C). We then analyzed a number of additional models in which the Arithmetic and Math Reasoning subtests were manipulated as indicators of Quantitative Knowledge to determine the best combination of indicators when constructing the Quantitative Knowledge factor.

Table 3
Fit statistics for latent variable models of the Wechsler Intelligence Scale for Children—Fourth Edition.

Model  Description                              χ2      df  p      CFI  RMSEA  SRMR  AIC
A      4-Factor                                 218.59  86  <.001  .96  .05    .04   36,402
B^a    CHC                                      150.75  82  <.001  .98  .04    .03   36,343
B–R    CHC removing Arithmetic from analysis    125.17  69  <.001  .98  .04    .03   35,065

Note. CFI = comparative fit index; RMSEA = Root Mean Square Error of Approximation; SRMR = Standardized Root Mean Square Residual; AIC = Akaike Information Criterion. All MLR scaling factors (δ) were between 1.01 and 1.03.
a Model where Gf was constrained.
[Fig. 1: path diagram omitted from this text version.]
Fig. 1. The Cattell–Horn–Carroll model of the Wechsler Intelligence Scale for Children—Fourth Edition. Path values are standardized coefficients. Gc = Comprehension-Knowledge; Gf = Fluid Reasoning; Gv = Visual Processing; Gsm = Short-Term Memory; Gs = Processing Speed; g = General intelligence.
Specifically, we tested the following models: (a) Arithmetic as a third indicator of the Quantitative Knowledge factor, alongside Math Reasoning and Numerical Operations (model D); (b) Arithmetic as an indicator of both Fluid Reasoning and Quantitative Knowledge (model E); (c) Math Reasoning as an indicator of Fluid Reasoning and Arithmetic as an indicator of Quantitative Knowledge (model F); (d) Arithmetic as an indicator of Quantitative Knowledge and Math Reasoning as an indicator of both Quantitative Knowledge and Fluid Reasoning (model G); and (e) Arithmetic and Math Reasoning as indicators of both Quantitative Knowledge and Fluid Reasoning (model H). The results from these analyses were then used to inform which subtests should be used as indicators of the Quantitative Knowledge factor in subsequent hypothesis testing.

After determining how to model the Quantitative Knowledge factor, we tested the effects of the cognitive ability factors on the Quantitative Knowledge factor. To determine these effects, we created four models. Model I used the g factor as the only predictor of the Quantitative Knowledge factor (see Fig. 2). Models J and K both used only the five Stratum II factors, Comprehension-Knowledge, Visual Processing, Short-Term Memory, Processing Speed, and Fluid Reasoning, as predictors of the Quantitative Knowledge factor. Unlike model J (see Fig. 3), where a g factor was specified using the Stratum II factors as indicators, model K did not include a g factor but allowed the Stratum II factors to correlate (akin to a multiple regression model). We did not expect models J and K to differ drastically when predicting the Quantitative Knowledge factor, but because model J is more parsimonious than model K, it is important to assess differences in how the models fit the data. The last model, model L, used the g factor and the five Stratum II factors together as predictors of the Quantitative Knowledge factor (see Fig. 4). The structural portions of these models are sketched below.
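The following fragment sketches how the structural (regression) parts of models I through L might be written in Mplus, given the measurement model above; the statements for models J, K, and L appear as comments. This is illustrative, not the authors' actual input.

    MODEL:
      gq BY mathr numops;          ! Quantitative Knowledge (Arithmetic later removed)
      gq ON g;                     ! Model I: g as the sole predictor
    ! Model J: gq ON gc gf gv gsm gs;   (g retained as a higher-order factor)
    ! Model K: as model J, but with no g factor; the five Stratum II
    !          factors covary freely (the Mplus default for exogenous factors)
    ! Model L: gq ON g gc gf gv gsm gs; (g and Stratum II factors together)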
2.3.2. Testing model fit

We used multiple indexes to judge how the models fit the sample data. As Barrett (2007) recommended, we inspected each model's χ2 value and its associated probability (p) value, as models with relatively high p values (i.e., >.10) tend to indicate that the model fits the data relatively well.
[Fig. 2: path diagram omitted from this text version.]
Fig. 2. Model I: g as the single predictor of Quantitative Knowledge. Path coefficients are standardized values. Gc = Comprehension-Knowledge; Gf = Fluid Reasoning; Gv = Visual Processing; Gsm = Short-Term Memory; Gs = Processing Speed; g = General intelligence; Gq = Quantitative Knowledge.
However, as χ2 distributions differ depending on degrees of freedom, ceteris paribus, the p value will become progressively smaller as sample size increases (Hu & Bentler, 1995). Consequently, we employed more robust measures to assess model fit: the Root Mean Square Error of Approximation (RMSEA), the Comparative Fit Index (CFI), the Standardized Root Mean Square Residual (SRMR), and Akaike's Information Criterion (AIC). These indexes were chosen because it is typically better to rely on multiple measures of fit rather than a single one (Hu & Bentler, 1998, 1999), they include both absolute and relative fit indexes, they tend to perform well in evaluating different models, and some (e.g., the AIC) penalize models for complexity (Marsh, Hau, & Grayson, 2005). Whereas having absolute cutoff criteria for any fit index is tenuous, we used the following as general guidelines to identify models fitting the data relatively well: (a) RMSEA values less than .075 (halfway between .050 and .100; Chen, Curran, Bollen, Kirby, & Paxton, 2008), (b) CFI values greater than .960 (Yu, 2002), and (c) SRMR values less than .080 (Hu & Bentler, 1999; Sivo, Xitao, Witta, & Willse, 2006). For the AIC, there are no cutoff criteria to use; instead, smaller values indicate better fitting models, after accounting for model complexity.
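For reference, the AIC being compared here is the standard quantity

\[ \mathrm{AIC} = -2\ln\hat{L} + 2q, \]

where \(\hat{L}\) is the model's maximized likelihood and \(q\) is the number of freely estimated parameters; the \(2q\) term is the complexity penalty, so a lower AIC indicates better fit per parameter spent. (This is the index's standard definition, supplied for convenience; the article itself does not print the formula.)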
Some of the models we compare are nested within each other. Although it is commonplace to test nested models against each other using a chi-square difference test, we eschewed such comparisons because our research questions required that we compare both nested and non-nested models. If we were to compare chi-square values in a nesting approach, we would be unable to compare all such models using this approach and, thus, would not be using the same criteria for all model comparisons.

3. Results

3.1. Data inspection

When using latent-variable models, or at least when estimating their parameters by maximum likelihood (ML), a key assumption is that the data come from a multivariate normal distribution (Kline, 2004).
.84
Similarities
u2
Vocabulary
u3
Comprehension
u4
Information
u5
Word Reasoning
u6
Matrix Reasoning
.88
Gc
.76
u7
Picture Concepts
u8
Block Design
u9
Picture Completion
u10
Digit Span
u11
L-N Sequencing
fu1
.85 .81
.36
.75
Gf
.45
.98
.56
fu2 fu3
.27 .81
Gv
.34
.81
.90 .67
Gsm
.76
Coding Symbol Search
u15
Cancellation
.40
.53
.18 .10
.82 .41
u14
“g”
.42
fu4 u13
121
.27
Gs
fu5
.44
.01
u16
Math Reasoning
u17
Num. Operations
fu6 .93
.83
Gq
Fig. 3. Model J. (Stratum II factors as the only predictors of Quantitative Knowledge, with the g factor explaining the Stratum II covariance). Path coefficients are standardized values. Gc = Comprehension-Knowledge; Gf = Fluid Reasoning; Gv = Visual Processing; Gsm = ShortTerm Memory; Gs = Processing Speed; g = General intelligence Gq = Quantitative Knowledge.
Using the methods outlined by DeCarlo (1997), as well as his accompanying SPSS macro, the data were investigated for univariate and multivariate normality. All variables exhibited univariate normality using D'Agostino and Pearson's (1973) normality test except the WISC-IV Letter–Number Sequencing, Symbol Search, and Picture Concepts subtests. For all variables, there were no nonsensical values or overly influential data points. Rather, the distributions were bell-shaped, but with heavier tails on the lower end than would be expected if the variables actually followed a normal distribution. Even though ML estimators are robust against small departures from normality, because some data were missing, we chose to use a more robust ML estimator (Savalei, 2008). Specifically, we used Mplus' MLR estimator, which is designed for non-normal data (Asparouhov & Muthén, 2005; Muthén & Muthén, 2008); it allowed us to retain in our models the variables from the three WISC-IV subtests that departed from normality. The resulting χ2 is equivalent to Yuan and Bentler's (2000) T2* statistic, but it can be re-scaled back to a normal χ2 value via the method developed by Satorra and Bentler (2001).
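The rescaling works through the estimated scaling correction factor (the δ reported in the table notes). Writing \(T_{MLR}\) for the robust statistic and \(\hat{c}\) for its correction factor, the conventional ML statistic is recovered as

\[ T_{ML} = \hat{c}\,T_{MLR}, \]

so with the δ values of roughly 1.01–1.04 reported in Tables 3–5, the rescaled values differ from the robust ones by only a few percent. (This is the standard relation for robust ML statistics, stated here for the reader's convenience rather than quoted from the article.)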
3.2. Analysis one: what factor model best represents the subtests of the WISC-IV?

Fit statistics for the traditional four-factor model (model A) and the CHC model (model B) are listed in Table 3. For the CHC model, the residual variance associated with the Fluid Reasoning factor had to be constrained to be greater than zero for the model to converge properly, a constraint employed in other examinations of the WISC-IV (e.g., Keith et al., 2006). Although both models appear to fit the data relatively well, the higher CFI and lower RMSEA, SRMR, and AIC values indicate the CHC model has a slightly better fit, although it did require the supplemental parameter constraint for the estimates to converge. Consequently, we used the CHC model of the WISC-IV for the subsequent analyses. The CHC model is displayed in Fig. 1.
[Fig. 4: path diagram omitted from this text version.]
Fig. 4. Model L: g plus Stratum II factors predicting Quantitative Knowledge. Path coefficients are standardized values. Gc = Comprehension-Knowledge; Gf = Fluid Reasoning; Gv = Visual Processing; Gsm = Short-Term Memory; Gs = Processing Speed; g = General intelligence; Gq = Quantitative Knowledge.
3.3. Analysis two: when the WISC-IV and WIAT-II math achievement tests are factor analyzed together, the model that fits the data best will have the Arithmetic and Math Reasoning subtests cross-loading on both Fluid Reasoning and Quantitative Knowledge factors

Models C through H used both the WISC-IV subtests and the WIAT-II math subtests as factor indicators. Models C and D were used to investigate how changing the factor for which the WISC-IV Arithmetic subtest served as an indicator affected model fit. While keeping Math Reasoning as an indicator of the Quantitative Knowledge factor, model C included Arithmetic as an indicator of the Fluid Reasoning factor, whereas model D included Arithmetic as an indicator of the Quantitative Knowledge factor. For every functional purpose, both models fit the data equivalently, as there are minimal differences in their fit statistic values. Whereas neither of the models' χ2 values is small (i.e., all p values are less than .001), their CFI, TLI, SRMR, and RMSEA values were within the limits we specified in the Testing Model Fit section (see Table 4). These results indicate that modeling Arithmetic either as an indicator of the Fluid Reasoning factor or as an indicator of the Quantitative Knowledge factor does not substantially influence how well the models fit this sample's data.

Given the virtual equivalence of how models C and D fit the data, we next tested various placements of the WISC-IV Arithmetic subtest and the WIAT-II Math Reasoning subtest using models E, F, G, and H. While keeping Math Reasoning as an indicator of the Quantitative Knowledge factor, model E specified Arithmetic to be an indicator of both the Quantitative Knowledge factor and the Fluid Reasoning factor. Model F specified Arithmetic as an indicator of only the Quantitative Knowledge factor and Math Reasoning as an indicator of the Fluid Reasoning factor. Model G specified Arithmetic as an indicator of the Quantitative Knowledge factor and Math Reasoning as an indicator of Fluid Reasoning (as in model F), and also specified Math Reasoning as an indicator of the Quantitative Knowledge factor. Finally, model H specified both Math Reasoning and Arithmetic to be indicators of both the Quantitative Knowledge and Fluid Reasoning factors.
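In Mplus terms, the cross-loadings in models E and H amount simply to listing a subtest under two BY statements; a sketch using the illustrative names (not the authors' input):

    ! Model E: Arithmetic loads on both Gf and Gq
      gf BY pconc matreas arith;
      gq BY mathr numops arith;
    ! Model H: Math Reasoning cross-loads as well
    !   gf BY pconc matreas arith mathr;
    !   gq BY mathr numops arith;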
Table 4
Fit statistics for latent variable models of the Wechsler Intelligence Scale for Children—Fourth Edition and Wechsler Individual Achievement Test—Second Edition math subtests.

Model  Description                                                             χ2      df   p      CFI  RMSEA  SRMR  AIC
C      CHC with AR on Gf (.79) and MR on Gq (.93)                              189.24  110  <.001  .98  .04    .03   40,952
D      CHC with AR on Gq (.79) and MR on Gq (.92)                              189.62  110  <.001  .98  .04    .03   40,951
E      CHC with AR on Gf (.39) and Gq (.41) and MR on Gq (.93)                 185.33  109  <.001  .98  .04    .03   40,947
F^b    CHC with AR on Gq (.79) and MR on Gf (.90)                              220.79  110  <.001  .98  .04    .04   40,986
G      CHC with AR on Gq (.77) and MR on Gf (−.53) and Gq (1.44)               187.22  109  <.001  .98  .04    .03   40,950
H      CHC with AR on Gf (.31) and Gq (.49) and MR on Gf (−.51) and Gq (1.43)  184.30  108  <.001  .98  .04    .03   40,948

Note. Gf = Fluid Reasoning; Gq = Quantitative Knowledge; MR = Math Reasoning; AR = Arithmetic; CFI = comparative fit index; RMSEA = Root Mean Square Error of Approximation; SRMR = Standardized Root Mean Square Residual; AIC = Akaike Information Criterion. All MLR scaling factors (δ) were between 1.01 and 1.04. Parenthetical numbers are standardized path coefficients for the Arithmetic and Math Reasoning subtests.
a Models where Gf was constrained.
b Model where Gf and Gq were constrained.
The fit statistics for these models, and the standardized path coefficients for the Arithmetic and Math Reasoning subtests on their respective Stratum II factors, are included in Table 4. Models E–H all fit the data relatively well, with minimal-to-no differences among their fit statistic values. Nonetheless, there were ways to compare how the models fit the data. First, when Math Reasoning was an indicator of both the Fluid Reasoning factor and the Quantitative Knowledge factor (models G and H), its loadings on both factors had standard errors at least seven times greater than those of all the other subtests, indicating a problem with both models. Second, when Arithmetic was an indicator of the Quantitative Knowledge factor and Math Reasoning was an indicator of the Fluid Reasoning factor (model F), the residual variances for both the Fluid Reasoning and Quantitative Knowledge factors had to be constrained to be non-negative, and their final estimates were approximately zero. Although this problem is not uncommon with latent variable models of cognitive ability (e.g., Gustafsson, 1988; Keith et al., 2006), it is not an ideal way for a model to fit the data. Third, the models in which Arithmetic was an indicator of the Quantitative Knowledge factor (models D and E) converged without the need to add any additional model constraints on the Fluid Reasoning factor, and the model in which Arithmetic was an indicator of only the Quantitative Knowledge factor (model D) produced a standard error estimate for Arithmetic that was no larger than any of the other subtests' standard error estimates. These results indicate that models D and E fit the data better than the others we evaluated.

Our finding that models specifying the Arithmetic subtest as an indicator of the Quantitative Knowledge or Fluid Reasoning factors fit the data as well as models that specify it as an indicator of Short-Term Memory (or Working Memory, where it is located in the traditional WISC-IV factor model, model A) is not unique to our analyses. Other research has also highlighted the ambiguity around the ability (or abilities) Arithmetic may measure (Keith et al., 2006; Keith & Witta, 1997; Phelps et al., 2005).2 These results likely occur because examinees' performance on the Arithmetic subtest may be as much a function of their Quantitative Knowledge as of their Fluid Reasoning or Short-Term Memory abilities. Thus, trying to predict Quantitative Knowledge from a Fluid Reasoning factor that also contains variance explained by Quantitative Knowledge would likely inflate the magnitude of the Fluid Reasoning factor's effect on Quantitative Knowledge. As a result, we chose to remove Arithmetic from the subsequent models (for model fit, see model B–R in Table 3).

Although the use of Math Reasoning may present similar problems, we chose to retain it as an indicator of Quantitative Knowledge for two reasons. First, removing the subtest would mean that our latent Quantitative Knowledge factor would be defined solely by the Numerical Operations subtest. Second, and our primary reason for retaining the subtest, it is a required measure of math achievement on the WIAT-II, and practitioners must use it to generate a math ability standard score (whereas Arithmetic is used as a
2 In addition to the models described in the text, we also tested CHC models with Arithmetic as an indicator of the Short-Term Memory factor. These models had the same problems converging as models C and E, and their fit index values were virtually the same as those from models C–H (model fit values available upon request).
supplementary measure). It has also been used as a Quantitative Knowledge indicator in other research (e.g., Glutting, Watkins, Konold, & McDermott, 2006), and retaining it helps integrate the results of this investigation with previous findings.

3.4. Analysis three: testing the broad abilities vs. g-only hypotheses

To test the difference in how g and the Stratum II factors predicted Quantitative Knowledge, we tested four models. Model I specified the g factor as the only predictor of the Quantitative Knowledge factor (see Fig. 2), whereas models J and K both specified the Comprehension-Knowledge, Visual Processing, Short-Term Memory, Processing Speed, and Fluid Reasoning factors as predictors of the Quantitative Knowledge factor. The difference between models J and K is that model J specified a g factor from the Comprehension-Knowledge, Visual Processing, Short-Term Memory, Processing Speed, and Fluid Reasoning covariance, whereas model K did not specify a g factor and allowed all the Stratum II factors to correlate (akin to a multiple regression model). Model L used both the g factor and the Stratum II factors as predictors of Quantitative Knowledge (see Fig. 4). Results from these analyses are displayed in Table 5.

Although the results show that all the models fit the data relatively well, when g and Fluid Reasoning both predict Quantitative Knowledge (model L; see also Fig. 4), the effect of Fluid Reasoning becomes negative and the standard errors for both path coefficients become large (approximately 10 times as large as any other coefficient), indicating some problems with this model. Moreover, when just the Stratum II factors are used as predictors (models J and K), the magnitudes of their standardized path coefficients are relatively low (i.e., <.30). Although the standardized path coefficients for the Fluid Reasoning factor are larger (values of .40 and .82 for models J and K, respectively), the associated standard errors for the path coefficients are large too (values of .44 and .82, respectively), indicating a lack of precision for these parameter estimates. Thus, none of the Stratum II factors are very strong predictors of the Quantitative Knowledge factor. When g was used as the only predictor (model I; see also Fig. 2), there were no problems with the parameter estimation, the magnitude of the standardized path coefficient was large (.91), and it explained more variance in the Quantitative Knowledge factor (R2 = .83) than either model in which it was excluded as a predictor (R2 = .77 and R2 = .79 for models J and K, respectively).
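As an arithmetic check on the model I estimates: with a single latent predictor, the structural part of the model is a simple regression, so the variance explained equals the squared standardized path coefficient,

\[ R^2 = \beta^2 = .91^2 \approx .83, \]

which matches the R2 reported for model I in Table 5.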
Table 5
Fit statistics for models predicting Quantitative Knowledge from the Cattell–Horn–Carroll factors.

Model  Predictor                                         χ2      df  p      CFI  RMSEA  SRMR  AIC     R2
I      g                                                 167.81  95  <.001  .98  .04    .03   39,720  .83
J      Gc Gv Gsm Gs Gf (g modeled; not as predictor)     161.70  91  <.001  .98  .04    .03   39,721  .77
K      Gc Gv Gsm Gs Gf (correlated factors)              149.98  86  <.001  .99  .04    .03   39,719  .79
L      g Gc Gv Gsm Gs Gf                                 161.70  91  <.001  .98  .04    .03   39,721  .89

Note. Gc = Comprehension-Knowledge; Gv = Visual Processing; Gsm = Short-Term Memory; Gs = Processing Speed; Gf = Fluid Reasoning; CFI = comparative fit index; RMSEA = Root Mean Square Error of Approximation; SRMR = Standardized Root Mean Square Residual; AIC = Akaike Information Criterion. All MLR scaling factors were between 1.01 and 1.02.

4. Discussion

The purpose of this project was to examine the effects of Cattell–Horn–Carroll (CHC; McGrew, 2005; Newton & McGrew, 2010) cognitive abilities on Quantitative Knowledge ability using the respondents from the linking sample of the Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV; Wechsler, 2003b) and Wechsler Individual Achievement Test, Second Edition (WIAT-II; Wechsler, 2002). Its second purpose was to compare how our results fit into the broader CHC–academic achievement research agenda, which predominantly uses WJ instruments.

First, we examined the factor structure of the WISC-IV subtests, including specific hypotheses about whether the Arithmetic subtest should be classified as a measure of Fluid Reasoning (Keith et al., 2006), Short-Term Memory or Working Memory (Wechsler, 2003b), or Quantitative Knowledge (Woodcock, 1990).
Because the CHC model fit the data better than the traditional four-factor model, this result gives some evidence that Arithmetic might not be a good indicator of Short-Term Memory or Working Memory. Additional analyses varying the location of both the Arithmetic and Math Reasoning subtests on the Fluid Reasoning and Quantitative Knowledge factors, however, indicated that there was no "best" placement for the Arithmetic subtest, thus adding to the literature showing the ambiguity of the subtest (Keith & Witta, 1997; Phelps et al., 2005). Based on this ambiguity, we removed Arithmetic from subsequent analyses, as we thought using it as an indicator of Fluid Reasoning might artificially inflate that factor's effect on Quantitative Knowledge.

Second, we developed three hypotheses concerning CHC Stratum II and III factors and Quantitative Knowledge: (a) Fluid Reasoning and Comprehension-Knowledge should "strongly" and positively predict math achievement, and Processing Speed and Short-Term Memory should also positively predict math achievement, but at a lesser magnitude; (b) after accounting for the g-related variance in Quantitative Knowledge, Fluid Reasoning and Comprehension-Knowledge should still contribute to the prediction of Quantitative Knowledge, and their contribution should be greater than that of Processing Speed and Short-Term Memory; and (c) g's influence on Quantitative Knowledge is only indirect, or mostly indirect, funneling most of its predictive effect through Fluid Reasoning and Comprehension-Knowledge. The results indicated that we were wrong on most of our hypotheses. No Stratum II factors were significant predictors of Quantitative Knowledge when the CHC factors were allowed to correlate instead of including g in the model. When g was included, but not used as a predictor of Quantitative Knowledge, Comprehension-Knowledge was the only Stratum II factor to demonstrate an effect on Quantitative Knowledge. When g was a predictor of Quantitative Knowledge, it had a strong, direct effect, suggesting that, at least within the context of WISC-IV assessment, g represents the strongest predictor of Quantitative Knowledge.

4.1. Comparisons with previous studies

Glutting et al. (2006) used SEM with the same sample, constructing models containing a higher-order g factor as well as Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed factors from the 10 core WISC-IV subtests. They concluded that g and, to a much smaller degree, the Verbal Comprehension factor are the only important factors when explaining Quantitative Knowledge with the WISC-IV. Using the current data set, we found a CHC-based model fit the WISC-IV subtests better than the traditional four-factor model used by Glutting et al., but we found similar results in that g was the strongest predictor of Quantitative Knowledge. Thus, it does not appear that a CHC-based interpretational structure has a major effect on explaining Quantitative Knowledge.

It is interesting to compare the results of this study with the results from other SEM analyses examining effects on Quantitative Knowledge (e.g., Keith, 1999; McGrew et al., 1997; Taub et al., 2008). These studies, most of which used the Woodcock–Johnson instruments, demonstrated that while g typically accounted for the largest portion of variance in Quantitative Knowledge, the Processing Speed, Fluid Reasoning, and Short-Term Memory factors still explained additional variance. Taub et al. (2008) also stressed that g demonstrated an indirect effect on Quantitative Knowledge.
In comparison, the present analyses with the WISC-IV indicated that g demonstrated the largest effect on Quantitative Knowledge, that its effect was direct, and that Stratum II factors did not explain substantial additional variance in Quantitative Knowledge beyond that which g explained.

In addition to the difference in the instruments used, it is important to note slight differences in methodology between other studies and the current one. For example, McGrew et al. (1997) used a forward entry method, and Taub et al. used a backward deletion method of SEM analysis. Keith (1999) used a different approach, testing a set of a priori specified competing models, akin to the current study, and relying on a convergence of fit statistics and an examination of factor loadings and path coefficients to test specific hypotheses. However, Keith's study was more interested in examining group differences in effects, unlike the current study, which was designed to test a variety of hypotheses regarding how cognitive ability is related to Quantitative Knowledge.

Collectively, the results of this study and the aforementioned investigations highlight significant differences in how instruments measuring cognitive ability represent CHC-defined factors, a warning put forth by Floyd et al. (2005). According to them, differences between measures of the same ability are chiefly due to interactions between examinees' ability and (a) the range of ability a test may capture (e.g., floor and ceiling effects), (b) temporal aspects of the test situation, and (c) the specific task demands of a subtest.
Because the WJ III can be used to assess a wider range of ages than the WISC-IV, its subtests may have lower floors and higher ceilings, which may be one reason for the different results. However, differences in the specific tasks on the two batteries are more likely reasons for the difference. Floyd et al. suggested that a lack of similarity across tests may be due to narrow ability characteristics of subtests that assess unevenly developed skills within individuals, the influence of construct-irrelevant abilities, or a combination of these influences.

When comparing the results of this study with those of Taub et al. (2008), it is important to remember that the WJ III was designed to operationalize CHC theory. Its subtests were created and combined to maximize the construct-relevant variance associated with the Stratum II CHC factor the instrument measures. The WISC-IV, on the other hand, was designed to measure the four Wechsler factors (Kaufman et al., 2006). Thus, although a CHC model fits the WISC-IV data well, the subtests were not designed to measure the CHC abilities specifically.

In light of the differences between how WISC-IV CHC factors and WJ III CHC factors explain Quantitative Knowledge, more research on CHC theory and its application to practice via Cross-Battery Assessment (Flanagan et al., 2007) is needed. Test developers and clinicians alike require a better understanding of how CHC Stratum I abilities may influence the measurement of Stratum II abilities in order to more precisely assess similarities and differences in factors from separate test batteries that presumably measure similar abilities. Although costly in time and finances, factor analyses of multiple test batteries with both typically developing and disability samples may be useful, especially analyses with a focus on Stratum I abilities.
4.2. Implications for practice

The results from this study indicate that g is the single best predictor of Quantitative Knowledge, at least in the context of practice with the WISC-IV and WIAT-II. This finding, coupled with other similar investigations (e.g., Glutting et al., 2006; Glutting, Youngstrom, Ward, Ward, & Hale, 1997), not only reaffirms the efficacy of g in predicting educational outcomes (Deary, 2000; Jensen, 1998), but also brings into question some of the current practices in Cross-Battery Assessment, at least with respect to math outcomes in the context of the WISC-IV. In two of the most recent books on Cross-Battery Assessment, the authors stress using measures of Fluid Reasoning, Comprehension-Knowledge, Processing Speed, and Short-Term Memory in lieu of the Full Scale IQ (Flanagan et al., 2006; Flanagan et al., 2007). Whereas the data from other studies (for a review, see Flanagan et al., 2006, pp. 41–42) do show that these CHC factors tend to be related to math outcomes, in light of the results from the current analysis, it seems unwise not to focus interpretation on the most cogent single predictor available, especially when interpreting the WISC-IV in an applied setting.
4.3. Limitations

The major limitations of this study are twofold. First, although the sample was nationally representative (Wechsler, 2003b), it is relatively small (n = 550) considering its developmental heterogeneity (the age range was 6 to 16 years). Consequently, splitting the current sample by age would result in subsamples likely too small to produce consistent results with the latent-variable methods this study employed (MacCallum, Browne, & Sugawara, 1996). The inability to examine age effects is unfortunate, because there is some research indicating that the effect of CHC cognitive ability factors on math achievement is somewhat moderated by age (Taub et al., 2008). Consequently, future researchers should either select a sample that is more homogeneous in age or select a larger number of participants so that the sample can be split into narrower age groups without losing statistical power.

Second, the variables used in this study were all from the Wechsler family of psychological instruments. Whereas, in and of itself, use of only Wechsler scales is neither good nor bad, it does limit the generalizability of the findings. Other instruments have demonstrated stronger relations between the CHC factors and math achievement (Taub et al., 2008). Consequently, future research studying the effect of WISC-IV operationalized CHC abilities on WIAT-II math scores should probably include subtests from other instruments that measure Fluid Reasoning, Comprehension-Knowledge, and Processing Speed, in order to examine whether the instrument from which the CHC factors were derived influences their predictive ability (Floyd et al., 2005).
Authors' notes

Standardization data from the Wechsler Intelligence Scale for Children—Fourth Edition (WISC-IV). Copyright 2003 by NCS Pearson, Inc. Standardization data from the Wechsler Individual Achievement Test—Second Edition (WIAT-II). Copyright 2001 by NCS Pearson, Inc. Used with permission. All rights reserved. "Wechsler Intelligence Scale for Children," "WISC," "Wechsler Individual Achievement Test," and "WIAT" are trademarks, in the US and other countries, of Pearson Education, Inc. or its affiliate(s).

The authors wish to disclose that they have no financial interest vested in the presented research. The authors would like to thank Gary Canivez, Joseph Glutting, Marley Watkins, and Eric Youngstrom for their thoughts, insights, and comments on the issues discussed and analyses used in this paper.

References

Allison, P. D. (2001). Missing data. Thousand Oaks, CA: Sage.
Asparouhov, T., & Muthén, B. O. (2005, November). Multivariate statistical modeling with survey data. Paper presented at the meeting of the Federal Committee on Statistical Methodology Research Conference, Arlington, VA.
Barrett, P. (2007). Structural equation modelling: Adjudging model fit. Personality and Individual Differences, 42, 815–824.
Bollen, K. A. (1989). Structural equations with latent variables. Oxford, England: John Wiley & Sons.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York: Cambridge University Press.
Chen, F., Curran, P. J., Bollen, K. A., Kirby, J., & Paxton, P. (2008). An empirical evaluation of the use of fixed cutoff points in RMSEA test statistic in structural equation models. Sociological Methods & Research, 36, 462–494.
D'Agostino, R., & Pearson, E. S. (1973). Tests for departure from normality. Empirical results for the distributions of b2 and √b1. Biometrika, 60, 613–622.
Deary, I. J. (2000). Looking down on human intelligence: From psychometrics to the brain. Oxford, UK: Oxford University Press.
DeCarlo, L. T. (1997). On the meaning and use of kurtosis. Psychological Methods, 2, 292–307.
Elliot, C. D. (2007). Differential Ability Scales (2nd ed.). San Antonio, TX: The Psychological Corporation.
Enders, C. K., & Bandalos, D. L. (2001). The relative performance of full information maximum likelihood estimation for missing data in structural equation models. Structural Equation Modeling: A Multidisciplinary Journal, 8, 430–457.
Flanagan, D. P. (2000). Wechsler-based CHC cross-battery assessment and reading achievement: Strengthening the validity of interpretations drawn from Wechsler test scores. School Psychology Quarterly, 15, 295–329.
Flanagan, D. P., McGrew, K. S., & Ortiz, S. O. (2000). The Wechsler Intelligence Scales and Gf–Gc theory: A contemporary approach to interpretation. Boston: Allyn & Bacon.
Flanagan, D. P., Ortiz, S. O., & Alfonso, V. C. (2007). Essentials of Cross-Battery Assessment (2nd ed.). Hoboken, NJ: Wiley.
Flanagan, D. P., Ortiz, S. O., Alfonso, V., & Mascolo, J. (2006). The achievement test desk reference: A guide to learning disability identification (2nd ed.). Hoboken, NJ: John Wiley.
Floyd, R. G., Bergeron, R., McCormack, A. C., Anderson, J. L., & Hargrove-Owens, G. L. (2005). Are Cattell–Horn–Carroll (CHC) broad ability composite scores exchangeable across batteries? School Psychology Review, 34, 329–356.
Floyd, R. G., Evans, J. J., & McGrew, K. S. (2003). Relations between measures of Cattell–Horn–Carroll (CHC) cognitive abilities and mathematics achievement across the school-age years. Psychology in the Schools, 40, 155–171.
Psychology in the Schools, 40, 155–171. Floyd, R. G., Shaver, R. B., & McGrew, K. S. (2003). Interpretation of the Woodcock–Johnson III tests of cognitive abilities: Acting on evidence. In F. A. Schrank, & D. P. Flanagan (Eds.), WJ III clinical use and interpretation: Scientist–practitioner perspectives (pp. 1–46). New York: Academic Press. Friedman, L. (1995). The space factor in mathematics: Gender differences. Review of Educational Research, 65(1), 22–50, doi: 10.3102/00346543065001022. Geary, D. C. (1994). Children's mathematical development: Research and practical applications. Washington, D.C.: American Psychological Association. Glutting, J. J., Watkins, M. W., Konold, T. R., & Mcdermott, P. A. (2006). Distinction without a difference: The utility of observed versus latent factors from the WISC-IV in estimating reading and math achievement on the WIAT-II. Journal of Special Education, 40, 103–114. Glutting, J. J., Youngstrom, E. A., Ward, T., Ward, S., & Hale, R. L. (1997). Incremental efficacy of WISC-III factor scores in predicting achievement: What do they tell us? Psychological Assessment, 9(3), 295–301, doi:10.1037/1040-3590.12.4.402. Gustafsson, J. E. (1988). Hierarchical models of individual differences in cognitive abilities. In R. J. Sternberg (Ed.), Psychology of human intelligence, Vol. 4. (pp. 35–71)Hillsdale, NJ: Erlbaum. Hale, J. B., Fiorello, C. A., Kavanagh, J. A., Hoeppner, J. -A. B., & Gaither, R. A. (2001). WISC-III predictors of academic achievement for children with learning disabilities: Are global and factor scores comparable? School Psychology Quarterly, 16, 31–55. Horn, J. L., & Cattell, R. B. (1966). Refinement and test of the theory of fluid and crystallized intelligence. Journal of Educational Psychology, 57(2), 253–270, doi:10.1037/h0023816. Hu, L. -T., & Bentler, P. M. (1995). Evaluating model fit. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 76–99). Thousand Oaks, CA: Sage Publications Inc. Hu, L. -T., & Bentler, P. M. (1998). Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. Psychological Methods, 3, 424–453. Hu, L. -T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55. Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CN: Praeger. Kaufman, A. S., Flanagan, D. P., Alfonso, V. C., & Mascolo, J. T. (2006). Test review: Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV). Journal of Psychoeducational Assessment, 24, 278–295.
Kaufman, A. S., & Kaufman, N. L. (2004). Kaufman Assessment Battery for Children (2nd ed.). Circle Pines, MN: American Guidance Service.
Keith, T. Z. (1999). Effects of general and specific abilities on student achievement: Similarities and differences across ethnic groups. School Psychology Quarterly, 14, 239–262.
Keith, T. Z. (2006). Multiple regression and beyond. Boston: Pearson.
Keith, T. Z., Fine, J. G., Taub, G. E., Reynolds, M. R., & Kranzler, J. H. (2006). Higher order, multisample, confirmatory factor analysis with the Wechsler Intelligence Scale for Children — Fourth Edition: What does it measure? School Psychology Review, 35, 108–127.
Keith, T. Z., & Reynolds, M. R. (2010). Cattell–Horn–Carroll abilities and cognitive tests: What we've learned from 20 years of research. Psychology in the Schools, 47, 635–650.
Keith, T. Z., & Witta, E. L. (1997). Hierarchical and cross-age confirmatory factor analysis of the WISC-III: What does it measure? School Psychology Quarterly, 12, 89–107.
Kline, R. B. (2004). Principles and practice of structural equation modeling (2nd ed.). New York: Guilford Press.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). Hoboken, NJ: Wiley.
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130–149.
Marsh, H. W., Hau, K.-T., & Grayson, D. (2005). Goodness of fit in structural equation models. In A. Maydeu-Olivares, & J. J. McArdle (Eds.), Contemporary psychometrics: A festschrift for Roderick P. McDonald (pp. 275–340). Mahwah, NJ: Erlbaum.
McGrew, K. S. (1997). Analysis of the major intelligence batteries according to a proposed comprehensive Gf–Gc framework. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests and issues (pp. 151–179). New York: Guilford Press.
McGrew, K. S. (2005). The Cattell–Horn–Carroll theory of cognitive abilities: Past, present and future. In D. P. Flanagan, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests and issues (pp. 136–181). New York: Guilford Press.
McGrew, K. S., & Hessler, G. L. (1995). The relationship between the WJ-R Gf–Gc cognitive clusters and mathematics achievement across the life-span. Journal of Psychoeducational Assessment, 13, 21–38.
McGrew, K. S., Keith, T. Z., Flanagan, D. P., & Vanderwood, M. (1997). Beyond g: The impact of Gf–Gc specific cognitive abilities research on the future use and interpretation of intelligence test batteries in the schools. School Psychology Review, 26, 189–210.
McGrew, K. S., & Wendling, B. J. (2010). Cattell–Horn–Carroll cognitive–achievement relations: What we have learned from the past 20 years of research. Psychology in the Schools, 47, 651–675.
Muthén, L. K., & Muthén, B. O. (2008). Mplus [Computer software]. Los Angeles, CA: Muthén & Muthén.
Newton, J. H., & McGrew, K. S. (2010). Introduction to the special issue: Current research in Cattell–Horn–Carroll-based assessment. Psychology in the Schools, 47, 621–634.
Phelps, L., McGrew, K. S., Knopik, S. N., & Ford, L. (2005). The general (g), broad, and narrow CHC stratum characteristics of the WJ III and WISC-III tests: A confirmatory cross-battery investigation. School Psychology Quarterly, 20, 66–88.
Proctor, B. E., Floyd, R. G., & Shaver, R. B. (2005). Cattell–Horn–Carroll broad cognitive ability profiles of low math achievers. Psychology in the Schools, 42, 1–12.
Roid, G. H. (2003). Stanford–Binet Intelligence Scales (5th ed.). Itasca, IL: Riverside Publishing.
Satorra, A., & Bentler, P. M. (2001). A scaled difference chi-square test statistic for moment structure analysis. Psychometrika, 66, 507–514.
Savalei, V. (2008). Is the ML chi-square ever robust to nonnormality? A cautionary note with missing data. Structural Equation Modeling: A Multidisciplinary Journal, 15, 1–22.
Schneider, W. J. (2008). Playing statistical ouija board with commonality analysis: Good questions, wrong assumptions. Applied Neuropsychology, 15, 44–53.
Sivo, S. A., Fan, X., Witta, E. L., & Willse, J. T. (2006). The search for "optimal" cutoff properties: Fit index criteria in structural equation modeling. Journal of Experimental Education, 74, 267–288.
Taub, G. E., Floyd, R. G., Keith, T. Z., & McGrew, K. S. (2008). Effects of general and broad cognitive abilities on mathematics achievement. School Psychology Quarterly, 23, 187–198.
The Psychological Corporation (2002). Wechsler Individual Achievement Test — Second Edition examiner's manual. San Antonio, TX: Author.
Wechsler, D. (1974). Wechsler Intelligence Scale for Children (Revised ed.). New York: The Psychological Corporation.
Wechsler, D. (1991). Wechsler Intelligence Scale for Children (3rd ed.). San Antonio, TX: The Psychological Corporation.
Wechsler, D. (2002). Wechsler Individual Achievement Test (2nd ed.). San Antonio, TX: The Psychological Corporation.
Wechsler, D. (2003a). Wechsler Intelligence Scale for Children (4th ed.). San Antonio, TX: The Psychological Corporation.
Wechsler, D. (2003b). WISC-IV technical and interpretive manual. San Antonio, TX: The Psychological Corporation.
Williams, P. E., Weiss, L. G., & Rolfhus, E. L. (2003a). WISC-IV technical report #1: Theoretical model and test blueprint. San Antonio, TX: The Psychological Corporation.
Williams, P. E., Weiss, L. G., & Rolfhus, E. L. (2003b). WISC-IV technical report #2: Psychometric properties. San Antonio, TX: The Psychological Corporation.
Woodcock, R. W. (1990). Theoretical foundations of the WJ-R measures of cognitive ability. Journal of Psychoeducational Assessment, 8, 231–258.
Woodcock, R. W., & Johnson, M. B. (1989). Woodcock–Johnson Psycho-Educational Battery — Revised. Chicago, IL: Riverside Publishing.
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock–Johnson tests of cognitive abilities (3rd ed.). Itasca, IL: Riverside.
Yu, C. Y. (2002). Evaluating cutoff criteria of model fit indices for latent variable models with binary and continuous outcomes (Doctoral dissertation). Available from ProQuest Dissertations & Theses database. (UMI No. 3066425)
Yuan, K.-H., & Bentler, P. M. (2000). Three likelihood-based methods for mean and covariance structure analysis with nonnormal missing data. In M. E. Sobel, & M. P. Becker (Eds.), Sociological methodology 2000 (pp. 165–200). Washington, DC: American Sociological Association.