Intelligence 32 (2004) 607 – 619
A reappraisal of the relationship between span memory and intelligence via “best evidence synthesis”
Margaret E. Beier a,*, Phillip L. Ackerman b
a Psychology Department, MS-25, Rice University, P.O. Box 1892, Houston, TX 77251-1892, United States
b Georgia Institute of Technology, United States
Received 22 March 2004; received in revised form 11 July 2004; accepted 28 July 2004
Available online 11 September 2004
Abstract
This paper examines the relationship between span memory [e.g., immediate memory, short-term memory (STM), simple span] and general ability (g) through a reanalysis of two data sets [Christal, R. E. (1959). Factor analytic study of visual memory. Psychological Monographs: General and Applied, 72 (13, Whole No. 466); Kelley, H. P. (1964). Memory abilities: A factor analysis. Psychometric Monographs, No. 11]. Because of their large sample sizes and the multiple measures used to identify each construct, the Christal and Kelley studies were examined within a “best evidence synthesis” framework. Modern structural equation modeling (SEM) techniques were used to examine the relationship between immediate memory and g. Results indicated that in both studies, the relationship between immediate memory and g was quite substantial (.71 and .83), and that this relationship was essentially reduced by half when the common content variance of the tests was accounted for (e.g., verbal, spatial, numerical). Results are discussed within the context of recent research examining the relationship between working memory (WM) and g.
© 2004 Elsevier Inc. All rights reserved.
Keywords: Span memory; Best evidence synthesis; Structural equation modeling
* Corresponding author. Tel.: +1 713 348 3920. E-mail address: [email protected] (M.E. Beier).
0160-2896/$ - see front matter © 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.intell.2004.07.005
1. Introduction
Psychologists have studied span, short-term, or immediate memory (as opposed to long-term memory) since the beginning of modern experimental psychology (Jacobs, 1887). The finding that older
children, on average, had higher memory spans than younger children was an important reason for including span measures in Binet’s intelligence scales, which were designed to identify a child’s “mental age” (Binet & Simon, 1908/1961 as cited in Jenkins & Patterson, 1961). Span measures have continued to be useful in the measurement of intelligence, as evidenced by their inclusion in most omnibus tests of intelligence, such as the Stanford–Binet and Wechsler’s Adult Intelligence Scales (WAIS). In fact, intelligence tests such as these have used measures of immediate memory without much change since their inception through today (Anastasi & Urbina, 1997; Psychological Corporation, 1997; Terman & Merrill, 1937, 1960; Thorndike, Hagen, & Sattler, 1986).
Despite the use of immediate memory measures in intelligence assessment, there is a long-standing debate among intelligence theorists about the utility of such measures for assessing cognitive ability. In an early review of the relationship between memory and intelligence, Blankenship (1938) summarized various viewpoints on the utility of memory tests in measuring intelligence. Blankenship reported that many intelligence theorists regarded memory as a separate cognitive ability. However, he also noted that there was evidence that memory performance was affected by the content of the material presented (verbal, numerical, and spatial), suggesting that memory may not actually be completely separable from other group factors. Blankenship also examined the reported relationships between memory and intelligence across various studies and concluded that the evidence pointed to a definite relationship between memory and intelligence, but “at the present time, results are so varying in nature that the true degree of correlation between the two is impossible to predict” (Blankenship, 1938, p. 17).
In his survey and reanalysis of studies examining the structure of cognitive abilities, Carroll (1993) identified data sets from 117 separate samples in the immediate memory domain. Along with higher-order memory factors, five first-order factors were identified and labeled as follows: Memory Span, Associative Memory, Free Recall, Meaningful Memory, and Visual Memory. They are further defined below.
Memory Span, also known as memory for order (MFO), generally refers to a person’s “ability to reproduce, immediately after one presentation, a series of discrete stimuli in their original order” (Blankenship, 1938, p. 2). In studies of Memory Span, the nature of the stimuli, the method of presentation, and the type of reproduction required make little difference (Carroll, 1993). Thus, the units studied in tests of Memory Span can be digits, letters, words, shapes, or sounds.
Tests of Associative Memory require the participant to study pairs of stimuli and then to recall one member of the pair, given the other member. In tests of Associative Memory, the study period is generally longer than the study period in Memory Span tests, allowing for more long-term memory storage of the pairs. In addition, unlike Memory Span measures, the content of the measure seems to matter; that is, in factor analytic studies, associative memory measures load on separate content factors (i.e., verbal, numerical, and spatial).
Measures of Free Recall require participants to recall previously studied material in any order. The material studied in these types of measures is generally meaningless and arbitrary. In contrast to tests of Memory Span, the stimuli in tests of Free Recall are generally beyond the participant’s memory capacity, and memory for order is not required.
Meaningful Memory measures also include a study and an immediate recall or test phase.
However, in contrast to Free Recall measures, the materials presented represent meaningful relations between paired stimuli or a “meaningful story or connected discourse” (Carroll, 1993, p. 277).
Visual Memory tests are similar to Free Recall and Meaningful Memory tests. The procedure used in these tests involves the study of a display (e.g., nonstandard shapes or a map) and then yes or no
responses to whether stimuli presented separately appeared in the same position and orientation as in the original display. Generally, visual memory tests involve objects that are not easily recognizable, because objects that are easily labeled (e.g., square) could be encoded verbally.
The measures described above vary in their administration and in the content of the stimuli. Yet, these measures are all considered tests of immediate memory (i.e., tasks requiring storage and retrieval of information). Carroll (1993) found that the average correlation between the immediate memory factors described above and general ability was r = .38 (correlations ranging from .28 to .54).
Mukunda and Hall (1992) provided some additional insight into the relationship between memory and intelligence in their meta-analysis of the relationship between span memory and aptitude/achievement tests. Specifically, Mukunda and Hall examined short-term memory (STM) tests that included the presentation of a list of items (whether digits, letters, objects, sentences, etc.) and that required participants to recall items in order [these memory span measures were termed memory for order (MFO)]. Their meta-analysis was relatively limited in that it included only articles published between 1976 and 1989 (51 independent samples from 23 separate studies). They found a modest relationship between measures of aptitude and achievement and MFO, ρ̂ = .248, p < .0001, when a full set of tests was examined (ρ̂ = .270, p < .001, with a reduced set of tests that combined multiple effect sizes from the same samples). Tests measuring working memory (WM) ability (i.e., tasks requiring both storage and processing, such as the sentence span task from Daneman & Carpenter, 1980) were distinguished from simple span memory tasks in this analysis.
Mukunda and Hall found that the relationship between simple span measures and aptitude was relatively modest (mean r = .265) and similar to the relationship between WM tasks and aptitude (mean r = .283). The difference between these effect sizes was not statistically significant in this study.
More recently, Ackerman, Beier, and Boyle (in press) conducted a meta-analysis examining the relationship between WM and intelligence and between immediate memory and intelligence. As opposed to the relatively limited range of studies included in the Mukunda and Hall (1992) meta-analysis, the Ackerman et al. study was based on a literature search that ranged from 1872 through 2002. They classified cognitive ability tests into 12 different categories, including content abilities, such as verbal, numerical, and spatial, and general reasoning abilities, among others. WM measures were also classified by content and whether or not the test included simultaneous processing of different content (e.g., a single item that contains a verbal storage task and a numerical processing task would be classified as “verbal with numerical”). Ten different classifications for WM measures were used (e.g., verbal, verbal with numerical, verbal with spatial, etc.). The meta-analytically derived correlation (based on 86 independent samples and a total sample of 9778) between all 12 ability categories and all 10 WM categories was .324 (ρ̂ = .397 when corrected for unreliability). Four categories of span or short-term memory (STM) were included in the meta-analysis based on content (i.e., verbal, numerical, and spatial, or tests with varying content). The meta-analytically derived correlation (based on 49 independent samples and a total sample of 5440) between the four measures of STM and the ability measures was .214 (ρ̂ = .260 corrected for unreliability).
This correlation is significantly lower than the relationship between WM and ability reported by Ackerman et al., but it does point to a definite relationship between simple span memory and intelligence. Over the past 25 years or so, researchers have distinguished WM measures from measures of immediate memory. Some researchers have suggested that WM and intelligence are essentially equivalent constructs (Engle, 2002; Kyllonen & Christal, 1990). This assertion is most likely due to large structural or path coefficients found between measures of g and WM when analyzed using structural
equation modeling (SEM) techniques. For example, Kyllonen and Christal (1990) found structural coefficients of .80 through .88 between WM and g factors in an SEM framework. Ackerman, Beier, and Boyle (2002) also found a structural coefficient of .70 between WM and g using SEM techniques. However, raw correlations between individual measures of WM and intelligence were closer to r = .30. Because the raw correlations between individual measures of STM and cognitive ability are in the same general range (around r = .30) as those between measures of WM and cognitive ability, it is interesting to consider the relationship between STM and cognitive ability within an SEM framework.
Researchers have used SEM techniques to examine the relations among STM, WM, and fluid intelligence (Gf). Gf is defined by Cattell (1987) and Horn and Cattell (1966) as the processing and reasoning component of intelligence (e.g., the ability associated with solving novel problems) as opposed to crystallized intelligence (Gc), which is defined as the knowledge acquired through education and experience. SEM techniques generally show that, when all relationships are estimated simultaneously, the relationship between WM and Gf is large and significant and that the relationship between STM and Gf is small or nonsignificant (Conway, Cowan, Bunting, Therriault, & Minkoff, 2002; Engle, Tuholski, Laughlin, & Conway, 1999). For example, in a study by Conway et al., the structural coefficient from STM to Gf was .18 compared to a structural coefficient of .60 from WM to Gf. Similarly, Engle et al. found a structural coefficient of .59 from WM to g and a nonsignificant negative relationship between STM and g (−.13). One major contribution of these studies is that they include simultaneous estimation of relationships among the three constructs (as opposed to examining WM and g and STM and g in separate analyses). However, these studies are relatively limited in their assessment of the constructs they examine.
For example, in both studies, only two tests were used as indicators of Gf: the Raven’s Progressive Matrices (Raven’s; Raven, Court, & Raven, 1977) and Cattell’s Culture Fair Intelligence Test (CFIT; Cattell, 1973). The indicators for WM and STM were also relatively limited (i.e., three or four measures were used as indicators of these constructs). This is potentially problematic because when very few, relatively narrow measures are used to identify a construct, there is a risk of misrepresenting the relations among the constructs being studied because of specific test variance. For example, although the Raven’s and CFIT are generally fairly highly related to Gf factors, test-specific variance that might influence the relationships between these tests and some WM tests (and other spatial tests of Gf) would include components of spatial ability, inductive reasoning, and processing speed (see Burke, 1958). A better approach for sampling the construct space would be to include heterogeneous items and tests (within the limits of the definition of the construct in question) to control for the effect of unwanted variance (see Ackerman & Humphreys, 1991 and Humphreys, 1985 for a review of this topic).
Another related consideration for maximizing the prediction of any criterion is Brunswik Symmetry (Wittmann & Süß, 1999), that is, matching the breadth of the predictor with that of the criterion. For example, because g is by definition a broad construct, a battery of measures that is also broad will likely be a better measure of the construct space, and will help maximize the correlation between predictor and criterion. Conversely, a battery of measures that is relatively narrow, even if the measures have high fidelity in their measurement of the construct, will be mismatched in breadth to g and the correlation between predictor and criterion will be lower.
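The aggregation side of the breadth argument above can be sketched numerically. The snippet below is a minimal illustration, not part of the original analyses: it uses the standard formula for the correlation between a unit-weighted sum of k standardized tests and a criterion, under the simplifying (hypothetical) assumptions that every test correlates equally with the criterion and every pair of tests correlates equally with each other. The function name and the values .30 and .30 are illustrative choices, not estimates from any study cited here.

```python
import math


def composite_validity(k, r_test_criterion, r_between_tests):
    """Correlation of a unit-weighted sum of k standardized tests with a criterion.

    Hypothetical simplification: each test correlates r_test_criterion with
    the criterion, and each pair of tests correlates r_between_tests.
    """
    # Covariance of the sum with the criterion is k * r_test_criterion.
    numerator = k * r_test_criterion
    # Variance of the sum is k unit variances plus k(k-1) pairwise covariances.
    denominator = math.sqrt(k + k * (k - 1) * r_between_tests)
    return numerator / denominator


if __name__ == "__main__":
    # Illustrative values only: single-test validity .30, inter-test r .30.
    for k in (1, 2, 4, 10):
        print(k, round(composite_validity(k, 0.30, 0.30), 3))
```

Under these assumed values, the composite correlation rises steadily as tests are added, which is consistent with the expectation that a broad battery will correlate more highly with a broad criterion than any single narrow test.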
Unfortunately, most researchers are limited by practical matters, such as time and resources when conducting studies, and are thus unable to include the breadth of tests necessary for an ideal sampling of the construct space. Fortunately, data sets do exist that provide breadth of measurement, heterogeneous tests, and large sample sizes. The purpose of this paper is to present an examination of the relationship
between immediate memory and g by reanalyzing data presented in previous studies (Christal, 1959; Kelley, 1964) within an SEM framework.
2. Method
Many existing studies include measures of span memory and cognitive ability (see Ackerman et al., in press; Mukunda & Hall, 1992 for reviews) and thus provide the opportunity to examine the relationship between these two constructs using existing data. One approach would be to use SEM techniques with meta-analytically derived data (e.g., see Viswesvaran & Ones, 1995). While this method is illustrative, there are problems with this approach that threaten the validity of the results. For example, in meta-analytically derived correlation matrices, the number of observations underlying each correlation usually differs from cell to cell. In addition, meta-analytically derived correlation matrices often include cells that have very few or no observations, requiring the researcher to impute values for SEM purposes (see Burke & Landis, 2003 and Viswesvaran & Ones, 1995 for a discussion of the issues inherent in this type of analysis). Despite these shortcomings, this method is informative and has been used to examine the relationship between memory test performance and g (see Ackerman et al., in press).
In this study, we wanted to use a different approach. According to Slavin (1986), one problem with meta-analytic techniques is that the studies that are identified for inclusion in the analysis are often selected in a mechanistic manner. That is, researchers may not fully evaluate the quality of the findings in terms of the validity of the measures used in each study. The studies included in the meta-analysis may not be as relevant to the question at hand as studies specifically designed to examine it. Slavin recommended using a “best evidence synthesis” approach, one that requires the researcher to select studies that provide the best evidence for a question, as opposed to synthesizing a potentially diverse literature with meta-analytic techniques.
In this context, we set out to identify research that would provide the best evidence for describing the relationship between intelligence and immediate memory. The criteria we used to evaluate possible studies were the following: (1) the study had to include a relatively large sample, and (2) the study needed to include a relatively large battery of ability and span memory measures. We wanted a large and diverse battery of memory and ability measures to be able to more effectively identify the different constructs underlying the measures (Humphreys, 1985; Wittmann & Süß, 1999). While there were a few options for our analysis, two studies potentially provide the best evidence for the relationship between ability and span memory in terms of the breadth of both memory and general ability tests included, and the size of participant samples: one by Christal (1959) and the other by Kelley (1964). Other studies met the criteria described above (e.g., Brown, Guilford, & Hoepfner, 1966; Tenopyr, Guilford, & Hoepfner, 1966). However, the Christal and Kelley analyses included tests that were more clearly identified as measuring cognitive ability and immediate memory constructs than these other studies (i.e., most measures loading on only one factor as opposed to across multiple ability and memory factors).
2.1. Kelley’s (1964) and Christal’s (1959) studies of memory abilities
Kelley (1964) conducted a large-scale study with 442 Air Force Cadets (men) between the ages of 19 and 27 (M age=21.6) to identify the factors underlying a battery of 27 memory measures. Kelley
hypothesized that four underlying factors would be identified with the 27 span memory tests: (1) Rote Memory, which required simple recall or recognition of verbal, spatial, or numerical information; (2) Meaningful Memory included mainly paired associates tests; (3) Span Memory tests were designed to test a participant’s memory for order—participants were presented with a series of stimuli and were required to reproduce the sequence; and (4) Visual Memory tests required either reproduction of visual designs or recall/recognition of information presented on maps. Kelley (1964) also used a battery of ability measures as reference tests. Originally, the ability tests were included because Kelley was interested in the sources of variance that would account for performance on the memory tests. As reported by Kelley, there were 13 ability tests included, all chosen from the U.S. Air Force Airman Classification Battery, detailed in U.S. Army Air Forces Aviation Psychology Program Research Reports by Cook (1947), Guilford (1947), and Melton (1947). Tests were selected to tap mechanical knowledge, verbal, numerical, and spatial abilities. Like Kelley (1964), the purpose of Christal’s (1959) study was to identify the factors underlying a battery of memory measures. Christal hypothesized five major memory factors, including Visual Imagery (memory through use of visual imagery), Incidental Memory (memory for things/events incidentally encountered), Delayed Recall (ability to recall things after a lapse in time), Memory for Temporal Position (memory for order within a series), and Memory for Content (memory for words and memory for numbers). Seventeen memory tests were included to identify the factors listed above. Christal also administered 14 ability reference tests that assessed Perceptual Speed, Math, Mechanical Knowledge, General Knowledge, and Verbal ability factors. Participants were 718 new Air Force enlisted personnel (M age=19.35 years, S.D.=2.32). 
Through exploratory factor analysis (EFA), Kelley (1964) found evidence for three separable immediate memory factors (although different from the measures he had hypothesized): Rote Memory (including tests of recognition and paired associates), Meaningful Memory (including memory for limericks, memory for related words, etc.), Span Memory [letter span (visual), letter span (auditory), number span (visual), number span (auditory), and memory for instructions]. Kelley noted that these memory factors tended to include both visual and auditory content, and verbal and numeric content, without separable modality and/or content factors. Finally, Kelley noted that evidence of a separable Visual (shape or map) memory factor was equivocal. Kelley restricted his identification to first-order orthogonal factors, and thus did not consider the presence of higher-order or correlated factors. Christal (1959) also used EFA to examine the ability and memory tests. He found eight underlying factors of memory and ability that had psychological significance (three additional factors were identified that, according to Christal, had no psychological significance). These eight factors were as follows: (1) Mechanical Experience—basically a mechanical knowledge factor, (2) Memory for Position in Space—the ability to picture an object in respect to other objects, (3) Memory for Color, (4) Memory for Position in Temporal Succession, (5) Numerical Facility, (6) Verbal Comprehension, (7) Perceptual Speed, and (8) Paired Associates Memory. In summary, Christal concluded that there were specific memory abilities for memory for color, memory for position of objects in space, and memory for order. He also concluded that remembering things/events encountered incidentally required the same ability as recalling things/events more purposefully remembered. 
Modern SEM techniques allow for simultaneous estimation of the relationship between the measures used and first-order factors (i.e., a measurement model) and the relationship between first-order and second-order factors (i.e., the structural model). Using this technique with the data reported by Kelley (1964) and Christal (1959) allows an examination of the relationship between immediate memory and g.
The hypothesis that motivated this analysis was that studies of immediate memory and ability that included thorough identification of the factors would show a much greater relationship between immediate memory and g when analyzed in an SEM framework than the r ≈ .30 correlation found in meta-analyses (Ackerman et al., in press; Mukunda & Hall, 1992). More specifically, we believed that the relationship between g and immediate memory would be large and on par with reported relationships between g and WM (i.e., between .70 and .88, the structural coefficients between g and WM reported in Ackerman et al., 2002 and Kyllonen & Christal, 1990, respectively). The confirmatory factor analyses (CFAs) and SEM analyses reported here were conducted with the correlation matrices published in the original articles (Christal, 1959; Kelley, 1964) using LISREL 8.51 (Jöreskog & Sörbom, 2001).
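The reasoning behind this hypothesis can be made concrete with a small worked equation. It is a simplified sketch, not the model estimated in this paper: it assumes one memory test x and one ability test y, each loading on a single factor with standardized loadings λ_x and λ_y, uncorrelated residuals, and a latent correlation φ between the Memory and g factors. The loadings of .60 and the latent correlation of .80 are hypothetical values chosen only for illustration.

```latex
% Implied correlation between two indicators of correlated factors
% (standardized loadings, uncorrelated residuals; values are hypothetical)
\[
  r_{xy} \;=\; \lambda_x \,\lambda_y \,\phi_{g,\mathrm{Mem}}
  \qquad\Longleftrightarrow\qquad
  \phi_{g,\mathrm{Mem}} \;=\; \frac{r_{xy}}{\lambda_x \,\lambda_y}
\]
\[
  \lambda_x = \lambda_y = .60,\quad \phi_{g,\mathrm{Mem}} = .80
  \quad\Longrightarrow\quad
  r_{xy} = (.60)(.60)(.80) = .288
\]
```

Under these assumptions, a test-level correlation near .30 is compatible with a factor-level correlation in the range reported for WM and g, because each test reflects its factor only imperfectly.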
3. Results
3.1. Christal (1959)
Twenty-eight of the memory and ability tests originally used by Christal (1959) were included in the CFA. (One test, Sequence Memory, was excluded from analysis due to low communality.) The first step in the CFA was to establish the measurement model by defining the first-order factors for both ability and memory. Through modern EFA techniques, we identified six distinct factors in the Christal (1959) data set: Associative Memory (e.g., paired associates learning), Span/Recall Memory (a more general memory factor), Verbal/Educational, Perceptual Speed, PS-Complex/Math, and Mechanical Knowledge. These factors were identified in the CFA as well. Some cross-loadings were allowed between the memory and ability tests to account for similarity of content and administration of some tests. Fit of the measurement model was acceptable, χ²(333, N=718)=1081.18, p<.01, Root Mean Square Error of Approximation (RMSEA)=.059, Comparative Fit Index (CFI)=.92, Non-Normed Fit Index (NNFI)=.90, and Normed Fit Index (NFI)=.88. The second-order memory factor was identified by the Associative Memory and Span/Recall Memory factors. g was identified by the Verbal/Educational, PS-Complex/Math, and Mechanical Knowledge factors. In the structural model, these factors were correlated. The model is shown in Fig. 1. Model fit was acceptable, χ²(336, N=718)=1087, p<.01, RMSEA=.06, CFI=.92, NNFI=.91, NFI=.88. Notably, the structural coefficient between g and Simple Memory was quite substantial in this model (.71).
3.2. Kelley (1964)
A similar reanalysis was conducted with data from Kelley (1964). Because of the similarity of content of some of the measures, this reanalysis was more complex than the Christal (1959) analysis described above, and was conducted in three steps. First, separate CFAs were conducted for the ability and memory tests to verify these factor structures.
Second, to examine the relationship between g and the memory factor, a second-order CFA was conducted that combined both ability and memory tests. Third, informed by the results of the hierarchical CFA, we conducted an additional analysis in an attempt to separate the variance associated with test content (i.e., verbal, numerical, and spatial) from the g and memory factors. Each of these steps is described in more detail below.
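For readers less familiar with the fit statistics reported for the Christal models above, the following sketch shows one common textbook point estimate of RMSEA computed from the chi-square, its degrees of freedom, and the sample size. This is an illustration under stated assumptions, not a reproduction of the LISREL output: the function name is hypothetical, and the formula shown uses N − 1 in the denominator. Values computed this way land near, but not exactly on, the reported .059 and .06; small gaps of this kind can arise when software bases the index on a different chi-square statistic or a slightly different denominator.

```python
import math


def rmsea_point_estimate(chi_square, df, n):
    """Textbook RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1))).

    Assumption: this generic formula may differ in detail from the
    computation used by any particular SEM program.
    """
    excess = max(chi_square - df, 0.0)  # misfit beyond what df alone would predict
    return math.sqrt(excess / (df * (n - 1)))


if __name__ == "__main__":
    # Chi-square, df, and N as reported in the text for the Christal (1959) reanalysis.
    print(round(rmsea_point_estimate(1081.18, 333, 718), 3))  # measurement model
    print(round(rmsea_point_estimate(1087.0, 336, 718), 3))   # structural model
```

The design point is that RMSEA scales misfit by degrees of freedom and sample size, so two models with similar chi-square values and similar df, as here, will have nearly identical RMSEA values.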
Fig. 1. A structural equation model of cognitive ability measures and memory measures from a reanalysis of data collected by Christal (1959). g=general ability; memory=immediate memory.
The CFA examining the factor structure of the 13 ability reference tests was conducted first. Two of the tests were dropped from the analysis because of low communality. These tests were Rudder Control (a test of psychomotor ability) and Mechanical Principles (a test of mechanical knowledge). The remaining 11 tests formed three ability factors (Verbal, Numerical, and Spatial). A second-order CFA was also conducted with the second-order factor, g, determined by the content factors. This model fit was adequate, χ²(38, N=442)=149.80, p<.01, RMSEA=.08, CFI=.92, NNFI=.88, NFI=.90.
Next, the factor structure of the 27 memory tests was examined. Eight of the memory tests were excluded from the analysis because of low communality [Recognition (syllables), Consequences I (nonverbal, a test of selecting pictures depicting the consequences, given a condition), Map Memory II, Verbal Recall, Meaningful Memory: Picture, Meaningful Memory: Paragraph, Meaningful Memory: Number] or because they loaded across multiple factors (Sentence Completion and Sentence Span). A CFA of the remaining 19 memory tests indicated four underlying factors. We named these factors as follows:
(a) Simple Span: includes Number Span I and II, Letter Span I and II, and Memory for Instructions. These tests were simple span tests requiring reproduction of a sequence of digits, letters, or the carrying out of simple instructions.
(b) Memory for Content: includes Recognition (words), Memory for Syllables I and II, Memory for Words I and II, and Memory for Numbers. These tests can best be described as tests of rote memory such as paired associates tests.
(c) Meaningful Memory: Consequences Test (verbal), Memory for Limericks, and Memory for Ideas. These tests required participants to remember the content and meaning of verbal material.
(d) Spatial Memory: Recognition (figures), Memory for Relations, Reproduction of Visual Design, and Map Memory I and II. These tests required recall or recognition of
spatial material. The fit of this model was also adequate, χ²(147, N=442)=312.71, p<.01, RMSEA=.052, CFI=.92, NNFI=.91, NFI=.86.
To examine the relationship between ability and memory, we conducted a second-order CFA and allowed the second-order factors of memory and g to correlate. This model is shown in Fig. 2. As can be seen in the figure, the relationship between the Memory factor and g is quite substantial (.83). The fit of this model was only fair, χ²(386, N=442)=840.27, p<.01, RMSEA=.053, CFI=.89, NNFI=.87, NFI=.81. The figure also shows that there are correlated residuals related to the content of the tests. For example, the error term for the Memory for Limericks test was significantly correlated with the error terms for all three of the tests identifying the Verbal factor (i.e., Reading Comprehension, Vocabulary, and Arithmetic).
The correlated error terms based on the content of the tests (verbal, numerical, and spatial) provided some direction for further exploratory analysis with the Kelley (1964) data. Our next step was to conduct
Fig. 2. A structural equation model of cognitive ability measures and memory measures from a reanalysis of data collected by Kelley (1964). The model includes two second-order factors and seven first-order factors. g=general ability; memory=immediate memory. RT=response time.
an additional analysis with the goal of separating the variance associated with the content of the tests from the processing (i.e., memory, general ability) used in the tests. To separate variance associated with content, a general ability factor (g) was identified by all 11 of the ability tests and one Memory factor was identified by the 27 memory tests. Three content factors (Verbal, Numerical, and Spatial) were simultaneously estimated as first-order factors in this model. We determined which tests should load on each factor by examining the content of the tests that were used (e.g., memory tests using numerical information were expected to load on the numerical factor). The fit of this model was poor, χ²(368, N=442)=1209.65, p<.01, RMSEA=.077, CFI=.79, NNFI=.75, NFI=.73. Examining the correlated error terms in this model as well as the resulting factor loadings provided insight into three changes that would potentially improve the model. First, some of the tests that Kelley referred to as the “rote” memory tests [i.e., Paired Associates (e.g., Memory for Numbers, Memory for Words) and Simple Span (e.g., Number Span, Letter Span)] did not significantly load on the content factors. These nonsignificant paths
Fig. 3. A structural equation model of cognitive ability measures and memory measures from a reanalysis of data collected by Kelley (1964). Model separates content from “process” for each measure. Seven first-order factors. g=general ability; memory=immediate memory. RT=response time.
were eliminated from the model. Second, the correlated error terms for the simple span tests (Number Span I and II, Letter Span I and II, and Memory for Instructions) indicated that an additional process factor should be included in the model. Thus, we added a Simple Memory factor identified by these five tests. Third, the correlated error terms for the verbal tests revealed that model fit could be improved by separating the Verbal factor into a Simple Verbal factor and a Verbal/Knowledge factor. Of the six tests identifying the Verbal factor, two were also dependent on knowledge of aviation and navigation [Reading Comprehension (participants answered questions regarding texts on navigation, airplane instructions) and Arithmetic (a test of word problems related to aviation and navigation)]. These two tests were specified to comprise the Verbal/Knowledge factor along with the Vocabulary test. Four of these tests did not contain an element of aviation content knowledge [Vocabulary (participants select synonyms for given words), Consequences Test (participants memorize a list of conditions and consequences at study and then match consequences to conditions during test), Memory for Limericks (a test where participants study limericks and then reproduce them given the first line of the limerick), and Memory for Ideas (participants reproduce a brief story in their own words after hearing it one time)]. These tests were specified to comprise the Verbal factor. The resulting model is shown in Fig. 3. The model fit fairly well, χ²(372, N=442)=735.55, p<.01, RMSEA=.048, CFI=.91, NNFI=.89, NFI=.83.
Comparing the relationships between the ability and memory factors in the two models provides some insight into how test content can influence the relationship between higher-order factors. For example, when common content variance was separated (essentially partialled) from process (as in the model shown in Fig.
3), the relationship between g and memory (between .44 and .47) was essentially half the size of that shown in Fig. 2, when content was not partialled out of the factors (.83).
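As a rough check on the reported fit, the RMSEA point estimate can be approximated from the chi-square, its degrees of freedom, and the sample size. The sketch below assumes one common textbook form, sqrt(max(χ² − df, 0) / (df · (N − 1))); the function name `rmsea` is ours and is not taken from the original analysis. Software packages can differ in which chi-square variant they use and in whether they divide by N or N − 1, so the result need not match the published value exactly.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Approximate RMSEA point estimate (one common textbook form).

    This is an illustrative helper, not the routine used by the authors.
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    # Excess of the chi-square over its expected value (df) under exact fit,
    # floored at zero so the square root is always defined.
    excess = max(chi_square - df, 0.0)
    return math.sqrt(excess / (df * (n - 1)))


# Fit statistics reported in the text for the Fig. 3 model.
estimate = rmsea(chi_square=735.55, df=372, n=442)
print(f"Approximate RMSEA: {estimate:.3f}")
```

With the reported values this form gives roughly .047, close to the published .048; the small gap is consistent with the software-dependent details noted above.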
4. Discussion
We set out to examine the relationship between immediate memory and general ability in an SEM framework. Because of the problems associated with using meta-analytic data in SEM, and because meta-analytic techniques often involve selection of studies in a mechanistic manner, without much evaluation of the content or quality of the studies, we decided to use a "best evidence" approach. We selected two studies in the published literature that met our criteria of large samples and multiple measures for each factor, namely, the investigations by Christal (1959) and Kelley (1964). Although there were advantages to using these existing studies, we also acknowledge that, by definition, the analyses presented here are largely exploratory. However, we believe that the advantages of using such rich data sets outweighed the limitations of post hoc analysis, such as the need to eliminate measures with low communality and SEM analyses that are perhaps not as clean as they might have been if planned during study design. Additionally, the studies we reexamined did not include measures of WM, so we cannot directly address the relations among WM, immediate memory, and g. Rather, our hypothesis involved a comparison of the relationship between immediate memory and g found in these studies and the relationship between WM and g found in the published literature. Specifically, we predicted that, when analyzed within an SEM framework, the relationship between immediate memory and general ability would be large and on par with the relationship between WM and ability (i.e., between .70 and .88; see Ackerman et al., 2002, and Kyllonen & Christal, 1990). The results confirmed our hypothesis.
Carroll's (1993) extensive reanalysis of extant factor analytic data on memory abilities and this reanalysis of the Kelley (1964) and Christal (1959) data sets present a relatively coherent perspective on
the relationship between immediate memory and intellectual abilities. That is, immediate memory tests tend to cluster by underlying process (e.g., associative memory, span memory) and to some degree by content (at least in the domains of verbal and spatial content). When large batteries of immediate memory tests are administered to a single sample of participants, these factors are well replicated and indicate a large association with a general intellectual ability factor. As in all SEM analyses, the relationship between the memory and ability factors reported here is a function of the tests included to measure each construct. That is, using heterogeneous tests in high-bandwidth batteries of g and memory likely helped maximize the relationship between these two constructs (e.g., Humphreys, 1985; Wittmann & Süß, 1999) while limiting some of the influence of test-specific variance. Because of the difficulty in separating the overlapping content among estimates of general intelligence and immediate memory, it is not clear how best to characterize the association between these constructs. At one extreme, the association appears to be very large (e.g., .7 or .8), such as between immediate memory and general intelligence when content overlap is not accounted for (see Figs. 1 and 2). At the other extreme, when content overlap is accounted for among memory and general ability measures (see the second CFA for Kelley, Fig. 3), the association is more modest (between .44 and .47 in our reanalysis). It is important to note that these estimates are predicated on memory tests that involve only storage and retrieval; they are essentially single tasks without any explicit interference or simultaneous processing requirements as in WM tasks.
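To make the contrast between the two extremes concrete, the sketch below does two things. First, it squares the reported factor correlations to express them as proportions of shared variance. Second, it illustrates the general idea of partialling a common source of variance using the standard first-order partial correlation formula. The second part is only an analogy: it is not the SEM specification used in this reanalysis, and the three input correlations are hypothetical values chosen for illustration, not estimates from the Kelley or Christal data. The function names are ours.

```python
import math


def shared_variance(r: float) -> float:
    """Proportion of variance two variables share, given their correlation."""
    return r ** 2


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom


# Correlations between memory and g reported in the text.
reported = {
    "content not separated (Fig. 2)": 0.83,
    "content separated, lower value (Fig. 3)": 0.44,
    "content separated, upper value (Fig. 3)": 0.47,
}
for label, r in reported.items():
    print(f"{label}: r = {r:.2f}, shared variance = {shared_variance(r):.2f}")

# HYPOTHETICAL inputs: memory-g correlation and each construct's
# correlation with a common content source. Not taken from the paper.
r_mem_g, r_mem_content, r_g_content = 0.80, 0.70, 0.70
adjusted = partial_corr(r_mem_g, r_mem_content, r_g_content)
print(f"Hypothetical partial correlation: {adjusted:.2f}")
```

In variance terms, a correlation of .83 corresponds to about 69% shared variance, whereas correlations of .44 to .47 correspond to roughly 19% to 22%, so halving the correlation reduces the shared variance by considerably more than half.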
It is also interesting to note that the SEM analyses revealed a relationship between immediate memory and g that is similar to relationships reported between WM and g when SEM techniques are used (e.g., Ackerman et al., 2002; Kyllonen & Christal, 1990). WM was not included in this analysis, so we cannot directly compare the relations among the three constructs, but this finding suggests that the relatively recent introduction of WM tasks as measures of intelligence may not necessarily add much to the "explanation" of variance in g over well-constructed measures of immediate memory. In addition, the analysis suggests that process-oriented research that attempts to determine the relationship between the process and g needs to take an explicit accounting of overlapping content between process and g factors.
References
Ackerman, P. L., Beier, M. E., & Boyle, M. O. (2002). Individual differences in working memory within a nomological network of cognitive and perceptual speed abilities. Journal of Experimental Psychology: General, 131, 567–589.
Ackerman, P. L., Beier, M. E., & Boyle, M. O. (in press). Working memory and intelligence: The same or different constructs? Psychological Bulletin.
Ackerman, P. L., & Humphreys, L. G. (1991). Individual differences theory in industrial and organizational psychology. In M. D. Dunnette, & L. M. Hough (Eds.), Handbook of Industrial and Organizational Psychology, vol. 1 (pp. 223–282). Palo Alto, CA: Consulting Psychologists Press.
Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). New York: Prentice Hall.
Binet, A., & Simon, T. (1908/1961). The development of intelligence in the child. L'Année Psychologique, 14, 1–90.
Blankenship, A. B. (1938). Memory span: A review of the literature. Psychological Bulletin, 35, 1–25.
Brown, S. W., Guilford, J. P., & Hoepfner, R. (1966). A factor analysis of semantic memory abilities: Studies of aptitudes of high-level performance (Reports from the Psychology Laboratory, University of Southern California No. 37). Los Angeles, CA: University of Southern California.
Burke, H. R. (1958). Raven's progressive matrices: A review and critical evaluation. The Journal of Genetic Psychology, 93, 199–228.
Burke, M. J., & Landis, R. S. (2003). Methodological and conceptual challenges in conducting and interpreting meta-analyses. In K. R. Murphy (Ed.), Validity generalization: A critical review (pp. 287–309). Mahwah, NJ: Erlbaum.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York: Cambridge University Press.
Cattell, R. B. (1973). Measuring intelligence with the Culture Fair tests. Champaign, IL: Institute for Personality and Ability Testing.
Cattell, R. B. (1987). Intelligence: Its structure, growth, and action. New York: Elsevier.
Christal, R. E. (1959). Factor analytic study of visual memory. Psychological Monographs: General and Applied, 72, 1–24.
Conway, A. R. A., Cowan, N., Bunting, M. F., Therriault, D. J., & Minkoff, S. R. B. (2002). A latent variable analysis of working memory capacity, short-term memory capacity, processing speed, and general fluid intelligence. Intelligence, 30, 163–183.
Cook, S. W. (1947). Psychological research on radar observer training. Army Air Forces Aviation Psychology Program Research Reports, Report No. 12. Washington: U.S. Government Printing Office.
Daneman, M., & Carpenter, P. A. (1980). Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19, 450–466.
Engle, R. W. (2002). Working memory capacity as executive attention. Current Directions in Psychological Science, 11, 19–23.
Engle, R. W., Tuholski, S. W., Laughlin, J. E., & Conway, A. R. A. (1999). Working memory, short-term memory, and general fluid intelligence: A latent-variable approach. Journal of Experimental Psychology: General, 128, 309–331.
Guilford, J. P. (1947). Printed classification tests. Army Air Forces Aviation Psychology Program Research Reports, Report No. 5. Washington: U.S. Government Printing Office.
Horn, J. L., & Cattell, R. B. (1966). Refinement and tests of the theory of fluid and crystallized general intelligences. Journal of Educational Psychology, 57, 253–270.
Humphreys, L. G. (1985). General intelligence: An integration of factor, test, and simplex theory. In B. B. Wolman (Ed.), Handbook of intelligence: Theories, measurement, and applications (pp. 331–360). New York: Wiley.
Jacobs, J. (1887). Experiments in "Prehension". Mind, 12, 75–79.
Jenkins, J. J., & Patterson, D. G. (1961). Studies of individual differences: The search for intelligence. New York: Appleton-Century-Crofts.
Jöreskog, K., & Sörbom, D. (2001). LISREL (Version 8.51) [Computer software]. Lincolnwood, IL: Scientific Software International.
Kelley, H. P. (1964). Memory abilities: A factor analysis. Psychometric Monographs, 11.
Kyllonen, P. C., & Christal, R. E. (1990). Reasoning ability is (little more than) working-memory capacity?! Intelligence, 14, 389–433.
Melton, A. W. (1947). Apparatus tests. Army Air Forces Aviation Psychology Program Research Reports, Report No. 4. Washington: U.S. Government Printing Office.
Mukunda, K. V., & Hall, V. C. (1992). Does performance on memory for order correlate with performance on standardized measures of ability? A meta-analysis. Intelligence, 16, 81–97.
Psychological Corporation (1997). WAIS-III/WMS-III technical manual. San Antonio, TX: Author.
Raven, J. C., Court, J. H., & Raven, J. (1977). Raven's progressive matrices and vocabulary scales. New York: Psychological Corporation.
Slavin, R. E. (1986). Best-evidence synthesis: An alternative to meta-analytic and traditional reviews. Educational Researcher, 15, 5–11.
Tenopyr, M. L., Guilford, J. P., & Hoepfner, R. (1966). A factor analysis of symbolic memory abilities: Studies of aptitudes of high-level performance (Reports from the Psychology Laboratory, University of Southern California No. 38). Los Angeles, CA: University of Southern California.
Terman, L. M., & Merrill, M. A. (1937). Measuring intelligence. Boston: Houghton Mifflin.
Terman, L. M., & Merrill, M. A. (1960). Stanford–Binet Intelligence Scale: Manual for the third revision, Form L-M. Boston: Houghton Mifflin.
Thorndike, R. L., Hagen, E. P., & Sattler, J. M. (1986). The Stanford–Binet Intelligence Scale (4th ed.). Itasca, IL: Riverside.
Viswesvaran, C., & Ones, D. S. (1995). Theory testing: Combining psychometric meta-analysis and structural equation modeling techniques. Personnel Psychology, 48, 865–885.
Wittmann, W. W., & Süß, H. M. (1999). Investigating the paths between working memory, intelligence, knowledge, and complex problem-solving performances via Brunswik symmetry. In P. L. Ackerman, P. C. Kyllonen, & R. D. Roberts (Eds.), Learning and individual differences: Process, trait, and content determinants (pp. 77–108). Washington, DC: American Psychological Association.