Intelligence 49 (2015) 192–206
Something more than g: Meaningful Memory uniquely predicts training performance☆

Jeffrey M. Cucina ⁎, Chihwei Su, Henry H. Busciglio, Sharron T. Peyton

U.S. Customs and Border Protection, United States
Article info

Article history: Received 7 June 2014; received in revised form 23 January 2015; accepted 24 January 2015; available online 18 February 2015.

Keywords: Meaningful Memory factor; general mental ability; training performance; specific aptitude theory; criterion-related validity.
Abstract

Past research has shown that mental ability tests predict training and job performance by virtue of measuring general mental ability (g). After removing the effects of g, mental ability tests no longer predict performance, with the exception of perceptual speed and spatial abilities for certain occupations. In our study, we found evidence for a third exception: Meaningful Memory, the ability to learn and recall meaningfully related information, predicts training performance (ρ = .511) even after removing the effects of g. Meaningful Memory tests therefore show promise for enhancing the effectiveness of mental ability tests in predicting training performance.

Published by Elsevier Inc.
1. Introduction

Tests of general mental ability (g) are among the best predictors of how well job applicants will perform on the job and in training. The predictive value of these tests is typically measured using criterion-related validity coefficients, which are correlations between test scores and a criterion (e.g., job performance). A meta-analysis by Schmidt and Hunter (1998) demonstrated that scores on tests of g have a criterion-related validity of .51 for job performance and .56 for training performance. Similar results were found in a meta-analysis using European data (Salgado et al., 2003). Tests of g
☆ The views expressed in this paper are those of the authors and do not necessarily reflect the views of U.S. Customs and Border Protection or the U.S. Federal Government. Portions of this research were presented at the annual conference of the Society for Industrial and Organizational Psychology (Division 14 of the American Psychological Association) in 2014. The authors thank Jasmine Gray for her assistance in preparing Fig. 6. No grant funds were used to support the study; however, some costs were paid by the organization that employed the participants.

⁎ Corresponding author at: 1400 L ST NW, 7S43, Washington, DC 20229-1145, United States.
http://dx.doi.org/10.1016/j.intell.2015.01.007 0160-2896/Published by Elsevier Inc.
have been found to predict a variety of different measures of job performance including supervisory ratings, supervisory rankings, job simulations, and production quantity (Nathan & Alexander, 1988). Based on a review of the literature, Deary (2001) concluded that “there is far, far more evidence for the success of the general mental ability test than any other method of selection” (p. 97). There seems to be a consensus in the literature that, with the exception of perceptual speed and spatial abilities for certain occupations, specific cognitive abilities do not have incremental validity over g (Brown, Le, & Schmidt, 2006; Hunter, 1983a,b, 1984, 1985, 1986; Olea & Ree, 1994; Ree & Earles, 1991; Ree, Earles, & Teachout, 1994; Schmidt, 1988, 2011; Schmidt & Hunter, 2004; Schmidt, Hunter, & Caplan, 1981; Schmidt, Ones, & Hunter, 1992; Thorndike, 1986). In this paper, we study the possibility of a third exception: Meaningful Memory (MM) uniquely predicts training performance. Carroll (1993) identified this factor, which involves the storage and retrieval of meaningful information comprising interconnected facts. We note that this definition has striking similarity to the processes involved in training and academic performance.
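The notion of "incremental validity over g" can be made concrete with a semipartial correlation: the correlation between a specific ability and the criterion after the specific ability's overlap with g is removed. A minimal sketch in Python (the correlation values below are hypothetical illustrations, not results from this study, which uses factor-analytic estimates instead):

```python
import math

# Hypothetical illustration of incremental validity over g.
# r_yg: correlation of g with the criterion (e.g., training performance)
# r_ys: correlation of a specific ability with the criterion
# r_sg: correlation of the specific ability with g
r_yg = 0.56
r_ys = 0.45
r_sg = 0.60

# Semipartial correlation: g is partialled out of the specific
# ability only, leaving the ability's unique relation to the criterion.
sr = (r_ys - r_yg * r_sg) / math.sqrt(1 - r_sg ** 2)
print(round(sr, 3))
```

A near-zero semipartial correlation is what the specific-aptitude literature typically reports; the present study asks whether Meaningful Memory is an exception.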
J.M. Cucina et al. / Intelligence 49 (2015) 192–206
Table 1
Definition and research for each model of memory.

Sensory memory
Definition:
• An unconscious process lasting approximately 1 s.
• Two types: visual (iconic) and auditory (echoic).
Research:
• Sperling (1960) provided evidence for iconic memory using experiments with two conditions that briefly displayed letters to subjects and asked them to recall the letters immediately after they were no longer displayed.
• Baddeley (1966) presented empirical evidence for the presence of an echoic memory store.
• Foster (2009) describes two common social phenomena involving echoic memory, one of which is termed the cocktail party phenomenon.

Short-term memory
Definition:
• Information that individuals pay attention to in sensory memory is placed into short-term memory.
• Lasts only a few seconds and holds about seven (plus or minus two) items of information (Miller, 1956).
• Also known as primary memory or the short-term store.
Research:
• Foster (2009) provides an everyday example of the short-term store: it is often used to hold a phone number in memory while dialing the number.
• Under the multi-store model, information can be retained in short-term memory using rehearsal, or it can be encoded into long-term memory.

Long-term memory
Definition:
• Retains information for long periods of time; information can be retrieved at a later time, if available and accessible.
• Upon retrieval, information from long-term memory is placed back into short-term memory.
• Focuses on the meaning of information.
• Has a much larger capacity than short-term memory.
• Also known as secondary memory or the long-term store.
Research:
• Different types of information can be stored in long-term memory. For example, episodic memories consist of memories about specific episodes or events that an individual has experienced (e.g., time, location, emotions experienced).
• Memory for concepts, facts, and general knowledge is semantic memory; memory for skills, procedures, and carrying out different activities is known as procedural memory.
• A distinction is also made between explicit memory, which includes conscious awareness of the information being remembered, and implicit memory, whereby past experiences affect perceptions, thoughts, and emotions without the individual having conscious awareness of the information being remembered.
• Foster (2009) notes that commercial advertising often works through implicit memory.

Working memory
Definition:
• A reconceptualization of short-term memory by Baddeley and Hitch (1974).
• Places more emphasis on processing than on memory stores.
Research (working memory involves four components):
• The phonological loop (also known as the articulatory loop) holds vocalizations in memory, including the auditory thoughts an individual has (termed subvocal speech). Information can be retained in the phonological loop via rehearsal; the loop both stores information and is involved in controlling articulation.
• The visuo-spatial sketchpad functions similarly to the phonological loop but handles visual and spatial information instead of auditory information; it is used to mentally manipulate images and for other visual and spatial mental activities.
• Both the phonological loop and the visuo-spatial sketchpad are considered "slave systems," subordinate to and controlled by the central executive. The central executive is considered a "master system": it coordinates the activities of the slave systems, controls mental attention (especially that involving novel information), and is responsible for much of the conscious information processing, thoughts, decisions, reasoning, and problem-solving that takes place in someone's mind.
• Finally, the episodic buffer is used to store and integrate information retrieved from long-term memory for use with other parts of working memory.

Distinguishing short-term and long-term memory
Definition:
• McGrew and Flanagan (1998) define short-term memory as "the ability to apprehend and hold information in immediate awareness and then use it within a few seconds." Tests requiring memory beyond a few seconds are defined as tests of long-term memory. McGrew and Flanagan state, "Although the term 'long-term' frequently carries with it the connotation of days, weeks, months, and years in the clinical literature, long-term storage processes can begin within a few minutes or hours of performing a task" (p. 24, emphasis appears in the original).
• This distinction also appears in Carroll's (1993) three-stratum model of intelligence, which separates the second-order factor of General Memory and Learning (2Y) into various memory-span tests (tests of short-term memory) and various long-term memory tests.
Research:
• McGrew (2005, 2009a,b) has extended Carroll's model into the Cattell–Horn–Carroll (CHC) theory of intelligence, which integrates work by Horn (1976, 1978, 1985, 1988), Horn and Noll (1997), Horn and Blankson (2005), Cattell (1943, 1963, 1967a,b), and Hakstian and Cattell (1974) with that by Carroll. Under this model, two second-order factors are related to memory: short-term memory (Gsm) and long-term storage and retrieval (Glr). Both stem from Horn and Blankson's (2005) model and can also be found as a reorganization of first-stratum factors in Carroll's model.
• McGrew and Flanagan (1998) note that "the amount of time that lapses between the initial task performance and recall of information related to that task is not critically important in defining Glr" (p. 24).
• McGrew (2009a,b) also notes that Glr can be measured in terms of "minutes, hours, weeks, or longer" (p. 6). The testing literature has found that tests of short- and long-term memory load on separate factors, indicating that individual differences in short-term memory are separate from individual differences in long-term memory.
• The research literature on memory strongly suggests that short-term and long-term memory are separate processes, and that memory activities requiring retention of material over the course of several minutes are examples of long-term memory, distinct from those involving short-term memory.
1.1. Brief overview of memory

We begin with a short primer on the memory literature, largely based on descriptions provided in textbooks and review articles (e.g., Baddeley, 2004; Foster, 2009; Schwartz, 2011); readers desiring more information on memory theory are referred to Table 1, which provides the definitions and research behind each model of memory. Modern models of memory can largely be traced to work by Atkinson and Shiffrin (1968) and by Baddeley and colleagues (e.g., Baddeley, 2001; Baddeley & Hitch, 1974). Atkinson and Shiffrin described a multi-store model of memory that parallels computer processing and data storage. Under this model, there are three main stores of memory: sensory memory, short-term memory, and long-term memory. Information from the senses (e.g., vision, hearing) is held in sensory
memory before decaying and becoming lost after about 1 s. Information that individuals pay attention to in the sensory memory is placed into their short-term memory, which lasts only a few seconds. In everyday life, people use short-term memory to hold a phone number in their mind while dialing (Foster, 2009). There is a rich history of short-term memory tests in the personnel selection literature (Verive & McDaniel, 1996). Under the multi-store model, information can be retained in short-term memory using rehearsal or it can be encoded into long-term memory, which focuses on the substantive meaning of information. Long-term memory has a much larger capacity than short-term memory and it involves three processes: encoding, storage, and retrieval (which places information back into short-term memory). Foster (2009) notes that the “distinction between short-
Table 2
Carroll's (1993) and McGrew's (2009a,b) definitions of long-term memory factors.

Long-term storage and retrieval (Glr)
• McGrew (p. 5): "The ability to store and consolidate new information in long-term memory and later fluently retrieve the stored information (e.g., concepts, ideas, items, names) through association. Memory consolidation and retrieval can be measured in terms of information stored for minutes, hours, weeks, or longer. Some Glr narrow abilities have been prominent in creativity research (e.g., production, ideational fluency, or associative fluency)."

Associative memory (MA)
• Carroll (p. 302): "The ability to form arbitrary associations in stimulus material such that on testing, the individual can recall what stimulus is paired with another, or recognize, in a series of test stimuli, which stimuli were experienced in a study phase."
• McGrew (p. 5): "Ability to recall one part of a previously learned but unrelated pair of items (that may or may not be meaningfully linked) when the other part is presented (e.g., paired-associative learning)."

Free recall memory (M6)
• Carroll (p. 302): "Indicated by the fact that some individuals, after a study phase, are able in a test phase to recall more (arbitrarily unrelated) material from the study phase than others, when the amount of material to be remembered exceeds the individual's memory span."
• McGrew (p. 5): "Ability to recall (without associations) as many unrelated items as possible, in any order, after a large collection of items is presented (each item presented singly). Requires the ability to encode a 'superspan collection of material' (Carroll, 1993, p. 277) that cannot be kept active in short-term or working memory."

Meaningful Memory (MM)
• Carroll (p. 302): "Indicated by the fact that some individuals, after a study phase, are able to recall (or reproduce) or recognize more material from a study phase than others, when the material in the study phase has meaningful interrelations."
• McGrew (p. 5): "Ability to note, retain, and recall information (set of items or ideas) where there is a meaningful relation between the bits of information, the information comprises a meaningful story or connected discourse, or the information relates to existing contents of memory."

Visual memory (MV)
• Carroll (p. 302): "The ability to form, in a study phase, a mental representation (or possibly an image) of visual material that is presented, when the visual material is not readily codable in some modality other than visual, and to use that representation in responding in a test phase by recognition or recall. (An analogous auditory memory factor UM is considered in Chapter 9.)"
• McGrew (p. 3): "Ability to form and store a mental representation or image of a visual shape or configuration (typically during a brief study period), over at least a few seconds, and then recognize or recall it later (during the test phase)."
term and long-term memory [is] often misunderstood both by clinicians and the lay community" (p. 24). Memories that have lasted only minutes, hours, or days are often erroneously described as short-term memories, when in fact they are long-term memories. Short-term memory lasts only a few seconds; for information to be retained longer than that, it must be constantly rehearsed, which makes it impractical to hold information in short-term memory for more than a few minutes.

1.2. Individual differences in long-term memory factors

Carroll (1993) lamented that "Unfortunately, there has been little interplay between experimental and psychometric approaches to these areas" (p. 248). That said, he was able to identify a number of memory factors, which McGrew (2005, 2009a,b) expands upon. Tests of these factors have a study phase (when testtakers are given time to commit information to long-term memory via encoding) and a test phase (when testtakers are
tested for their retrieval of the information from long-term memory). The first-order factors differ in the type of information covered; however, tests of them all load onto the Glr second-order memory factor. We provide selected quotes from Carroll's and McGrew's operational definitions of these factors in Table 2. To date, no research has examined the possibility that long-term memory factors predict training performance. Glr and MM seem especially well-suited for predicting success in learning, as the encoding, storage, and retrieval of information is heavily involved in the process of learning, studying, and completing examinations. However, Carroll noted that Glr and related factors have not been well-studied outside of "constrained learning situations such as those conducted in psychological laboratories" (p. 303). Four factors can be grouped under the Glr domain: free recall memory (M6), associative memory (MA), visual memory (MV), and MM. The first three factors do not appear promising for providing a unique contribution to prediction in selection settings. M6 requires testtakers to orally or verbally list
Table 3
MM tests versus trainability tests.

g-Loading
MM tests:
• Do not involve complex material that is difficult to understand; much of the material is simple in nature, and the associations between the concepts in an MM test are clearly laid out for the testtaker.
• Very little reasoning is involved in committing meaningful information to memory.
• MM tests are therefore expected to have a lower g-loading, resulting in a higher loading on the MM construct.
Trainability tests:
• Trainability tests for predicting academic performance will have a higher g-loading, and hence less specific variance related to MM, than MM tests.
• Although a trainability test might involve the use of the MM construct during the test phase (especially if the study material is not available and the test is closed-book), it also requires substantial use of g to comprehend and understand the material.
• Summarizing the literature on trainability tests, McDaniel and Kepes (2012) noted that trainability tests have correlations with g that are "high" and adverse impact that is "comparable" to other cognitive ability tests; they argued that for more intensive training programs, trainability tests will have higher g-loadings.
• A trainability test for a law enforcement position would likely require substantial use of logical reasoning skills, since the procedures, regulations, and laws, as well as the legal system in general, rely heavily on logic.

Test material
MM tests:
• Present fictitious information to applicants that is not sensitive and can easily be shared with the public.
Trainability tests:
• One benefit of trainability tests is their use of actual job-related materials from training, which results in strong content validity. However, in an operational selection setting, presenting actual law enforcement training academy materials to applicants presents a security risk, in that much of the material taught in the academy is deemed sensitive.

Use as an experimental test
MM tests:
• All of the material that an applicant studies is new and fictitious, meaning that the applicant has had no prior exposure to the study material.
• Because the MM test contents are fictitious, a test developer can easily develop material to which the testtaker has had no prior exposure.
Trainability tests:
• Difficult to use as experimental tests in a concurrent validation study, since incumbents have already been in training and have had prior exposure to the training materials that would be used to develop the trainability test.
• Evidence dating back to the original memory studies by Ebbinghaus (1885), replicated nearly 100 years later (Bahrick & Phelps, 1987), shows that it is easier to re-learn material that was learned at an earlier time. Thus, the task of learning training material a second time, in a trainability test, is not equivalent to the task of learning it the first time.
• There is also empirical evidence that having expertise in a given topic area promotes subsequent learning of new information in that topic area (Bransford, Brown, & Cocking, 2004; Chase & Simon, 1973; Chi, Feltovich, & Glaser, 1981; deGroot, 1965; Foster, 2009). For this reason, trainability tests might present issues for an operational testing program if some applicants have an unfair advantage by virtue of past experience with the content of the training materials.

Explicit inclusion of a memory component
MM tests:
• Testtakers are not provided access to the studied information during the test phase.
• The information presented and recalled involves explicit use of declarative knowledge and mental abilities.
Trainability tests:
• Testtakers may be provided access to the information presented during the test phase.
• The information presented may include learned motor skills (e.g., sewing) that involve the use of procedural knowledge and physical abilities.
Fig. 1. This is a depiction of the associative network underlying a pairing from the study material for a picture–number memory test (e.g., the ETS kit MA1 test). The two boxes represent nodes for “chair” and the number “16”, and the line represents a link. This figure presents a model for how the information from the item is encoded in long-term memory.
unrelated items that were presented earlier; such a test is difficult to objectively score and use in a group-administered mass testing environment. MA consists of paired association
between two individual nouns (e.g., pictures, numbers, first and last names); however, Carroll noted a “general failure of factor MA to predict learning performance” and concluded that “there is little evidence that the factor [MA] makes any significant independent contribution to the prediction of school learning performance in general” (pp. 273–274). Similarly, Carroll did not report strong evidence for the MV factor stating that it cannot “be clearly identified” and he questioned “if it exists” (pp. 280–282). He found that much of the variance in MV tests is associated with his General Memory and Learning (2Y) factor (which McGrew, 2005, 2009a,b divides into Glr and Gsm). Although MA and MV tests do not appear to uniquely predict performance, they are good measures of the Glr factor. MM tests involve a study phase and test phase; however, rather than studying randomly created items, testtakers are asked to study material that is meaningful in nature. The material may depict a story, an idea, or a “connected discourse”
Intelligence Tip The New York Police Department (NYPD) has learned about a money smuggling operation and is requesting help from Customs. A group of drug dealers in New York City are taking U.S. $100 bills from drug sales and shipping them to Switzerland. Once in Switzerland the money is placed into Swiss bank accounts where it cannot be traced by law enforcement. The next set of $100 bills are being shipped by airfreight on AirSwitzerland Flight 37 which leaves tomorrow at 7 PM. The money is in five boxes that are supposed to only contain books. Customs must seize the boxes.
Fig. 2. This is an example story for the Meaningful Memory test, developed for this study.
Fig. 3. This is a depiction of the associative network underlying the story presented in Fig. 2. It presents a model for how the information from the story is encoded in long-term memory. Each box represents a node and each line represents a link.
(Carroll, 1993, p. 277). Early examples include The Marble Statue (Garrett, Bryan, & Perl, 1935; Shaw, 1896; Whipple, 1915), a limerick-based test by Kelley (1964), and a video-based MM test (Heckman, 1967; Seibert, Reid, & Snow, 1967). More recent examples appear in clinical intelligence tests (Kaufman & Kaufman, 1993; Woodcock & Johnson, 1989) and one employment test (Canadian Border Services Agency, 2006). MM tests are somewhat similar in concept to trainability tests. A trainability test consists of a short instructional period, followed by some measure of learning, which is used as a predictor of future training performance (Casey, 1984; Reilly & Manese, 1979; Robertson & Downs, 1979; Roth, Buster, & Bobko, 2011; Smith & Downs, 1975). The instructional period typically consists of a sample of the material that would be presented in training, which takes place after an applicant is hired. However, in Table 3, we outline several distinctions between them, including g-loading (trainability tests require the use of g to comprehend training material in addition to being able to recall it), the novelty of the information presented, and the explicit inclusion of memory. MM and MA tests can be distinguished using associative networks (Anderson, 1983; Carlston & Smith, 1996; Collins & Loftus, 1975; Fiske & Taylor, 1991; Kunda, 2001; Smith, 1998). In MA, only one link between the nodes for the two items is explicitly provided (see Fig. 1 for an example). MM tests present a much richer network with multiple linkages among multiple nodes. For example, consider the story presented in Fig. 2 and the network that underlies it in Fig. 3. We suggest that MM tests measure testtakers' ability to link the nodes together into a network and to later retrieve information from that network (which is encoded in long-term memory).
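The single-link versus multi-link contrast between MA and MM items can be sketched as a toy graph. The node labels below are drawn from the chair–16 pairing (Fig. 1) and the smuggling story (Figs. 2–3); the link structure is an illustrative simplification, not the authors' formal network model:

```python
# An MA item supplies a single explicit link between two nodes,
# whereas an MM story implies many interconnected nodes.
ma_network = {("chair", "16")}  # one paired-associate link

mm_network = {                  # a few links implied by the Fig. 2 story
    ("drug dealers", "New York City"),
    ("drug dealers", "$100 bills"),
    ("$100 bills", "Switzerland"),
    ("Switzerland", "bank accounts"),
    ("$100 bills", "Flight 37"),
    ("Flight 37", "five boxes"),
}

def degree(network, node):
    """Number of links touching a node."""
    return sum(node in pair for pair in network)

# A richer network offers more retrieval routes to each fact.
print(degree(ma_network, "chair"), degree(mm_network, "$100 bills"))
```

The point of the sketch is that in an MM network each fact can be reached through several links, which is one reason meaningfully related material may be easier to retrieve than an arbitrary pairing.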
1.3. Overview of study

An MM test was developed and administered to a sample of law enforcement officers who also completed three marker memory tests for the Glr factor. Operational data for other cognitive measures (e.g., logical reasoning) and training performance were also available. Using these data, it was possible to extract factors for g, Glr, and MM and correlate these factors with training performance. We make the following hypotheses:

Hypothesis 1. g will predict training performance. (This is hypothesized based on work by Schmidt & Hunter, 1998.)

Hypothesis 2. The MM test will have a significant bivariate correlation with training performance.

Hypothesis 3. The MM factor will predict training performance uniquely (i.e., above and beyond g).

Hypothesis 4. The Glr factor will predict training performance uniquely (i.e., above and beyond g).

Throughout the study, we followed the typical approach for developing and validating standardized employment tests (see Gatewood & Field, 2001, for more information). In short, we hypothesized that MM would be an ability that would correlate with performance based on information collected during a job analysis of the position. The job analysis included job observations, incumbent interviews, subject matter expert (SME) panels, a large-scale survey in which incumbents rated different knowledge areas, skills, and abilities (KSAs), and linkage ratings between KSAs and the duties comprising the position. Next, we developed a measure of MM (using information from the intelligence and memory literatures and SME panels). The MM test was then administered to incumbents and the scores were correlated with those on the training performance criterion. Corrections for incidental range restriction and criterion unreliability were then applied.¹

¹ These types of corrections are common in personnel selection and educational testing research and have previously been used in Intelligence (Verive & McDaniel, 1996). Range restriction is most often observed when a criterion-related validation study is intended to yield insights into the validity of a test for an applicant population but is conducted using data from incumbent employees or current students (who were selected using test scores). Range restriction can occur when the same test is used for selection and validation (i.e., explicit or direct range restriction) or when two different, but correlated, tests are used for selection and validation (i.e., incidental or indirect range restriction). Gulliksen (1950, chapter 11) provides mathematical proofs for several range restriction formulae. Range restriction is often described in psychometric texts (Crocker & Algina, 1986; Ghiselli, Campbell, & Zedeck, 1981; Guion & Highhouse, 2006; Nunnally & Bernstein, 1994; Thorndike, 1949), and corrections for range restriction are typically made in large-scale and meta-analytic research on employment and educational tests (Kobrin, Patterson, Shaw, Mattern, & Barbuti, 2008; Kuncel & Hezlett, 2007; Kuncel, Hezlett, & Ones, 2001, 2004; Schmidt & Hunter, 1998). Similarly, corrections for criterion unreliability are often conducted in applied validation research and are described in the textbooks and articles cited above. Corrections for criterion unreliability estimate what the value of the criterion-related validity coefficient would be if true scores on the criterion were available. Whenever the criterion reliability is less than 1, the maximum possible validity coefficient is less than 1 (Nunnally & Bernstein, 1994). The correction estimates what the validity would be if the criterion were perfectly reliable; Shen, Cucina, Walmsley, and Seltzer (2014) show that the corrected validity estimate is identical to the percentage of the maximum possible validity coefficient that the obtained validity represents (i.e., the obtained validity coefficient divided by the maximum possible validity coefficient).

2. Methods

2.1. Participants

Participants were 256 gun-carrying Federal law enforcement officers (LEOs) who took a written test battery as applicants, were subsequently hired, and later attended a training academy. The participants were then selected to participate in a criterion-related validation study that included the MM test. Most of the participants (approximately 72%) were male, which is typical of most LEO incumbents. Approximately 35% were Hispanic, 14% were Black, 13% were Asian/Pacific Islander, and 38% were White. The participants were drawn from over 20 cities and were paid their normal salary while participating in the study.
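The two corrections described in footnote 1 can be sketched numerically. The formulas below are the standard ones found in psychometric texts (e.g., Gulliksen, 1950; Nunnally & Bernstein, 1994); the range-restriction formula shown is the simpler direct-restriction case, and all numeric values are hypothetical, not from this study:

```python
import math

def correct_for_criterion_unreliability(r_xy, r_yy):
    """Estimate the validity if the criterion were perfectly reliable.

    r_xy: observed validity coefficient; r_yy: criterion reliability.
    Equivalently, the observed validity divided by the maximum possible
    validity, sqrt(r_yy).
    """
    return r_xy / math.sqrt(r_yy)

def correct_for_direct_range_restriction(r, u):
    """Thorndike's Case 2 correction for direct range restriction.

    u = SD(unrestricted applicant group) / SD(restricted incumbent group).
    """
    return (u * r) / math.sqrt(1 + r ** 2 * (u ** 2 - 1))

r_obs = 0.30  # hypothetical observed validity in an incumbent sample
print(round(correct_for_criterion_unreliability(r_obs, 0.64), 3))  # 0.375
print(round(correct_for_direct_range_restriction(r_obs, 1.5), 3))
```

Both corrections move the incumbent-sample estimate toward what would be expected in the applicant population with a perfectly reliable criterion, which is why corrected coefficients such as ρ = .511 exceed their observed counterparts.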
2.2. Measures

2.2.1. MM test

A job analysis indicated that memory was a core competency for the position being studied. To make the test as face-valid as possible, the authors led a discussion on the role of memory in performing the position during a subject matter expert (SME) panel. Examples of the types of information that incumbents must commit to memory were collected and used to create a draft test (consisting of a study booklet and of
five types of items). The draft test was then reviewed by a second SME panel and a panel of senior and management-level Personnel Research Psychologists. The study booklet for the test included “All Points Bulletins,” consisting of photographs and descriptions of wanted persons and items (e.g., vehicles, packages, documents) as well as “tips/ alerts,” consisting of leads on future criminal activities. Since entry-level applicants would not have job knowledge or experience with legal terminology, technical terms and material were not included in the test content. Press releases on the
Fig. 4. Example All Points Bulletin — traveler printout.
Fig. 5. Example All Points Bulletin — vehicle printout.
agency's website were also used to gather ideas (e.g., locations, types of crimes) for the test content. All of the information presented was fictitious, and photographs of objects (e.g., vehicles, packages) were obtained by the authors or from public-domain sources (including the agency's website and intranet as well as Microsoft Office Clip Art). Sample materials are provided in Figs. 4 and 5. A series of five-option multiple-choice questions (each with only one correct option) was written covering the material in the study booklet.

2.2.2. Written test battery

When applying for their positions, all participants completed tests of logical reasoning (Colberg, 1984, 1985; Colberg, Nester, & Trattner, 1985), Mathematical Reasoning, and Writing Skills. These tests were used as markers for g.
2.2.3. ETS marker tests

Nine tests of memory, mental visualization, and perceptual speed from the ETS Kit of Factor-Referenced Cognitive Tests (Ekstrom, French, Harman, & Derman, 1976a,b) were administered. The three memory tests (MA-1: Picture–Number; MV-1: Shape Memory; MA-3: First and Last Names) were used as markers for Glr. The three mental visualization tests (CF-2: Hidden Patterns; S-2: Cube Comparisons; VZ-1: Form Board) and the three perceptual speed tests (P-3: Identical Pictures; P-2: Number Comparisons; P-1: Finding A's) were used as additional markers for g.

2.2.4. Training performance criterion

Participants completed a rigorous basic academy training program that provided scores on written, end-of-training
examinations, covering classroom instruction on the various aspects of operations, law, and compliance. The average of these examinations, obtained from archival sources, served as the criterion. After the study was completed, the academy was accredited to grant undergraduate course credits to trainees who successfully completed the training program. There were no substantive changes in the training criterion before and after the accreditation; thus, this criterion could also be viewed as a measure of undergraduate academic performance.

2.3. Procedure

The data were collected as part of a criterion-related validation study for personnel selection instruments used by the organization. The MM and ETS marker tests were included in the study as experimental measures. The design of the study was conceived by the authors, with input and approval from management officials within the organization that employed the participants. No grant funds were used to support the study; however, the direct (e.g., travel and supplies) and indirect (e.g., salary) costs were paid for by the sponsoring organization as part of the criterion-related validation study. Management officials in the sponsoring organization approved the dissemination of the results presented in this paper. The data analyses, interpretation, and write-up of the study were conducted solely by the authors and reflect their professional viewpoints (and not necessarily those of the sponsoring organization). The written test battery and training performance criterion were collected in proctored group settings under applicant/operational conditions. The MM and ETS marker tests were administered by proctors during a research-based concurrent validation study. Participants were given 20 min to independently and silently study the study booklet. After the study period, the proctors collected the study booklets and administered a set of 44 biodata items for 20 min. These items served as an intervening task, which "flushes" the short-term memory store of information from the study booklet (see McGrew & Flanagan, 1998). Participants were then given 30 min to answer the 50 multiple-choice items in the question booklet.

3. Results

Table 4 presents descriptive statistics and intercorrelations. After an item analysis, the MM test consisted of 48 items and had a KR-20 of .793. A somewhat lower internal consistency was expected given the multidimensionality of the constructs the MM test was intended to measure (i.e., general mental ability, Glr, MM). The hypotheses were tested using bivariate correlations, multiple regression, and structural equation modeling (SEM). Consolidated results for the criterion-related validity of MM are provided in Table 5. Hypothesis 1 stated that "g will predict training performance." Scores on the first unrotated principal component obtained from all of the cognitive ability tests served as a measure of g and had a criterion-related validity of .539 (p < .001). Structural equation modeling was also used to estimate the relationship between the latent construct of g and training academy performance. We found that a bifactor model which included factors for g, mental

Table 4
Descriptive statistics and intercorrelations of research variables.

Variable | Mean | SD | Rel.
1. MM test | 36.45 | 5.71 | .79
2. ETS Picture–Number MA-1 | 23.88 | 9.18 | .83
3. ETS First & Last Names MA-3 | 14.20 | 7.60 | .63
4. ETS Shape Memory MV-1 | 16.98 | 6.91 | .55
5. Logical reasoning | 14.35 | 3.16 | .82
6. Mathematical reasoning | 12.14 | 3.59 | .85
7. Writing | 20.02 | 3.85 | .69
8. ETS Hidden Patterns CF-2 | 180.42 | 55.34 | .89
9. ETS Cube Comparisons S-2 | 11.83 | 9.19 | .77
10. ETS Mental Visualization VZ-1 | 94.67 | 42.77 | .81
11. ETS Identical Pictures P-3 | 65.29 | 14.54 | .84
12. ETS Number Comparison P-2 | 46.35 | 14.16 | .76
13. ETS Finding A's P-1 | 62.73 | 16.34 | .74
14. Training performance | 87.99 | 4.66 | .85

[Intercorrelations among the 14 variables are not reproduced here; the full matrix could not be recovered intact from the source text.]

Notes. N = 256. The reliabilities were obtained from either test manuals or the largest available dataset containing scores on each test. Specifically, the KR-20 for the Meaningful Memory test was from the validation dataset. The KR-20s for the Logical Reasoning, Mathematical Reasoning, and Writing tests came from applicant data from a 2011 administration of the tests (N = 7387 to 14,455). For the ETS marker tests, reliabilities are medians of the reliabilities reported in the User's Manual (except for MV-1, Shape Memory, where the estimate shown was obtained using the correlation between Part One and Part Two scores, r = .384, and the Spearman–Brown prophecy formula for doubling test length provided in Formula 6-19 of Nunnally & Bernstein, 1994). ⁎ p < .05. ⁎⁎ p < .01.
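The two reliability formulas referenced in the Table 4 notes (KR-20 for the MM test, and the Spearman–Brown prophecy used for MV-1) can be sketched as follows; the toy response matrix is illustrative, not the study's data:

```python
# Reliability formulas from the Table 4 notes (minimal sketch; the toy
# response matrix below is illustrative, not the study's data).
from statistics import pvariance

def spearman_brown(r_half, k=2.0):
    # Projected reliability when test length is multiplied by k;
    # k = 2 is the "doubling" case applied to MV-1 (r = .384 between halves).
    return k * r_half / (1.0 + (k - 1.0) * r_half)

def kr20(responses):
    # Kuder-Richardson Formula 20 for dichotomously scored (0/1) items;
    # responses is a list of per-person lists of item scores.
    k = len(responses[0])
    n = len(responses)
    totals = [sum(person) for person in responses]
    sum_pq = 0.0
    for item in range(k):
        p = sum(person[item] for person in responses) / n
        sum_pq += p * (1.0 - p)
    return k / (k - 1.0) * (1.0 - sum_pq / pvariance(totals))

print(round(spearman_brown(0.384), 2))  # → 0.55, the MV-1 estimate described in the notes

toy = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0],
       [1, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1]]
print(round(kr20(toy), 2))  # → 0.83 for this near-Guttman toy matrix
```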
Table 5
Consolidated results of analyses showing the relationship between the MM test/factor and training performance.

Description of analysis | Predictor | Test statistic | Value | p
Observed bivariate validity coefficient | MM test | r | .427 | <.001
Validity coefficient corrected for range restriction | MM test | r | .472 | N/A (a)
Validity coefficient corrected for criterion unreliability and range restriction | MM test | ρ | .511 | N/A (a)
Hierarchical regression showing incremental validity over existing tests (b) | MM test | ΔR2 | .068 | <.001
Hierarchical regression showing incremental validity over existing tests (b) | MM test | β | .278 | <.001
Hierarchical regression showing incremental validity over existing tests (b), correcting for range restriction and criterion unreliability | MM test | ΔR2 | .069 | N/A (a)
Hierarchical regression showing incremental validity over existing tests (b), correcting for range restriction and criterion unreliability | MM test | β | .289 | N/A (a)
True-score hierarchical regression showing incremental validity over existing tests (b), after correcting for predictor reliability for all existing tests | MM test | ΔR2 | .077 | N/A (a)
True-score hierarchical regression showing incremental validity over existing tests (b), after correcting for predictor reliability for all existing tests | MM test | β | .311 | N/A (a)
Hierarchical regression showing incremental validity over g using PCA (c) | MM test | ΔR2 | .029 | .001
Hierarchical regression showing incremental validity over g using PCA (c) | MM test | β | .200 | .001
Hierarchical regression showing incremental validity over g using PAF (c) | MM test | ΔR2 | .031 | .001
Hierarchical regression showing incremental validity over g using PAF (c) | MM test | β | .205 | .001
SEM bi-factor Model A with path from MM factor to the criterion | MM factor | Standardized SEM regression weight | .185 | .003
SEM bi-factor Model B with path from MM test to the criterion | MM test | Standardized SEM regression weight | .180 | .002
SEM bi-factor Model A using correlation matrix corrected for range restriction (d) | MM factor | Standardized SEM regression weight | .211 | N/A (a)
SEM bi-factor Model B using correlation matrix corrected for range restriction | MM test | Standardized SEM regression weight | .256 | N/A (a)
SEM bi-factor Model C with paths from g, Glr, MM, MV, and PS to the criterion | MM test | Standardized SEM regression weight | .179 | .004
SEM higher-order version of Model A | MM factor | Standardized SEM regression weight | .295 | <.001
SEM higher-order version of Model B | MM test | Standardized SEM regression weight | .244 | <.001

Notes. (a) It is not appropriate to run statistical significance tests on corrected coefficients (Schmidt & Hunter, 1996). (b) The existing tests include logical reasoning, math reasoning, and writing. (c) g was estimated using the first component or factor based on a PCA or PAF of the 13 cognitive tests. (d) In order to avoid Heywood cases, the error variances of MA1 and MA3 were fixed to the values implied by their reliabilities.
visualization, perceptual speed, and Glr provided the best fit. The standardized path coefficient from g to training academy performance was .693 (p < .001) after correcting for measurement error in the training academy grades. These two analyses both provide support for Hypothesis 1.

Hypothesis 2 stated that the "MM test will have a significant bivariate correlation with training performance." As shown in Table 5, the correlation between scores on the MM test and training performance was .427 (p < .001; .511 corrected). Thus, there is support for Hypothesis 2.
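The chain of corrections behind the .427 → .472 → .511 figures in Table 5 can be sketched as follows. The SD ratio used for the range-restriction correction is not reported in the text, so the value passed to that function below is purely illustrative; the criterion reliability of .85 comes from Table 4:

```python
from math import sqrt

def correct_for_range_restriction(r, big_u):
    # Thorndike Case II; big_u = SD(applicant pool) / SD(restricted sample) > 1.
    # The actual SD ratio for this sample is not given here, so big_u is
    # purely illustrative.
    return r * big_u / sqrt((big_u ** 2 - 1.0) * r ** 2 + 1.0)

def correct_for_criterion_unreliability(r, ryy):
    # Disattenuation for criterion unreliability only: rho = r / sqrt(r_yy).
    return r / sqrt(ryy)

# With big_u = 1 (no restriction) the coefficient is unchanged:
assert correct_for_range_restriction(0.427, 1.0) == 0.427

# Applying the criterion-unreliability correction (r_yy = .85 per Table 4)
# to the range-restriction-corrected value of .472:
print(round(correct_for_criterion_unreliability(0.472, 0.85), 2))  # → 0.51
```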
Table 6
Hierarchical regression showing incremental validity of the MM test over g (training performance as criterion).

Principal components analysis
Step 1 (g): R = .539, R2 = .290, Adj. R2 = .288, ΔR2 = .290, F of Δ(1, 252) = 103.120, p < .001; β(g) = .539, p < .001.
Step 2 (g, MM test): R = .565, R2 = .319, Adj. R2 = .314, ΔR2 = .029, F of Δ(1, 251) = 10.656, p = .001; β(g) = .434, p < .001; β(MM test) = .200, p = .001.

Principal axis factoring (a)
Step 1 (g): R = .539, R2 = .291, Adj. R2 = .288, ΔR2 = .291, F of Δ(1, 252) = 103.327, p < .001; β(g) = .539, p < .001.
Step 2 (g, MM test): R = .567, R2 = .322, Adj. R2 = .317, ΔR2 = .031, F of Δ(1, 251) = 11.524, p = .001; β(g) = .434, p < .001; β(MM test) = .205, p = .001.

Note. g: estimate for g using either principal axis factor scores or principal component scores; MM test: Meaningful Memory test. N = 256. (a) Although we focus on the use of principal component scores in the text, we also provide the results obtained using the first unrotated principal factor scores.
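The two-step procedure of Table 6 (extract g as the first unrotated principal component of the battery, then test the ΔR2 of the MM test over g) can be sketched as follows. The data are simulated, not the study's; sample size and battery size are the only values borrowed from the paper:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 256  # sample size matching the study

# Simulated 13-test battery sharing a general factor, plus an MM-like test
# and a criterion; the data are illustrative, not the study's.
general = rng.normal(size=n)
battery = np.column_stack(
    [0.7 * general + rng.normal(scale=0.7, size=n) for _ in range(13)])
mm = 0.6 * general + rng.normal(scale=0.8, size=n)
y = 0.5 * general + 0.2 * mm + rng.normal(scale=0.8, size=n)

# g estimated as the first unrotated principal component of the battery
z = (battery - battery.mean(axis=0)) / battery.std(axis=0)
g_scores = z @ np.linalg.svd(z, full_matrices=False)[2][0]

def r_squared(predictors, criterion):
    # OLS R-squared with an intercept term
    X = np.column_stack([np.ones(len(criterion))] + predictors)
    beta = np.linalg.lstsq(X, criterion, rcond=None)[0]
    resid = criterion - X @ beta
    return 1.0 - resid.var() / criterion.var()

r2_step1 = r_squared([g_scores], y)      # step 1: g alone
r2_step2 = r_squared([g_scores, mm], y)  # step 2: g plus the MM test
delta_r2 = r2_step2 - r2_step1           # incremental validity of MM over g
print(round(delta_r2, 3))
```

Because OLS R2 never decreases when a predictor is added, the substantive question is whether the increment is reliably larger than zero, which is what the F of Δ tests in Table 6 address.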
Hypothesis 3 stated that "the MM factor will predict training performance uniquely (i.e., above and beyond g)." A hierarchical linear regression was conducted (see Table 6) with the g scores entered as predictors in the first step and the MM test scores entered in the second step. The MM test added incremental validity over the g scores (ΔR2 = .029) with an MM β-weight of .200 (p < .001). Schmidt et al. (1981), Schmidt and Hunter (1996, scenario 6), and Schmidt (2011) recommend conducting a true-score regression procedure in this situation. As shown in Table 5, the MM test added slightly more incremental validity over the existing tests in this analysis (ΔR2 = .077), and it also had a slightly larger β-weight (.311) after the correlation matrix was corrected for measurement error in all predictors. Using SEM, a path was added from the MM factor to the observed variable for the training performance criterion. The MM factor was identified by fixing the uniqueness of the MM test to the value implied by its reliability and including three factor loadings (g, Glr, and MM). Adding a path from MM to the criterion yielded statistically significant incremental fit statistics, indicating a better fit than a model without this path (Δχ2 = 9, df = 1, p = .003) and Model 8 (Δχ2 = 10, df = 2, p = .007). As shown in Table 5, this path had a statistically significant standardized regression weight (.185, p = .003); that is, after controlling for g and Glr, the unique relationship between the MM factor and training performance is .185. After correcting for range restriction, the standardized regression weight increased from .185 to .211. Nearly identical results were obtained when we omitted the MM latent factor and instead included a path from the MM test to training performance. Furthermore, conceptualizing g as a higher-order construct resulted in poorer fit statistics but a larger standardized regression weight from the MM factor to training performance (.295). The incremental validity of Glr over g was investigated using SEM. A path was added from the Glr factor (which was
Glossary of abbreviations used in Fig. 6.

χ2: Chi-squared
df: Degrees of freedom
p: p-value
χ2/df: Chi-square divided by degrees of freedom
GFI: Goodness of Fit Index
AGFI: Adjusted Goodness of Fit Index
NFI: Normed Fit Index
CFI: Comparative Fit Index
PGFI: Parsimony Goodness of Fit Index
AIC: Akaike Information Criterion
CAIC: Corrected Akaike Information Criterion
RMSEA: Root Mean Square Error of Approximation
TLI: Tucker-Lewis Index
L: Logical Reasoning Test
M: Math Reasoning Test
W: Writing Test
CF2: ETS kit Hidden Patterns Mental Visualization Test
S2: ETS kit Cube Comparisons Mental Visualization Test
VZ1: ETS kit Form Board Mental Visualization Test
P3: ETS kit Identical Pictures Perceptual Speed Test
P2: ETS kit Number Comparison Perceptual Speed Test
P1: ETS kit Finding A's Perceptual Speed Test
MA1: ETS kit Picture-Number Memory Test
MV1: ETS kit Shape Memory Test
MA3: ETS kit First and Last Names Memory Test
MF: Meaningful Memory test
MV: Mental Visualization Factor
PS: Perceptual Speed Factor
Glr: Long-Term Storage and Retrieval Factor
MM: Meaningful Memory Factor
g: General Mental Ability Factor
examavg: Training Performance Criterion (examination average)
Fig. 6. Structural equation model with all paths from all latent variables to training performance.
Fit statistics for the model in Fig. 6: χ2 = 143, df = 63, p < .0001, χ2/df = 2.3, GFI = 0.924, AGFI = 0.874, NFI = 0.841, CFI = 0.901, PGFI = 0.555, AIC = 227, CAIC = 418, RMSEA = 0.071, TLI = 0.857. [The path diagram itself could not be recovered from the extracted text.]
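Two of the fit quantities used throughout this section can be reproduced from the reported values alone; a minimal sketch (the χ2, df, and N are taken from the Fig. 6 fit output and the Δχ2 test in the text):

```python
from math import erfc, sqrt

def rmsea(chisq, df, n):
    # Point estimate of RMSEA: sqrt(max(chi2 - df, 0) / (df * (N - 1))).
    return sqrt(max(chisq - df, 0.0) / (df * (n - 1)))

def chi2_sf_df1(x):
    # Survival function of a chi-square with one degree of freedom,
    # via the complementary error function (no SciPy needed); used for a
    # 1-df chi-square difference test between nested models.
    return erfc(sqrt(x / 2.0))

print(round(rmsea(143, 63, 256), 3))  # → 0.071, matching the Fig. 6 fit output
print(round(chi2_sf_df1(9.0), 3))     # → 0.003, the reported p for Δχ2 = 9, df = 1
```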
defined using loadings on the three ETS memory tests and the MM test) to the training performance criterion. This path had a non-significant standardized regression weight (.076, p = .241). Furthermore, adding the path did not provide incremental fit over a model without the path (Δχ2 = 1, df = 1, p = .257). Thus, the Glr construct is not significantly related to training performance, and Hypothesis 4 is not supported. An SEM model that included paths from all latent factors to training performance is shown in Fig. 6. Note that only g and MM had a significant relationship with training performance that was in the hypothesized direction.

3.1. Additional analyses

The MM test had little evidence of adverse impact for Hispanics (d = .07), and the average scores for females were somewhat higher than those for males (d = .37). Gulliksen and Wilks's (1950) statistical tests of predictive bias/fairness using Whites
and Hispanics indicated that the MM test was fair². The difference between the standard errors of estimate for the two groups was non-significant (χ2 = 0.012, df = 1, p = .913), as was the difference between the slopes (χ2 = 0.122, df = 1, p = .727). The intercept term approached statistical significance (χ2 = 3.589, df = 1, p = .058), with a tendency toward overprediction for the minority group. Our sample was limited in that there were too few Blacks/African Americans to examine adverse impact or fairness.

An anonymous reviewer asked whether the three memory marker tests (i.e., ETS Picture–Number MA-1, ETS First & Last Names MA-3, and ETS Shape Memory MV-1) had additional predictive power (just as the MM test did). We tested this possibility using two separate three-step hierarchical regression analyses. In the first analysis, scores for g (i.e., the first unrotated principal component from all of the cognitive ability tests) were entered in step 1, followed by the MM test in step 2, and the three remaining memory tests in step 3. None of the three remaining memory tests had statistically significant regression weights, and the ΔR2 was .001 (p = .945). In the second analysis, scores for g were entered in step 1, followed by the three remaining memory tests in step 2, and the MM test in step 3. The ΔR2 for step 2 was .001 (p = .971) and the ΔR2 for step 3 was .029 (p = .001). These results suggest that only the MM test (and not the remaining three memory tests) adds additional predictive power.

4. Discussion

As a whole, we found support for Hypothesis 3, indicating that the MM factor does in fact uniquely predict training performance. This is an important and notable finding, since past research has consistently failed to find evidence that specific abilities have incremental validity over g.
We do not view the finding that MM uniquely predicts training performance as a confirmation of specific aptitude theory; instead, we view it as the third exception to its disconfirmation. It is interesting that the MM factor, and not the Glr factor, predicted training performance. This suggests that MM is a more

² Psychometricians predominantly use regression-based procedures for defining and examining predictive bias/fairness. Under this definition of fairness, a test is unfair if the regression lines depicting the relationship between scores on the test and scores on the performance criterion are different for the majority and minority groups. For example, if the regression line had a positive slope for the majority group and was completely flat for the minority group, then unfairness would occur. Fairness occurs when the regression lines for two groups (e.g., majority and minority) are either identical or differ in such a way that the minority group is not disadvantaged (e.g., the use of a common regression line over-predicts the performance of minority group members). The procedure used in the current study (Gulliksen & Wilks, 1950) is conducted in three steps, each involving the comparison of regression outcomes for the two groups. In step 1, the standard errors (SEs) of the estimate are compared; in step 2, the slopes are compared; in step 3, the intercepts are compared. If any step yields a statistically significant result, the model testing stops at that step and further steps are neither conducted nor interpreted. For example, if the first step shows a significant difference in the standard errors, then steps 2 and 3 are not conducted. Likewise, if the second step shows a significant difference in the slopes, then step 3 is not conducted. The Gulliksen and Wilks statistical test is the most comprehensive test of predictive bias/fairness, as it includes tests for differences in standard errors, slopes, and intercepts. The Cleary (1968) model omits tests for standard-error differences, and analyses that compare the magnitude of validity coefficients omit differences in intercepts.
important determinant of training performance and declarative knowledge than other types of long-term memory. This could be expected, since the training material consisted of interrelated ideas dealing with laws, legal concepts, regulations, and operational procedures, rather than randomly paired bits of information. Positions requiring memory for randomly paired, non-meaningfully related pieces of information might show a stronger relationship with Glr (e.g., restaurant wait-staff and some customer-service positions might require incumbents to commit orders to memory). One limitation of this study is that it included only one job; thus, our findings may not generalize to training programs for other jobs. Indeed, previous research on specific abilities suggests that the two abilities that uniquely predict performance (i.e., spatial abilities and perceptual speed) do so only for a subset of jobs. One avenue for future researchers would be to replicate our findings in other training programs or for other criteria (e.g., measures of job or academic performance). Future researchers could also examine the validity of MM in traditional academic settings. A large number of admissions testing programs exist for selecting applicants to undergraduate, graduate, and professional schools. Neither of the two main undergraduate admissions tests used in the United States (i.e., the SAT and ACT) includes a MM test, nor do any of the seven graduate/professional school admissions tests contained in Kuncel and Hezlett's (2007) meta-analysis. There are some similarities between the training academy program in our study and traditional academic programs. Performance in both settings is measured, in part, using scores on classroom tests which involve the use of memory, especially the encoding, storage, and recall of meaningful information.
Finally, our finding that females have higher scores on the MM test (which uniquely predicts performance) suggests that it could serve as an omitted variable that could help explain the tendency for admissions tests to underpredict the academic performance of females (Kling, Noftle, & Robins, 2013; Mattern & Patterson, 2013; Zwick, 2006). Past research suggests that g-loaded tests tend to underpredict the academic performance of females (i.e., on average, female students tend to receive higher grades than would be predicted by their g-loaded test scores). This suggests that an additional variable (or variables) predicts academic performance and that females have a higher mean score on this variable than males; the MM factor could be a candidate for such a variable. Recent research suggests that conscientiousness and course-taking patterns might be partial (but not complete) explanations for female underprediction (Keiser, Sackett, & Brothen, 2014; Kuncel, Keiser, & Sackett, 2014), leaving open the possibility that MM explains the underprediction.

References

Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press. Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. In K. W. Spence, & J. T. Spence (Eds.), The psychology of learning and motivation, Vol. 2. (pp. 89–195). New York: Academic Press. Baddeley, A. D. (1966). Short-term memory for word sequences as a function of acoustic, semantic, and formal similarity. Quarterly Journal of Experimental Psychology, 18(4), 362–365. Baddeley, A. D. (2001). The concept of episodic memory. Philosophical Transactions of the Royal Society of London B, 356, 1345–1350.
Baddeley, A. D. (2004). The psychology of memory. In A. D. Baddeley, M. D. Kopelman, & B. A. Wilson (Eds.), The essential handbook of memory disorders for clinicians. Hoboken, NJ: John Wiley & Sons. Baddeley, A. D., & Hitch, G. J. (1974). Working memory. In G. A. Bower (Ed.), Recent advances in learning and motivation, Vol. 8. (pp. 47–89). New York: Academic Press. Bahrick, H. P., & Phelps, E. (1987). Retention of Spanish vocabulary over 8 years. Journal of Experimental Psychology: Learning, Memory, & Cognition, 13, 344–349. Bransford, J. D., Brown, A. L., & Cocking, R. R. (2004). How people learn: Brain, mind, experience, and school. Washington, DC: National Academy Press. Brown, K. G., Le, H., & Schmidt, F. L. (2006). Specific aptitude theory revisited: Is there incremental validity for training performance. International Journal of Selection and Assessment, 14(2), 87–100. Canadian Border Services Agency (2006). Border services officer test information booklet: Version 1C. Ottawa, ON: Human Resources Branch, Canadian Border Services Agency. Carlston, D. E., & Smith, E. R. (1996). Principles of mental representation. In E. T. Higgins, & A. W. Kruglanski (Eds.), Social psychology: Handbook of basic principles (pp. 184–210). New York, NY: Guilford Press. Carroll, J. B. (1993). Human cognitive abilities: A survey of factor analytic studies. New York: Cambridge University Press. Casey, J. J. (1984). Trainability diagnosis: A humanistic approach to selection. Training and Development Journal, 38(12), 89–91. Cattell, R. B. (1943). The measurement of adult intelligence. Psychological Bulletin, 40, 153–193. Cattell, R. B. (1963). Theory of fluid and crystallized intelligence: A critical experiment. Journal of Educational Psychology, 54, 1–2. Cattell, R. B. (1967a).
La theorie de l'intelligence fluide et cristallisee sa relation avec les tests “culture fair” et sa verification chez les enfants de 9 a 12 ans [The theory of fluid and crystallized intelligence, its relation with “culture fair” tests and its verification with children of 9 to 12 years]. Revue de Psychologie Appliquée, 17, 134–154. Cattell, R. B. (1967b). The theory of fluid and crystallized general intelligence checked at the 5–6 year-old level. British Journal of Educational Psychology, 37, 209–224. Chase, W. G., & Simon, H. A. (1973). Perception in chess. Cognitive Psychology, 1, 33–81. Chi, M. T. H., Feltovich, P. J., & Glaser, R. (1981). Categorization and representation of physics problems by experts and novices. Cognitive Science, 5, 121–152. Cleary, T. A. (1968). Test bias: Prediction of grades of Negro and White students in integrated colleges. Journal of Educational Measurement, 5, 115–124. Colberg, M. C. (1984). Towards a taxonomy of verbal tests based on logic. Educational and Psychological Measurement, 44, 113–120. Colberg, M. C. (1985). Logic-based measurement of verbal reasoning: A key to increased validity and economy. Personnel Psychology, 38, 347–359. Colberg, M., Nester, M. A., & Trattner, M. H. (1985). Convergence of the inductive and deductive models in the measurement of reasoning abilities. Journal of Applied Psychology, 70, 681–694. Collins, A. M., & Loftus, E. F. (1975). A spreading-activation theory of semantic processing. Psychological Review, 82, 407–428. Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York, NY: Holt, Rinehart and Wilson. Deary, I. J. (2001). Intelligence: A very short introduction. New York, NY: Oxford University Press, Inc. deGroot, A. D. (1965). Thought and choice in chess. The Hague, The Netherlands: Mouton. Ebbinghaus, H. (1885). 
Über das Gedächtnis: Untersuchungen zur experimentellen Psychologie [Reprinted and translated in 1964 as Memory: A contribution to experimental psychology]. Gloucester, Mass: Smith (New York: Dover). Ekstrom, R. B., French, J. W., Harman, H. H., & Derman, D. (1976a). ETS Kit of factor-referenced cognitive tests. Princeton, NJ: Educational Testing Service. Ekstrom, R. B., French, J. W., Harman, H. H., & Derman, D. (1976b). Manual for Kit of factor-referenced cognitive tests. Princeton, NJ: Educational Testing Service. Fiske, S. T., & Taylor, S. E. (1991). Social cognition. New York, NY: Random House. Foster, J. K. (2009). Memory: A very short introduction. Oxford, UK: Oxford University Press. Garrett, H. E., Bryan, A. I., & Perl, R. E. (1935). The age factor in mental organization. Archives of Psychology, 176, 396–413. Gatewood, R. D., & Field, H. S. (2001). Human resource selection (5th ed.). New York, NY: Harcourt College Publishers. Ghiselli, E. E., Campbell, J. P., & Zedeck, S. (1981). Measurement theory for the behavioral sciences. New York, NY: W.H. Freeman and Company. Guion, R. M., & Highhouse, S. (2006). Essentials of personnel assessment and selection. New York, NY: Psychology Press. Gulliksen, H. (1950). Theory of mental tests. New York, NY: John Wiley & Sons, Inc.
Gulliksen, H., & Wilks, S. S. (1950). Regression tests for several samples. Psychometrika, 15, 91–114. Hakstian, A. R., & Cattell, R. B. (1974). The checking of primary ability structure on a basis of twenty primary abilities. British Journal of Educational Psychology, 44, 140–154. Heckman, R. W. (1967). Aptitude–treatment interactions in learning from printed-instruction: A correlational study. Unpublished Ph.D. thesis, Purdue University. (University Microfilm 67–10202). Horn, J. (1976). Human abilities: A review of research and theory in the early 1970s. Annual Review of Psychology, 27, 437–485. Horn, J. L. (1978). Human ability systems. In P. B. Baltes (Ed.), Life-span development and behavior, Vol. 1. (pp. 211–256). New York: Academic. Horn, J. L. (1985). Remodeling old models of intelligence. In B. B. Wolman (Ed.), Handbook of intelligence (pp. 267–300). New York: Wiley. Horn, J. L. (1988). Thinking about human abilities. In J. R. Nesselroade (Ed.), Handbook of multivariate psychology (pp. 645–685). New York: Academic Press. Horn, J. L., & Blankson, N. (2005). Foundations for better understanding of cognitive abilities. In D. P. Flanagan, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 41–68) (2nd ed.). New York: Guilford Press. Horn, J. L., & Noll, J. (1997). Human cognitive capabilities: Gf–Gc theory. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests and issues (pp. 53–91). New York: Guilford. Hunter, J. E. (1983a). A causal analysis of cognitive ability, job knowledge, job performance, and supervisor ratings. In F. Landy, S. Zedick, & J. Cleveland (Eds.), Performance measurement and theory (pp. 257–266). Hillsdale, NJ: Erlbaum. Hunter, J. E. (1983b). The prediction of job performance in the military using ability composites: The dominance of general cognitive ability over specific aptitudes. 
Report for Research Applications, Inc., in partial fulfillment of DOD Contract F41689-83-C-0025. Hunter, J. E. (1984). The validity of the Armed Services Vocational Aptitude Battery (ASVAB) high school composites. Report for Research Applications, Inc., in partial fulfillment of DOD Contract F41689-83-C-0025. Hunter, J. E. (1985). Differential validity across jobs in the military. Report for Research Applications, Inc., in partial fulfillment of DOD Contract F41689-83-C-0025. Hunter, J. E. (1986). Cognitive ability, cognitive aptitudes, job knowledge, and job performance. Journal of Vocational Behavior, 29, 340–362. Kaufman, A. S., & Kaufman, N. L. (1993). The Kaufman Adolescent and Adult Intelligence Test. Circle Pines, MN: American Guidance Service. Keiser, H. N., Sackett, P. R., & Brothen, T. (2014). Revisiting female underprediction: Exploring the role of conscientiousness. Poster presented at the 29th meeting of the Society for Industrial and Organizational Psychology, Honolulu, HI. Kelley, H. P. (1964). Memory abilities: A factor analysis. Psychometric Monographs, 11. Kling, K. C., Noftle, E. E., & Robins, R. W. (2013). Why do standardized tests underpredict women's academic performance? The role of conscientiousness. Social Psychology and Personality Science, 4(5), 600–606. Kobrin, J. L., Patterson, B. F., Shaw, E. J., Mattern, K. D., & Barbuti, S. M. (2008). Validity of the SAT® for predicting first-year college grade point average. Research Report No. 2008-5. New York, NY: The College Board. Kuncel, N. R., & Hezlett, S. A. (2007). Standardized tests predict graduate students' success. Science, 315, 1080–1081. Kuncel, N. R., Hezlett, S. A., & Ones, D. S. (2001). A comprehensive meta-analysis of the predictive validity of the Graduate Record Examinations: Implications for graduate student selection and performance. Psychological Bulletin, 127, 162–181. Kuncel, N. R., Hezlett, S. A., & Ones, D. S. (2004).
Academic performance, career potential, creativity, and job performance: Can one construct predict them all? Journal of Personality and Social Psychology, 86, 148–161. Kuncel, N. R., Keiser, H. N., & Sackett, P. R. (2014). Revisiting female underprediction: Exploring the role of course-taking patterns. Poster presented at the 29th meeting of the Society for Industrial and Organizational Psychology, Honolulu, HI. Kunda, Z. (2001). Social cognition: Making sense of people. Cambridge, MA: The Massachusetts Institute of Technology Press. Mattern, K. D., & Patterson, B. E. (2013). Test of slope and intercept bias in college admissions: A response to Aguinis, Culpepper, and Pierce (2010). Journal of Applied Psychology, 98(1), 134–147. McDaniel, M. A., & Kepes, S. (2012). Spearman's hypothesis is a model for understanding alternative g tests. In L.M. Hough (Chair). Racial differences in personnel selection: Complex findings and ongoing research. Paper presented at the 27th Annual Conference of the Society for Industrial and Organizational Psychology. San Diego. McGrew, K. S. (2005). The Cattell–Horn–Carroll theory of cognitive abilities: Past, present, and future. In D. P. Flanagan, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 136–181) (2nd ed.). New York, NY: Guilford.
McGrew, K. S. (2009a). Cattell–Horn–Carroll (CHC) broad and narrow cognitive ability definitions. Third working draft dated March 11, 2009. Available at: http://www-personal.umich.edu/~itm/688/wk6/CHC%20Definitions.pdf
McGrew, K. S. (2009b). CHC theory and the human cognitive abilities project: Standing on the shoulders of the giants of psychometric intelligence research [Editorial]. Intelligence, 37, 1–10.
McGrew, K., & Flanagan, D. (1998). The Intelligence Test Desk Reference (ITDR): Gf–Gc cross-battery assessment. Boston, MA: Allyn & Bacon.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81–97.
Nathan, B. R., & Alexander, R. A. (1988). A comparison of criteria for test validation: A meta-analytic investigation. Personnel Psychology, 41, 517–535.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory. New York, NY: McGraw-Hill, Inc.
Olea, M. M., & Ree, M. J. (1994). Predicting pilot and navigator criteria: Not much more than g. Journal of Applied Psychology, 79, 845–851.
Ree, M. J., & Earles, J. A. (1991). Predicting training success: Not much more than g. Personnel Psychology, 44(2), 321–332.
Ree, M. J., Earles, J. A., & Teachout, M. (1994). Predicting job performance: Not much more than g. Journal of Applied Psychology, 79, 518–524.
Reilly, R. R., & Manese, W. R. (1979). The validation of a minicourse for telephone company switching technicians. Personnel Psychology, 32, 83–90.
Robertson, I., & Downs, S. (1979). Learning and the prediction of performance: Development of trainability testing in the United Kingdom. Journal of Applied Psychology, 64(1), 42–50.
Roth, P. L., Buster, M. A., & Bobko, P. (2011). Updating the trainability tests literature on Black–White subgroup differences and reconsidering criterion-related validity. Journal of Applied Psychology, 96, 34–45.
Salgado, J. F., Anderson, N., Moscoso, S., Bertua, C., de Fruyt, F., & Rolland, J. P. (2003). A meta-analytic study of general mental ability validity for different occupations in the European community. Journal of Applied Psychology, 88(6), 1068–1081.
Schmidt, F. L. (1988). The problem of group differences in ability test scores in employment selection. Journal of Vocational Behavior, 33, 272–292.
Schmidt, F. L. (2011). An interview with Frank L. Schmidt. The Industrial-Organizational Psychologist, 48(4), 21–29.
Schmidt, F. L., & Hunter, J. E. (1996). Measurement error in psychological research: Lessons from 26 research scenarios. Psychological Methods, 1(2), 199–223.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262–274.
Schmidt, F. L., & Hunter, J. E. (2004). General mental ability in the world of work: Occupational attainment and job performance. Journal of Personality and Social Psychology, 86(1), 162–173.
Schmidt, F. L., Hunter, J. E., & Caplan, J. R. (1981). Validity generalization results for two job groups in the petroleum industry. Journal of Applied Psychology, 66(5), 261–273.
Schmidt, F. L., Ones, D. S., & Hunter, J. E. (1992). Personnel selection. Annual Review of Psychology, 43, 627–670.
Schwartz, B. L. (2011). Memory: Foundations and applications. Thousand Oaks, CA: Sage Publications, Inc.
Seibert, W. F., Reid, J. C., & Snow, R. E. (1967). Studies in cine-psychometry II: Continued factoring of audio and visual cognition and memory. West Lafayette, IN: Purdue University Audio Visual Center (ERIC Document No. 019 877).
Shaw, J. C. (1896). A test of memory in school children. The Pedagogical Seminary, 4(1), 61–78.
Shen, W., Cucina, J. M., Walmsley, P., & Seltzer, B. (2014). When correcting for unreliability of job performance ratings, the best estimate is still .52. Industrial and Organizational Psychology: Perspectives on Science and Practice, 7, 519–524.
Smith, E. R. (1998). Mental representations and memory. In D. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), Handbook of social psychology (4th ed., Vol. 1, pp. 391–445). New York, NY: McGraw-Hill.
Smith, M. C., & Downs, S. (1975). Trainability assessments for apprentice selection in shipbuilding. Journal of Occupational Psychology, 48, 39–43.
Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs: General and Applied, 74(11, Whole No. 498), 1–29.
Thorndike, R. L. (1949). Personnel selection: Test and measurement techniques. New York, NY: John Wiley & Sons, Inc.
Thorndike, R. L. (1986). The role of general ability in prediction. Journal of Vocational Behavior, 29, 332–339.
Verive, J. M., & McDaniel, M. A. (1996). Short-term memory tests in personnel selection: Low adverse impact and high validity. Intelligence, 23, 15–32.
Whipple, G. M. (1915). Manual of mental and physical tests: II. Baltimore, MD: Warwick & York.
Woodcock, R. W., & Johnson, M. B. (1989). Woodcock–Johnson Psychoeducational Battery—Revised. Chicago, IL: Riverside.
Zwick, R. (2006). Higher education admissions testing. In R. L. Brennan (Ed.), Educational measurement (4th ed.). Westport, CT: American Council on Education & Praeger Publishers.