Intelligence 35 (2007) 347 – 358
Consideration of g as a common antecedent for cognitive ability test performance, test motivation, and perceived fairness ☆
Charlie L. Reeve a,⁎, Holly Lam b
a Department of Psychology, University of North Carolina Charlotte, 9201 University City Boulevard, Charlotte, NC 28223-0001, United States
b Purdue University, United States
Received 21 August 2006; accepted 21 August 2006
Available online 2 October 2006
Abstract
Several different analyses were used to test the hypothesis that test-taking motivation, perceived test fairness, and actual test performance are correlated only because they share a common antecedent. First, hierarchical regressions revealed that initial test performance has a unique influence on non-ability factors even after controlling for test-takers' initial standing on those factors. Second, the variance underlying performance was partitioned into g and non-g components. Partial correlations showed that the non-ability factors are essentially unrelated to performance after controlling for variance in g. Third, a correlated vectors analysis showed that the degree of g-saturation of the cognitive ability scales was significantly correlated with the magnitude of the correlations between performance on the scale and the non-ability variables. Lastly, only the g-variance underlying initial test performance predicted changes in motivation and fairness perceptions across repeated testing, but initial motivation and fairness perceptions did not influence changes in performance. It is concluded that, as suggested by the self-serving bias hypothesis, the relations between non-ability factors and test performance are largely due to a common antecedent, namely g.
© 2006 Published by Elsevier Inc.
Keywords: General cognitive ability; Self-serving bias hypothesis; Test attitudes; Test-taking motivation; Perceived test fairness
☆ Special thanks to Silvia Bonaccio for her helpful comments regarding the analyses reported in this paper.
⁎ Corresponding author. E-mail address: [email protected] (C.L. Reeve).
0160-2896/$ - see front matter © 2006 Published by Elsevier Inc. doi:10.1016/j.intell.2006.08.006

Despite a voluminous empirical literature from basic differential research confirming that professionally developed intelligence tests, or the factor scores derived from such tests, are free from measurement bias (Gottfredson, 1997; Jensen, 1980, 1998; Reeve & Hakel, 2002; Schmitt, 2002), there appears to be a surge in interest among industrial–organizational psychologists as to whether test-taker reactions and motivational differences might contribute significantly to the observed variance in performance on cognitive
ability tests. Of particular concern to test users and selection researchers is the possibility that differences in test-taker perceptions or motivation to perform on standardized tests might result in biased or inaccurate estimates of qualifications, especially in high-stakes contexts (Barber, 1998; Ryan & Ployhart, 2000; Smither, Reilly, Millsap, Pearlman, & Stoffey, 1993). These concerns appear to be driven by studies reporting significant correlations between observed test scores and non-ability factors such as attitudes towards tests (e.g., Arvey, Strickland, Drauden, & Martin, 1990; Smither et al., 1993) and test-taking motivation (e.g., Sanchez, Truxillo, & Bauer, 2000). Concern regarding the construct validity of cognitive ability tests extends back to the emergence of standardized
testing (e.g., Spearman, 1904) and, thus, the idea that factors other than the target ability might influence performance on a test is certainly not without precedent. However, studies of the relations between test-taker reactions and test performance are often ambiguous with regard to causal processes (e.g., Smither et al., 1993). That is, many of these studies leave open the possibility that differences in reactions and motivation may be a consequence of the individuals' performance rather than an antecedent. Basic research from the social and differential literatures can help clarify these issues.

The motivations for this research are multiple. First, from a global perspective, we believe there can be mutual benefit in better integrating the research findings from basic differential psychology into industrial-organizational research on testing. For example, if non-ability factors do substantially influence performance on the cognitive ability tests typically used for selection purposes, then existing models of the latent sources of variance that account for test performance within the differential literature may need to be modified and their validity reexamined. On the other hand, if attitudes and reported motivations do not substantially influence test performance, and instead reflect only spurious correlations due to a common antecedent, some of the concern surrounding applicant reactions would be reduced (although they may still be very important for other reasons). Second, we seek to confirm and extend the central premise underlying Chan's (Chan, 1997; Chan, Schmitt, DeShon, Clause, & Delbridge, 1997; Chan, Schmitt, Jennings, Clause, & Delbridge, 1998; Chan, Schmitt, Sacco, & DeShon, 1998) self-serving bias hypothesis; that is, we use a multi-panel design to assess if and to what degree test-takers' motivation and perceptions of test fairness uniquely contribute to the observed variance in test performance beyond that due to general cognitive ability. Specifically, we (a) explicitly examine the possibility noted by Chan et al. (1997) for the observed relation among test reactions and test performance, namely, that general cognitive ability (i.e., g) is a common antecedent, and (b) integrate the recent work on test-taking motivation into the self-serving bias framework. Finally, although not our central purpose, given the multi-panel design, we also investigate the degree to which g and non-g factors explain changes in test-taking motivations and fairness perceptions after test-takers receive performance feedback.

1. The self-serving bias framework

First, drawing on the universal positive attribution bias – that people are more likely to attribute positive
outcomes to personal factors whereas they will attribute negative outcomes to external factors (e.g., Heider, 1958, 1976; Miller & Ross, 1975; Nisbett & Ross, 1980; Whitley & Frieze, 1985) – Chan (Chan, 1997; Chan et al., 1997; Chan, Schmitt, Jennings et al., 1998; Chan, Schmitt, Sacco et al., 1998) suggests that test performance may be a key determinant of test-taking reactions (i.e., beliefs about tests, fairness perceptions, test-taking motivations). Poor-performing individuals reduce ego threat by developing negative perceptions of the test and reporting low motivation, whereas high-ability individuals engage a self-enhancing mechanism, evaluating the use of the test more favorably and reporting higher motivation.

For example, in a field study, Chan, Schmitt, Jennings et al. (1998) hypothesized that applicant perceptions of the job relatedness of a test would have a positive effect on their perceptions of test fairness, as predicted by the work of Gilliland (1993, 1994). In addition, actual test performance was hypothesized to have a positive effect on both job relatedness and test fairness, but these effects would be mediated by perceived test performance. In other words, relative to those who performed poorly on the test, applicants who performed well were more likely to perceive that they had performed well, which in turn would have a direct effect on their reactions to the procedures and test. As predicted by the self-serving bias hypothesis, applicants who had actually done well and believed they had done well perceived the selection test to be more job-relevant and fair than applicants who did poorly on the selection test. Moreover, their study design allowed them to support the conclusion that test performance influences applicant reactions.

In a similar study, Bauer, Maertz, Dolen, and Campion (1998) assessed justice perceptions and test self-efficacy before and after applicants received feedback about whether or not they received a passing score on the written selection test. Specifically, ratings of the amount of information known about the test and treatment at the test site were positively related to test-taking self-efficacy for those who passed the test, but negatively related to test-taking self-efficacy for those who did not pass. Their findings partially support the hypothesis that test performance can influence applicant reactions after controlling for initial attitudes. However, this study again fails to account for the role of the test taker's general reasoning ability as a common antecedent to both performance and non-ability factors. Rather, Bauer et al. (1998) were concerned with whether applicant attitudes influenced important outcomes after controlling for initial attitudes. Left untested in the
empirical literature is whether these attitudes significantly impact subsequent performance after controlling for general reasoning ability, rather than initial attitudes.

In a study of racial subgroup differences, Chan et al. (1997) found that test performance had a significant relation to post-test face validity perceptions and reported motivation. Further, Chan et al.'s findings suggested that post-test perceptions of face validity impact reports of motivation, which in turn influence subsequent performance; however, this effect was small after controlling for prior performance.

Although the work of Chan and colleagues has started to clarify some of the causal issues, strong inferences are still problematic. For instance, Chan, Schmitt, Sacco et al. (1998) note that observed relations between test performance and test-taker reactions (both pre and post) could be found for a number of reasons: reactions developed prior to or during the test influenced performance, perceived or actual performance influenced reactions, or some omitted third variable is a common cause of both (p. 473). A number of studies have attempted to address the first two possibilities (e.g., Chan et al., 1997; Chan, Schmitt, Jennings et al., 1998), but the possibility that the relation between test-taker reactions (i.e., fairness perceptions and motivations) and test performance is a spurious correlation due to a common antecedent has not been explicitly investigated. However, given the substantial evidence accumulated in the basic differential literature indicating that general reasoning ability (g) is perhaps the most dominant and pervasive individual difference factor associated, either causally or correlationally, with a wide range of academic, occupational, and, importantly, attitudinal and social outcomes (Gordon, 1997; Gottfredson, 1997; Jensen, 1998; Lubinski & Humphreys, 1997), it would not be unreasonable to hypothesize that g is largely responsible for both test performance and, via the self-serving bias, test reactions. Moreover, Chan et al. note that a substantial portion of the variance in performance was accounted for by the unmeasured variable(s) associated with racial subgroup membership. Though seldom acknowledged in the industrial-organizational literature, the well-documented Spearman-Jensen effect (i.e., that the size of the racial difference on any cognitive ability test is directly proportional to the g-saturation of the test; Jensen, 1985, 1987, 1998; Nijenhuis & van der Flier, 2003; Rushton, 1998, 2001; Rushton & Jensen, 2003; however, see Dolan, Roorda, & Wicherts, 2004, for an exception) suggests that Chan's "unmeasured variable" may well be g. The term "g-saturation" refers to the degree to which a specific test measures g; that is, the
proportion of observed test variance that is due to the g factor. The more a test is g-saturated (e.g., inductive or deductive reasoning tests), the larger the associated racial difference; the less it is g-saturated (e.g., rote memory, clerical checking), the smaller the associated racial difference. This association is reliably strong, averaging about r = 0.60 across 149 different tests (Jensen, 1998, chapter 11). Further, this association has been shown to hold even when the tests' g-saturations are estimated in a Japanese sample and the Black–White differences are estimated in American samples (Jensen, 1998), as well as in a sample of 3-year-old children (Peoples, Fagan, & Drotar, 1995).

Furthermore, although Chan's discussion of the self-serving bias hypothesis explicitly discusses test-taker motivations, only one study has included a measure of motivation (Chan et al., 1997), and that measure has questionable psychometric properties (cf., McCarthy & Goffin, 2003). Despite the seeming importance of understanding how test-taking motivation works within the context of the self-serving bias, subsequent research has focused exclusively on justice and fairness reactions. At the same time, significant progress has been made in the conceptualization and measurement of test-taking motivation; however, this research has remained independent of investigations of the self-serving bias hypothesis. Integrating these two domains would go a long way toward addressing critics who lament that test performance is highly susceptible to non-ability factors.

2. Test-taking motivation

With respect to recent, theoretically grounded approaches to test-taking motivation, the work of Sanchez et al. (2000) is perhaps most notable. Their model and corresponding measure of test-taking motivation are based on the V.I.E. motivational framework (Vroom, 1964). In an initial development study, applicants for an entry-level police officer position responded to the Valence Instrumentality Expectancy Motivation Scale (VIEMS) immediately following completion of a written selection exam. Participants were asked to respond to questions regarding the attractiveness of getting the job (i.e., valence of the outcome); the degree to which doing well on the selection test will lead to being hired (i.e., perceived instrumentality); and the subjective probability of effort leading to successful performance (i.e., perceived expectancy). The results of their field study indicated that the Expectancy scale was positively related to test scores (β = 0.17, p < 0.01), Instrumentality was negatively related to test scores (β = −0.17,
p < 0.01), and Valence did not account for much variance in the hierarchical regression (β = 0.07, p > 0.05). While the work of Sanchez and colleagues has been informative for understanding test-taking motives, it is as yet unclear to what extent these motives explain test performance variance above and beyond general cognitive ability. That is, applying the same premises underlying the self-serving bias hypothesis regarding fairness perceptions to test-taking motivation, it is possible that test-takers' motivations are influenced by their general cognitive ability. Test-takers who have higher ability are more likely to engage in ego-enhancing activities (e.g., report higher expectancy to perform well, claim to value the outcomes more), whereas those with low ability are likely to engage in ego-protecting activities (e.g., report lower expectancy, claim to not value the outcomes, and may even doubt the instrumentality of the test). Thus, initial test-taking motivation would appear to be correlated with performance, but this correlation may be spurious due to a common antecedent.

In fact, in a second field study, Sanchez et al. (2000) assessed test-taking motivation both before (Time 1) and after (Time 2) participants completed a written selection test. Regression analyses revealed that when the VIEMS subscales were entered as a block, Time 1 responses were not related to test score, but the amount of variance accounted for in test performance by the Time 2 VIEMS subscales was significant, R² = 0.03, F(3, 242) = 2.50, p < 0.10. These results indicate that applicants' perceived test performance was significantly related to post-test motivation (Time 2), even after partialling out pre-test motivation (Time 1). Consequently, these results suggest that test-taking motivation is mostly a consequence of test performance rather than an antecedent. However, left unexamined is the degree to which the correlation between performance and post-test motivation is due specifically to general ability.

3. The current study

A primary limitation of the extant literature investigating the relationship between test reactions and ability test performance is that general ability has not been taken into account. Thus, the current study uses a multi-panel design to test the degree to which non-ability factors can impact observed performance, controlling for prior differences in general cognitive ability. Additionally, this design allows us to examine the degree to which test performance impacts subsequent motivation and fairness perceptions while controlling for prior differences in these factors, as well as the
degree to which changes in non-ability reactions can be predicted by prior ability and non-ability factors.

Based on existing evidence regarding the soundness and robustness of the psychometric properties of cognitive ability tests in the face of differences in test reactions (e.g., Schmitt, 2002), practice effects (e.g., Reeve & Lam, 2005), age (at least from 15 to 19 years of age; e.g., Reeve, 2004), as well as racial, cultural, ethnic, and nationality factors (e.g., Carroll, 1993; Irvine & Berry, 1988; Jensen, 1985, 1998; Jensen & Reynolds, 1982), and the emerging evidence supporting the extension of the self-serving bias to the test-taking context (e.g., Chan, 1997; Chan, Schmitt, Jennings, et al., 1998; Chan, Schmitt, Sacco, et al., 1998), we propose a series of hypotheses consistent with a strong version of the self-serving bias hypothesis. That is, for the purpose of generating and testing specific hypotheses, we take the position that in the maximum performance context of cognitive ability testing, g is the primary antecedent of test performance, test motivation, and fairness perceptions. If so, this would give rise to spurious correlations between performance and non-ability test reactions.

First, it is expected that performance on cognitive ability tests (specifically, the g-factor extracted from the battery of tests) will show significant zero-order correlations with the set of test reactions. However, if we are correct about the role of g as a common antecedent, then after controlling for the variance due to g, the set of test reactions will not be significantly related to actual test performance at Time 1 or Time 2. In other words, we are proposing a classic "third variable" scenario in which g acts as the common antecedent variable that explains the relation between two observed variables.

If g is a common antecedent, we hypothesize that it should leave a distinctive footprint in the data. Specifically, if we separate the g and non-g variance underlying test performance, the non-ability variables should be significantly and positively correlated with the g-variance (Hypothesis 1), but not the non-g variance (Hypothesis 2). Similarly, we should find that the degree to which the non-ability variables correlate with performance on a given sub-scale is directly proportional to the g-saturation of the scale. That is, we should see positive correlations between the ability scales' vector of g-loadings and their vector of correlations with each non-ability variable (Hypothesis 3).

Finally, as noted, the secondary purpose of the current study is to examine the predictors of changes in test-taking motivation, fairness perceptions, and performance. Although several studies have examined the relations among these constructs over time, few have assessed the degree to which changes in them are a
function of general ability. However, we can make a few tentative hypotheses regarding changes in motivation and fairness perceptions. First, if a test-taker does well on the test at Time 1, they are likely to engage in ego-enhancing mechanisms. Thus, they should be more likely to increase reports of motivation (especially performance expectancy) and to report that the test is a fair assessment of ability. On the other hand, lower scoring test-takers are more likely to engage in ego-protecting activities. That is, they will reduce their reported motivation and likely perceive the test as a less fair measure (e.g., "my low score is not reflective of my ability because this test didn't give me an honest chance to demonstrate my ability"). Thus, initial performance should predict changes in motivation and fairness perceptions (Hypothesis 4). Although one might initially think that positive perceptions and test-taking motivation should also predict changes in performance, our central argument would suggest that such relations are only due to g. Thus, because we have predicted that performance drives changes in the non-ability factors and not vice versa, it follows that non-ability factors would not have any unique predictive power regarding changes in performance (Hypothesis 5).

4. Method

4.1. Sample

Undergraduate students enrolled in an introductory psychology course at a large Midwestern university participated in this study for course credit. A total of 154 participants began the study, but only 136 completed both sessions. The composition of the operational sample (N = 136) was 56.6% male, 50.0% freshmen, with an average age of 19.86 years (SD = 3.55). T-tests for mean differences on all day 1 measures of perceptions, motivation, and performance revealed no differences between those who completed session one only and those who completed both sessions.

4.2. Procedures

The study consisted of two administrations of six scales of the Employee Aptitude Survey, two memory tests, and questionnaires assessing fairness perceptions and test-taking motivation (measures described below). During two separate testing sessions, each participant received a test-booklet containing all questionnaires and tests. Prior to the first session, informed consent was obtained and the experimenter explained that a cash incentive was being offered to the three participants who
obtained the highest overall score on the ability tests (computed as the percentage of items answered correctly across all tests within a single session). Given the large number of tests in the current study, we considered it necessary to have additional incentives for participants to allocate and maintain attention for the duration of the study. Further, the use of a monetary prize as an incentive, as well as the amounts used, to simulate high-stakes testing situations is consistent with the procedures employed in previous research (e.g., Schmit & Ryan, 1992; Chan et al., 1997; McCarthy & Goffin, 2003). The incentives were $24 for the highest score, $20 for the second highest score, and $15 for the third highest score (although we wanted to use higher amounts to maximize motivation, these values were set after extensive negotiations with the university's Institutional Review Board). Participants were told that the purpose of the study was to assess the reliability and validity of an existing test of intellectual abilities and skills for possible use as a predictor of academic performance in college.

Both sessions were conducted in the same manner, with only the ordering of the EAS scales changing to control for test-order effects. Participants were run in multiple small groups ranging in size from 3 to 8. For each group, sessions were conducted on the same day and time, separated by one week (note that Ruch, Stang, McKillip, and Dye (2001) indicate that only 48 hours is needed to avoid practice effects). The administration of scales for each session was as follows:

Session 1: Informed consent, demographics questions, VIEMS, ability scales, perceived performance, and fairness perceptions.

Session 2: Feedback from day one, VIEMS, ability scales, perceived performance, and fairness perceptions.

4.3. Measures and materials

Demographic information included age, gender, and year in school.

Test-taking motivation was assessed with a ten-item measure known as the VIEMS, developed by Sanchez et al. (2000). The VIEMS is based on a V.I.E. (Vroom, 1964) model of motivation, assessing a person's (a) desire for the outcomes (i.e., Valence; 3 items, example item is "I want to get one of the cash prizes"); (b) perception of the relationship between performance on the test and receiving the outcome (i.e., Instrumentality; 4 items, example item is "How well a person does on this test will affect whether s/he wins one of the cash prizes.");
and (c) perception of the relationship between effort and performance on the test (i.e., Expectancy; 3 items, example item is "If I try to do my best on this test, I can get a high score."). Sanchez et al. report reliability coefficients for the three sub-scales ranging from 0.82 to 0.96. Cronbach's alphas for Time 1 (T1) and Time 2 (T2), respectively, for the current sample are 0.85 and 0.85 for the valence scale, 0.72 and 0.81 for the instrumentality scale, and 0.83 and 0.79 for the expectancy scale. Although all three scales of the VIEMS were administered, only the expectancy and valence scales are reported for the current study given that instrumentality was controlled (i.e., all participants were told the prize would be based strictly on test scores; thus, instrumentality = 1.0 for all). Further, a check on the data revealed that valence and instrumentality were uncorrelated with both performance and fairness perceptions at day 1 (p > 0.10 for all).

A key fairness perception, chance to perform, was assessed with a five-item measure developed by Bauer et al. (1998) assessing participants' perceptions of the degree to which the test was fair and allowed them the opportunity to demonstrate their skills. Cronbach's alphas for the current sample are 0.88 and 0.90 for T1 and T2, respectively. This dimension of fairness perceptions was chosen because it is highly applicable to the current context (i.e., test-takers are told the tests are designed to assess their intellectual abilities and skills). Further, extant research suggests that perceived measurement accuracy provides much of the driving force behind a variety of cognitive, affective, and behavioral reactions to testing (e.g., Crant & Bateman, 1990; Robertson, Iles, Gratton, & Sharpley, 1991; Rosse, Miller, & Stecher, 1994; Smither et al., 1993; Steiner & Gilliland, 1996). In particular, the opportunity to demonstrate abilities has been noted as especially important with regard to the perceived fairness of tests (Callinan & Robertson, 2000; Gilliland, 1993).

Cognitive abilities and skills were assessed with six scales of the Employee Aptitude Survey (EAS). The EAS is a commonly used cognitive ability test assessing individuals' aptitudes in a variety of areas. The current study used the following scales: Verbal Comprehension, Numerical Ability, Space Visualization, Numerical Reasoning, Verbal Reasoning, and Symbolic Reasoning. The EAS has demonstrated reliability and both construct and criterion-related validity (see Ruch et al., 2001, for details). Alternate-forms reliability coefficients for the different scales used in this study range from 0.82 to 0.91 (Ruch et al., 2001). Test–retest reliabilities are not reported by Ruch et al. (2001). In addition, two memory tests developed by the University of Pittsburgh and The American Institutes for Research (see Flanagan
et al., 1962, for details) were added to the battery to assess memory factors. We felt it was necessary to include manifest indicators of memory to provide a complete assessment of the major group factors of the cognitive ability domain.

All tests were timed and designed to be speeded tests. The order of the EAS scales was varied across days; however, the memory scales were kept in the same positions because they require consistent time intervals between part A (i.e., the learning phase) and part B (the recall phase). The first halves of the two memory scales (the memorization phase) were completed first, followed by the EAS scales, and then the second halves of the two memory scales (the recall and recognition phase) were completed last.

For the analyses involving a total observed score, test performance was computed as the average of the eight standardized scale scores (within day) for each person. For the analyses involving the g-variance and non-g variance components, two scores were computed using principal components analysis. Principal components analysis was used because it is suitable for our purpose (i.e., to divide the variance into two components) and effectively provides g-loadings that are essentially identical to those from other methods (Jensen & Weng, 1994). First, to derive the g-variance component, we extracted the first unrotated principal component based on the set of eight scales. Then each of the original eight scales was residualized on the g-score variable, and the residualized scales were combined to form a non-g composite.

Performance feedback was operationalized as the percentage of items answered correctly out of all the items on the EAS and the memory tests combined.
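To make this scoring and partitioning procedure concrete, a minimal sketch in Python with NumPy is given below. The array scales, the placeholder data, and the choice to combine the residualized scales by averaging are illustrative assumptions; the sketch follows the logic described above rather than reproducing the exact code used in the study.

import numpy as np

# scales: hypothetical (N x 8) array holding the eight ability scale scores; placeholder data here
rng = np.random.default_rng(0)
scales = rng.normal(size=(136, 8))

# Observed composite: mean of the eight standardized (within-occasion) scale scores
z = (scales - scales.mean(axis=0)) / scales.std(axis=0, ddof=1)
performance = z.mean(axis=1)

# g-variance component: scores on the first unrotated principal component of the scales
corr = np.corrcoef(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)     # eigenvalues returned in ascending order
w = eigvecs[:, -1]                          # weights of the first principal component
w = w if w.sum() > 0 else -w                # fix the arbitrary sign so the loadings are positive
g_scores = z @ w
g_loadings = w * np.sqrt(eigvals[-1])       # lambda_g: correlation of each scale with the component

# Non-g composite: residualize each standardized scale on the (standardized) g-score,
# then combine the residuals (here, by averaging; an illustrative assumption)
g_std = (g_scores - g_scores.mean()) / g_scores.std(ddof=1)
slopes = (z.T @ g_std) / (len(g_std) - 1)   # regression slope of each scale on g
residuals = z - np.outer(g_std, slopes)
non_g = residuals.mean(axis=1)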
5. Results

Table 1 contains descriptive statistics and intercorrelations for all variables at both time periods. It is worth noting that the average T1 valence score was quite high (M = 4.34, SD = 0.68 on a 5-point scale). This suggests that participants were, on average, sufficiently motivated by the cash rewards. The correlations shown in Table 1 support our expectations regarding the significant relations between test performance and test reactions at Time 1. Similar static effects are seen at Time 2. Further, the cross-lagged correlations in Table 1 (e.g., T1 expectancy and T2 performance) are significant as well. To begin testing our central premise, we first replicated the findings of Sanchez et al. (2000) by testing for bivariate relations between Time 2 test reactions and Time 1 performance after controlling for Time 1 reactions.

Table 1
Descriptive statistics and zero-order correlations among observed variables

                      M      SD     1        2        3        4        5        6        7
Time 1
  1. Expectancy      4.06   0.61
  2. Valence         4.34   0.68   0.29 a
  3. Fairness        2.24   0.73   0.13    −0.06
  4. Performance     0.00   0.61   0.19 b   0.08     0.22 b
Time 2
  5. Expectancy      3.70   0.61   0.58 a   0.18 b   0.21 b   0.34 a
  6. Valence         3.70   0.59   0.59 a   0.45 a   0.24 a   0.42 a   0.80 a
  7. Fairness        2.43   0.72   0.28 a   0.07     0.71 a   0.30 a   0.41 a   0.47 a
  8. Performance     0.00   0.63   0.19 b   0.12     0.22 b   0.90 a   0.34 a   0.44 a   0.35 a

Note: N = 136. Expectancy, valence, and fairness measured on a 5-point scale. Performance computed as the mean of the eight standardized (within-occasion) ability scales. a p < 0.05. b p < 0.01.
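The key quantity in Table 2, and in the analysis described in the next paragraph, is a partial correlation: the relation between Time 1 performance and a Time 2 reaction with the corresponding Time 1 reaction held constant. A minimal sketch of that computation is given below (Python with NumPy); the arrays are hypothetical placeholders rather than the study data.

import numpy as np

def partial_r(x, y, z):
    # Partial correlation between x and y, controlling for z
    rxy = np.corrcoef(x, y)[0, 1]
    rxz = np.corrcoef(x, z)[0, 1]
    ryz = np.corrcoef(y, z)[0, 1]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = np.random.default_rng(1)
t1_performance = rng.normal(size=136)                          # placeholder Time 1 performance
t1_expectancy = 0.2 * t1_performance + rng.normal(size=136)    # placeholder Time 1 expectancy
t2_expectancy = 0.3 * t1_performance + 0.5 * t1_expectancy + rng.normal(size=136)  # placeholder Time 2 expectancy

# Does Time 1 performance relate to Time 2 expectancy beyond Time 1 expectancy?
print(partial_r(t1_performance, t2_expectancy, t1_expectancy))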
As shown in Table 2, after controlling for Time 1 reactions, each of the Time 2 reactions was significantly correlated with Time 1 performance. These results confirm Bauer et al.'s findings and suggest that Time 1 performance has a unique influence on non-ability factors even after controlling for test-takers' initial standing on those factors.

Table 2
Relation between Time 1 performance and Time 2 perceptions controlling for Time 1 perceptions

DV                   Order of entry        ΔR²       Partial r
Performance (T1)     Expectancy Time 1     0.037 a
                     Expectancy Time 2     0.075 b   0.34 b
Performance (T1)     Valence Time 1        0.01
                     Valence Time 2        0.19 b    0.48 b
Performance (T1)     Fairness Time 1       0.05 b
                     Fairness Time 2       0.05 b    0.30 b

Note: N = 136. Partial r reflects the partial correlation between Time 2 perception and Time 1 performance controlling for the same Time 1 perception. a p < 0.05. b p < 0.01.

To test our specific hypotheses, we first separated variance in test performance into a g-component and a non-g component as explained in the Methods section. These two components were then correlated with the non-ability variables. It should be noted that by using a simple principal component procedure rather than factor analysis, we are actually conducting a conservative test of our hypothesis because the non-g component contains all variance due to all non-g sources, not just
the unique variance due to specific abilities. Thus, any correlations between the non-g component and non-ability factors can be thought of as liberal estimates. That is, our hypothesis predicts null relations between the non-ability variables and the non-g variance component. Hence, by using a liberal estimate of the non-g variance, we increase the chances of disconfirming our hypothesis. Nonetheless, the results, shown in Table 3, clearly support Hypotheses 1 and 2. None of the six correlations between the non-g performance component and the non-ability variables are significant. In fact, none exceed a correlation of 0.10. Thus, once g is removed from the equation, motivation and fairness perceptions are unrelated to performance (i.e., the non-g performance variance). On the other hand, the results confirm that the non-ability factors are significantly related to the g component (except for Time 1 valence).
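The computations summarized in Table 3 reduce to correlating each non-ability variable with the two performance components. A minimal sketch follows (Python with NumPy); the arrays are placeholder stand-ins for the g-component, non-g composite, and reaction scores, not the study data.

import numpy as np

rng = np.random.default_rng(2)
n = 136
g_scores = rng.normal(size=n)                             # placeholder g-variance component scores
non_g = rng.normal(size=n)                                # placeholder non-g composite scores
reactions = {
    "expectancy": 0.3 * g_scores + rng.normal(size=n),    # placeholder reaction measures
    "valence": 0.1 * g_scores + rng.normal(size=n),
    "fairness": 0.2 * g_scores + rng.normal(size=n),
}

# Hypotheses 1 and 2: reactions should correlate with the g component but not the non-g component
for name, x in reactions.items():
    r_g = np.corrcoef(x, g_scores)[0, 1]
    r_non_g = np.corrcoef(x, non_g)[0, 1]
    print(f"{name}: r with g = {r_g:.2f}, r with non-g = {r_non_g:.2f}")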
Table 3
Correlations between non-ability variables and ability components

                 Variance components
                 g          Non-g
Time 1
  Expectancy     0.22 a     −0.09
  Valence        0.08        0.04
  Fairness       0.21 a      0.05
Time 2
  Expectancy     0.35 b     −0.02
  Valence        0.44 b      0.00
  Fairness       0.30 b      0.06

Note: N = 136. a p < 0.05. b p < 0.01.
Table 4
Correlated vectors analysis

                                Zero-order correlations
                                Time 1                       Time 2
                        λg      Exp.    Val.    Fair.        Exp.    Val.    Fair.
Verb. Comprehension     0.55    0.14    0.00    0.09         0.12    0.19 a  0.11
Visual-Spatial          0.72    0.18 a  0.02    0.11         0.22 b  0.30 b  0.19 a
Numerical Reasoning     0.72    0.15    0.05    0.17 a       0.25 b  0.22 a  0.17 a
Verbal Reasoning        0.72    0.18 a  0.06    0.06         0.30 b  0.39 b  0.16
Symbolic Reasoning      0.74    0.16    0.08    0.23 b       0.32 b  0.38 b  0.25 b
Numerical Ability       0.73    0.20 a  0.07    0.18 a       0.28 b  0.35 b  0.33 b
Sentence Memory         0.37    −0.02   0.05    0.12         0.03    0.11    0.09
Word Memory             0.22    −0.02   0.06    0.09         0.10    0.12    0.15
Correlated vectors rg           0.96 b  0.08    0.43         0.87 b  0.86 b  0.59

Note: Zero-order correlations based on N = 136. Correlated vectors (rg) based on N = 8. λg = Principal component weights. Exp. = Expectancy; Val. = Valence; Fair. = Fairness. a p < 0.05. b p < 0.01.
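The bottom row of Table 4 is produced by the method of correlated vectors, described in the text that follows. A minimal sketch of the computation is given below (Python with NumPy); the two vectors are hypothetical placeholders rather than the Table 4 values.

import numpy as np

# Across the eight ability scales, correlate the vector of g-loadings with the vector of
# scale-reaction correlations. Both vectors below are hypothetical placeholders.
g_loadings = np.array([0.75, 0.72, 0.70, 0.68, 0.65, 0.60, 0.40, 0.25])
scale_reaction_rs = np.array([0.20, 0.18, 0.17, 0.16, 0.15, 0.14, 0.02, -0.01])

r_g = np.corrcoef(g_loadings, scale_reaction_rs)[0, 1]   # one entry of the "Correlated vectors" row
print(f"r_g = {r_g:.2f}")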
Next we tested Hypothesis 3 using Jensen's (1998) correlated vectors technique. Specifically, we computed the zero-order correlations between each of the eight ability sub-scales and the three non-ability variables. We also recorded the lambda-weight on the first unrotated principal component for each ability scale (shown in the first column of Table 4). Then we correlated each vector of correlations with the vector of lambda-weights. Results are shown in the bottom row of Table 4. The correlations are all strong (except for Time 1 Valence), indicating that the magnitude of the correlation between an ability scale and motivation or fairness perceptions is driven primarily by the degree to which the scale's variance is due to g. These results support Hypothesis 3 and further confirm that g is the common component linking test performance and test reactions.

Prior to testing the specific hypotheses regarding predictors of change, we examined whether receiving feedback and gaining experience with the test influenced test-taking motivation, fairness perceptions, and test performance. To do so, we examined the mean within-person change between Time 1 and Time 2 for each variable (i.e., we computed a difference score for each person for each variable). Results are shown in Table 5. Because participants were given performance feedback in the form of the percent of total items answered correctly, the changes in performance in Table 5 reflect this computation. As would be expected with repeated exposure to the ability test, performance increased on average.
Table 5
Mean changes in study variables after performance feedback from prior session

                     Change = T2 − T1
Variable             Mean        SD
Expectancy           −0.36 a     0.56
Valence              −0.64 a     0.67
Fairness              0.18 a     0.56
% items correct       0.08 a     0.67

Note: N = 136. Expectancy, Valence, and Fairness measured on a 5-point scale. a p < 0.01.
Interestingly, participants' perceptions of whether the test allowed them the opportunity to demonstrate their skills increased, but their ratings of outcome valence and expectancy decreased significantly.

Hypothesis 4 was tested by hierarchical regression, with the T2 motivation and fairness scales as the dependent variables. The corresponding T1 scales were entered into each equation first, as a control variable. Thus, the dependent variables are scores at T2 controlling for variance at T1. This procedure provides for an assessment of the effects of other variables, entered after the control variable, on changes in the dependent variable from T1 to T2 (Kessler & Greenberg, 1981). The results regarding Hypothesis 4 are shown in Table 6. As expected, T1 performance significantly predicts Time 2 expectancy, valence, and fairness perceptions after controlling for initial levels of those variables (i.e., performance at Time 1 predicts changes in the non-ability variables). In particular, it is the g-variance component from Time 1 performance that predicts changes in these non-ability variables. In all three cases, the Time 1 g-variance component significantly increments the variance accounted for in the Time 2 non-ability variables. Thus, Hypothesis 4 is supported.

Table 6
g and non-g components as predictors of changes in non-ability variables (Hypothesis 4)

                         Expectancy          Valence             Fairness
Order of entry           β        ΔR²        β        ΔR²        β        ΔR²
Step 1
  T1 Variable            0.58 a   0.34 a     0.45 a   0.20 a     0.71 a   0.50 a
Step 2                            0.05 a              0.16 a              0.02 a
  T1 g-variance          0.24 a              0.40 a              0.16
  T1 non-g variance      0.02               −0.02                0.02

Note: N = 136. T1 Variable = Corresponding non-ability variable at Time 1. a p < 0.01.
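The hierarchical regressions behind Tables 6 and 7 share a single template: enter the control variable at Step 1, the predictors of interest at Step 2, and examine the increment in R². A minimal sketch of that template follows (Python with NumPy, with placeholder data; not the exact analysis code used in the study).

import numpy as np

def r_squared(X, y):
    # R^2 from an ordinary least squares regression of y on X (with an intercept)
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

rng = np.random.default_rng(3)
n = 136
t1_variable = rng.normal(size=n)                                   # placeholder Time 1 control variable
t1_g = rng.normal(size=n)                                          # placeholder Time 1 g-variance component
t1_non_g = rng.normal(size=n)                                      # placeholder Time 1 non-g component
t2_variable = 0.5 * t1_variable + 0.3 * t1_g + rng.normal(size=n)  # placeholder Time 2 outcome

r2_step1 = r_squared(t1_variable.reshape(-1, 1), t2_variable)
r2_step2 = r_squared(np.column_stack([t1_variable, t1_g, t1_non_g]), t2_variable)
delta_r2 = r2_step2 - r2_step1    # increment attributable to the Step 2 predictors
print(f"Step 1 R^2 = {r2_step1:.3f}; Step 2 delta R^2 = {delta_r2:.3f}")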
Table 7
Non-ability variables as predictors of changes in observed performance (Hypothesis 5)

                        Total % Correct
Order of entry          β         ΔR²
Step 1
  T1 Total % Correct    0.92 a    0.85
Step 2                            0.01
  T1 Expectancy        −0.03
  T1 Valence            0.05
  T1 Fairness           0.05

Note: N = 136. T1 = Time 1. a p < 0.01.
Hypothesis 5 was tested by the same procedure, with T2 performance (operationalized as total percent correct) as the dependent variable. T1 performance was entered first, as a control variable. Thus, the dependent variable is T2 performance controlling for variance in T1 performance. The non-ability variables were entered next as a block, but failed to account for changes in performance (the overall increment in variance accounted for was not significant, nor were any of the three betas). The non-ability variables' inability to account for changes in performance confirms Hypothesis 5 (Table 7).

6. Discussion

The self-serving bias has been shown to be a robust and pervasive phenomenon in human cognition (Mezulis, Abramson, Hyde, & Hankin, 2004). Applied to the test-taking context, it suggests that reactions to a cognitive ability test (e.g., fairness perceptions) are partly a function of the person's actual ability; poor-performing individuals can reduce threat to the ego by developing negative perceptions of the test. For example, according to the self-serving bias hypothesis, lower ability individuals should protect their ego by evaluating the use of a cognitive ability test for selection as relatively unfair and an inaccurate assessment of their ability, and should report low motivation for taking the test. On the other hand, high ability test-takers should engage an ego-enhancing mechanism, evaluating the use of a cognitive ability test as more fair and as providing an accurate assessment of their ability, and should report having been motivated to do well. Overall, our results provide evidence consistent with this theory.

Importantly, our results provide the much-needed empirical evidence demonstrating the applicability of the self-serving bias hypothesis with respect to the V.I.E. based model of test-taker motivation. Although Chan's
discussion of the self-serving bias hypothesis explicitly discusses test-taker motivations, most prior research had tended to focus almost exclusively on the relation among justice perceptions and performance. This left a critical gap in the empirical foundation of the self-serving bias hypothesis. Second, prior research had not yet provided evidence regarding the possibility that the relation between test-taker reactions (i.e., fairness perceptions and motivations) and test performance is a spurious correlation due to a common antecedent (namely, general reasoning ability). Although no single study can provide a conclusive answer to this question, our results suggest this may be the case.

Our results show that although expectancy, outcome valence, fairness perceptions, and test performance are all positively related, much of this covariance appears to be due to general cognitive ability. Specifically, we found that expectancy, valence, and fairness are uncorrelated with performance once the g-variance has been partialled out. That is, these non-ability variables show significant zero-order correlations with performance, but once the g-variance was partialled out of the performance variable, the correlations disappeared. Further, to confirm that the relation between observed performance and the non-ability factors is due to g, we used Jensen's (1998) correlated vectors technique. Our results showed strong positive correlations between the degree of g-saturation of a scale and its relation to the non-ability variables (except for Time 1 Valence). These results indicate that the observed relation between performance on ability scales and motivation or fairness perceptions is driven primarily by shared g-variance.

In addition, we showed that the g-variance underlying Time 1 performance predicts changes in both motivation and fairness perceptions from pre- to post-test assessments. That is, after controlling for previous levels of expectancy, valence, and fairness perceptions, differences in g accounted for significant amounts of variance in subsequent reports of expectancy, valence, and fairness. In contrast, test-taking motivations and fairness perceptions were not related to performance changes from Time 1 to Time 2.

6.1. Implications of findings

The pattern of results, taken as a whole, would appear to support the conclusion that g is the key explanatory variable that drives most of the observed zero-order correlations. These findings have important implications for the understanding of the construct validity of cognitive ability tests and their use in high-stakes
testing situations. If non-ability factors such as motivation and applicant perceptions significantly impact performance on cognitive ability tests independent of the applicants' actual ability, our understanding of the sources of latent variance that account for performance on tests would need to be modified. However, the notion that performance on such tests reflects substantially more than the posited underlying abilities does not seem to be warranted. Our results are consistent with recent findings (e.g., Schmitt, 2002) which suggest that performance differences on ability tests are largely reflective of "true" ability differences among applicants. Findings such as these indicate that the extent to which attitudes and motivation correlate with performance is of less concern to users of test results than perhaps initially thought.

With that being said, our results must be interpreted with caution; we are not advocating the abandonment of research on test-taker reactions. On the contrary, within the context of employment selection, Anderson's (Anderson, 2001; Anderson & Ostroff, 1997) theory of pre-entry socialization impact postulates that selection tests act as "interventive affectors" that can influence applicant reactions, which can in turn impact organizationally relevant outcomes such as the likelihood of pursuing employment (e.g., Reeve & Schultz, 2004; Schmit & Ryan, 1997; Smither et al., 1993), the likelihood of recommending an organization to potential applicants (Bauer et al., 2001), and the image of the organization as a whole (Bauer et al., 1998; Macan, Avedon, Paese, & Smith, 1994; Reeve & Schultz, 2004). Likewise, although the current results suggest that motivation and test perceptions are meager influences on test performance in the "maximal performance" situations that characterize high-stakes testing, it is still important to recognize that motivation and fairness perceptions are clearly relevant psychological variables in general, and they may be especially useful in predicting typical performance behaviors.

6.2. Generalizability of findings

Caution is warranted in generalizing the findings to substantively distinct populations or to similar populations taking substantively different tests. For example, our study was conducted in a laboratory setting with undergraduate students taking cognitive ability tests. One might consider this a poor proxy for other high-stakes testing situations. However, to the extent that our sample did not view this as a "maximal performance" situation, the variance in motivation and attitudes should have been enhanced
compared to other high-stakes testing situations where, arguably, variance in motivation would be minimized. Similarly, one may question whether students are similar to typical job applicant samples. For example, it is possible that college students have more self-knowledge of their cognitive ability than a typical applicant population. On the other hand, when one considers test-taker reactions in the context of other high-stakes testing such as SATs, GREs, GMATs, etc., our sample can be considered much more representative. Also, it is unclear whether our findings extend to maximal performance measures of other constructs.

It should also be noted that the current study did not assess the criterion-related validity of test scores. Thus, we cannot rule out the possibility that motivation and attitudes impact the usefulness of test scores. However, given the robust evidence for the predictive validity of ability tests, and given that our findings suggest motivation and attitudes do not appreciably alter the rank order of test scores (see also Schmitt, 2002), it would seem unlikely that the criterion-related validity of the test scores would have changed appreciably. Nonetheless, future research should address this possibility explicitly.

Finally, due to practical constraints, the interval between panels in our design was necessarily brief compared to typical minimum wait periods for retesting. Clearly retesting leads to practice effects, which lead to a host of additional important questions (Reeve & Lam, 2004, 2005). However, these questions should not be confused with the focal issue of this paper; the purpose of this study was to examine the unique impact of non-ability factors on ability test scores above and beyond general ability. Retesting was a necessary aspect of the design to allow us to disentangle the effects of ability and non-ability factors. Additionally, our interval was consistent with, and even longer than, some reported in prior analyses examining practice effects (e.g., Jensen, 1998, chapter 10). Likewise, the choice to use the same test battery on both occasions is consistent with the reality that, given the widespread use of commercially available ability tests (e.g., the EAS, which was used in this study) and the fact that many employers allow applicants to retake selection tests multiple times (e.g., Wheeler, 2004), some job applicants are likely to encounter the same test multiple times.

References

Anderson, N. (2001). Towards a theory of socialization impact: Selection as pre-entry socialization. International Journal of Selection and Assessment, 9, 84−91.
Anderson, N., & Ostroff, C. (1997). Selection as socialization. In N. Anderson, & P. Herriot (Eds.), International handbook of selection and assessment (pp. 413−440). New York: John Wiley and Sons. Arvey, R. D., Strickland, W., Drauden, G., & Martin, C. (1990). Motivational components of test taking. Personnel Psychology, 43, 695−716. Barber, A. E. (1998). Recruiting employees: Individual and organizational perspectives. Thousand Oaks, CA: Sage Publications. Bauer, T. N., Maertz, C. P., Jr., Dolen, M. R., & Campion, M. A. (1998). Longitudinal assessment of applicant reactions to employment testing and test outcome feedback. Journal of Applied Psychology, 83, 892−903. Bauer, T. N., Truxillo, D. M., Sanchez, R. J., Craig, J. M., Ferrara, P., & Campion, M. A. (2001). Applicant reactions to selection: Development of the selection procedural justice scale (SPJS). Personnel Psychology, 54, 387−419. Callinan, M., & Robertson, I. T. (2000). Work sample testing. International Journal of Selection and Assessment, 8, 248−260. Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York: Cambridge University Press. Chan, D. (1997). Racial subgroup differences in predictive validity perceptions on personality and cognitive ability tests. Journal of Applied Psychology, 82, 311−320. Chan, D., Schmitt, N., DeShon, R. P., Clause, C. S., & Delbridge, K. (1997). Reactions to cognitive ability tests: The relationships between race, test performance, face validity perceptions, and test-taking motivation. Journal of Applied Psychology, 82, 300−310. Chan, D., Schmitt, N., Jennings, D., Clause, C. S., & Delbridge, K. (1998). Applicant perceptions of test fairness integrating justice and self-serving bias perspectives. International Journal of Selection and Assessment, 6, 232−239. Chan, D., Schmitt, N., Sacco, J. M., & DeShon, R. P. (1998). Understanding pretest and posttest reactions to cognitive ability and personality tests. Journal of Applied Psychology, 83, 471−485. Crant, J., & Bateman, T. S. (1990). An experimental test of the impact of drug-testing programs on potential job applicants' attitudes and intentions. Journal of Applied Psychology, 75, 127−131. Dolan, C. V., Roorda, W., & Wicherts, J. M. (2004). Two failures of Spearman's hypothesis: The GATB in Holland and the JAT in South Africa. Intelligence, 32, 155−173. Flanagan, J. C., Dailey, J. T., Shaycoft, M. F., Gorham, W. A., Orr, D. B., & Goldberg, I. (1962). Design for a study of American youth. Boston: Houghton Mifflin. Gilliland, S. W. (1993). The perceived fairness of selection systems: An organizational justice perspective. Academy of Management Review, 18, 694−734. Gilliland, S. W. (1994). Effects of procedural and distributive justice on reactions to a selection system. Journal of Applied Psychology, 79, 691−701. Gordon, R. A. (1997). Everyday life as an intelligence test: Effects of intelligence and intelligence context. Intelligence, 24, 203−320. Gottfredson, L. S. (1997). Why g matters: The complexity of everyday life. Intelligence, 24, 79−132. Heider, F. (1958). The psychology of interpersonal relations. New York: Wiley. Heider, F. (1976). A conversation with Fritz Heider. In J. H. Harvey, W. J. Ickes, & R. F. Kidd (Eds.), New directions in attribution research, Vol. 1 (pp. 47−61). Hillsdale, NJ: Erlbaum. Irvine, S. H., & Berry, J. W. (Eds.). (1988). Human abilities in cultural context. Cambridge, England: Cambridge University Press.
Jensen, A. R. (1980). Bias in mental testing. New York, NY: Free Press. Jensen, A. R. (1985). The nature of the black-white difference on various psychometric tests: Spearman's hypothesis. Behavioral and Brain Sciences, 8, 193−219. Jensen, A. R. (1987). Further evidence for Spearman's hypothesis concerning the black-white differences on psychometric tests. Behavioral and Brain Sciences, 10, 512−519. Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger. Jensen, A. R., & Reynolds, C. R. (1982). Race, social class and ability patterns on the WISC-R. Personality and Individual Differences, 3, 423−438. Jensen, A. R., & Weng, L.-J. (1994). What is a good g? Intelligence, 18, 231−258. Kessler, R. C., & Greenberg, D. F. (1981). Linear panel analysis: Models of quantitative change. San Diego, CA: Academic Press. Lubinski, D., & Humphreys, L. G. (1997). Incorporating general intelligence into epidemiology and the social sciences. Intelligence, 24, 159−201. Macan, T. H., Avedon, M. J., Paese, M., & Smith, D. E. (1994). The effects of applicants' reactions to cognitive ability tests and an assessment center. Personnel Psychology, 47, 715−738. McCarthy, J. M., & Goffin, R. D. (2003). Is the Test Attitude Survey psychometrically sound? Educational and Psychological Measurement, 63, 446−464. Mezulis, A. H., Abramson, L. Y., Hyde, J. S., & Hankin, B. L. (2004). Is there a universal positivity bias in attributions? A meta-analytic review of individual, developmental, and cultural differences in the self-serving attributional bias. Psychological Bulletin, 130, 711−747. Miller, D. T., & Ross, M. (1975). Self-serving biases in the attribution of causality: Fact or fiction? Psychological Bulletin, 82, 213−225. Nijenhuis, J., & van der Flier, H. (2003). Immigrant-majority group differences in cognitive performance: Jensen effects, cultural effects, or both? Intelligence, 31, 443−459. Nisbett, R. E., & Ross, L. (1980). Human inference: Strategies and shortcomings of social judgment. Englewood Cliffs, NJ: Prentice Hall. Peoples, C. E., Fagan, J. F., & Drotar, D. (1995). The influence of race on 3-year-old children's performance on the Stanford-Binet: Fourth Edition. Intelligence, 21, 69−82. Reeve, C. L. (2004). Differential ability antecedents of general and specific dimensions of declarative knowledge: More than g. Intelligence, 32, 621−652. Reeve, C. L., & Hakel, M. D. (2002). Asking the right questions about g. Human Performance, 15, 47−74. Reeve, C. L., & Lam, H. (2004). The relation between practice effects, scale properties, and test-taker characteristics. Paper presented at the 19th Annual conference of the Society for Industrial and Organizational Psychology, Chicago, IL. Reeve, C. L., & Lam, H. (2005). The psychometric paradox of practice effects due to retesting: Measurement invariance and stable ability estimates in the face of observed score changes. Intelligence, 33, 535−549. Reeve, C. L., & Schultz, L. (2004). Job seeker reactions to selection process information in job ads. International Journal of Selection and Assessment, 12, 331−343. Robertson, I. T., Iles, P. A., Gratton, L., & Sharpley, D. (1991). The impact of personnel selection and assessment on candidates. Human Relations, 44, 963−982.
Rosse, J. G., Miller, J. L., & Stecher, M. D. (1994). A field study of job applicants' reactions to personality and cognitive ability testing. Journal of Applied Psychology, 79, 987−992. Ruch, W. W., Stang, S. W., McKillip, R. H., & Dye, D. A. (2001). Employee Aptitude Survey Technical Manual, 2nd Ed. Glendale, CA: Psychological Services Inc. Rushton, J. P. (1998). The “Jensen Effect” and the “Spearman-Jensen hypothesis” of Black-White IQ differences. Intelligence, 26, 217−225. Rushton, J. P. (2001). Black-White differences on the g-factor in South Africa: A “Jensen Effect” on the Wechsler Intelligence Scale for Children-Revised. Personality and Individual Differences, 31, 1227−1232. Rushton, J. P., & Jensen, A. R. (2003). African-White IQ differences from Zimbabwe on the Wechsler Intelligence Scale for ChildrenRevised are mainly on the g factor. Personality and Individual Differences, 34, 177−183. Ryan, A. M., & Ployhart, R. E. (2000). Applicants' perceptions of selection procedures and decisions: A critical review and agenda for the future. Journal of Management, 26, 565−606. Sanchez, R. J., Truxillo, D. M., & Bauer, T. N. (2000). Development and examination of an expectancy-based measure of test-taking motivation. Journal of Applied Psychology, 85, 739−750. Schmit, M. J., & Ryan, A. M. (1992). Test-taking dispositions: A missing link? Journal of Applied Psychology, 77, 629−637.
Schmit, M. J., & Ryan, A. M. (1997). Applicant withdrawal: The role of test-taking attitudes and racial differences. Personnel Psychology, 50, 855−876. Schmitt, N. (2002). Do reactions to tests produce changes in the construct measured? Multivariate Behavioral Research, 37, 105−126. Smither, J. W., Reilly, R. R., Millsap, R. E., Pearlman, K., & Stoffey, R. W. (1993). Applicant reactions to selection procedures. Personnel Psychology, 46, 49−76. Spearman, C. (1904). General intelligence, objectively determined and measured. American Journal of Psychology, 15, 201−293. Steiner, D. D., & Gilliland, S. W. (1996). Fairness reactions to personnel selection techniques in France and the United States. Journal of Applied Psychology, 81, 134−141. Vroom, V. H. (1964). Work and motivation. New York: Wiley. Wheeler, J. K. (2004). Practical implications of selection retests on testing development and policy. Practitioner forum presented at the 19th Annual conference of the Society for Industrial and Organizational Psychology, Chicago, IL. Whitley, B. E., Jr., & Frieze, I. H. (1985). Children's causal attributions for success and failure in achievement settings: A meta-analysis. Journal of Educational Psychology, 77, 608−616.