Intelligence 42 (2014) 115–127
Comparing different explanations of the effect of test anxiety on respondents' test scores

Markus Sommer, Martin E. Arendasy

University of Graz, Austria
Article history: Received 5 September 2013; received in revised form 1 November 2013; accepted 4 November 2013; available online 14 December 2013.

Keywords: state- and trait test anxiety; cognitive ability; interference model; measurement bias; automatic item generation
Abstract

Based on meta-analytic findings of a moderate negative correlation between test anxiety and test performance, some researchers hypothesized that trait and/or state test anxiety may induce measurement bias. Two competing models have been advanced to account for the observed test anxiety-test performance relationship: the deficit hypothesis and the interference hypothesis. The interference hypothesis predicts that trait- and/or state test anxiety induces measurement bias; this effect has been hypothesized to be most pronounced in items of intermediate difficulty. The deficit hypothesis, on the other hand, claims that test anxiety and test performance are correlated because less competent test-takers experience higher levels of state test anxiety in the assessment process; test anxiety is not assumed to have a causal effect on test performance. We tested these competing claims by means of item response theory and structural equation modeling. A total of N = 411 respondents first completed a measure of trait test anxiety. Afterwards, respondents were administered four cognitive ability tests. Upon completing the instruction and the first three items of each test, respondents filled in a pre-test state test anxiety questionnaire. The same state test anxiety questionnaire was also administered after all items of a subtest had been completed. In line with the deficit hypothesis, the results indicated measurement invariance across different levels of state- and trait test anxiety. Furthermore, structural equation modeling revealed that state/trait test anxiety is most closely related to psychometric g. Most interestingly, state test anxiety components specific to the post-test measurement occasion were also related to cognitive ability, while state test anxiety components specific to the pre-test measurement occasion were not systematically related to cognitive ability. The present findings are therefore most consistent with a deficit account of the test anxiety-test performance relationship.
1. Introduction

Due to the increased use of cognitive ability tests, interest in research on test fairness has resurged. In general, test fairness is compromised if construct-irrelevant factors induce measurement bias and therefore lead to incorrect ability estimates. In the presence of measurement bias, respondents with the same
standing on the latent trait who differ in construct-irrelevant factors (e.g. state- and/or trait test anxiety) do not have identical expected item and/or test scores (e.g. Drasgow, 1987; Millsap, 1997; Mislevy et al., 2013; Raju, Laffitte, & Byrne, 2002). In consequence, differences in test performance within and between these groups are not attributable to the same latent trait, because test scores reflect individual differences in the latent ability trait(s) of interest as well as individual differences in construct-irrelevant variance factors (Lubke, Dolan, Kelderman, & Mellenbergh, 2003). Several authors hypothesized that test anxiety may induce measurement bias (e.g. Haladyna & Downing, 2004; Hembree, 1988).
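To make the notion of measurement bias concrete, consider a 1PL (Rasch) model in which test anxiety shifts the effective difficulty of an item for one group. The following sketch is ours and purely illustrative; all parameter values are hypothetical:

```python
import numpy as np

def p_solve(theta, b, shift=0.0):
    # 1PL (Rasch) solution probability; `shift` models a group-specific
    # change in effective item difficulty, i.e. measurement bias
    return 1.0 / (1.0 + np.exp(-(theta - (b + shift))))

theta, b = 0.5, 0.0                  # identical latent ability, same item
print(p_solve(theta, b))             # unbiased group: ~.62
print(p_solve(theta, b, shift=0.8))  # biased group:   ~.43
```

Under measurement invariance the shift is zero, so respondents matched on the latent trait have identical expected item scores irrespective of their level of test anxiety.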
The hypothesis that test anxiety induces measurement bias has been based on meta-analytic findings of a moderate negative correlation (meta-analytic mean r = −.23 to −.33) between test anxiety and test performance (Ackerman & Heggestad, 1997; Hembree, 1988). Despite the practical relevance of this topic, only a few studies have directly evaluated whether test anxiety induces measurement bias, and results have been mixed thus far (Halpin, da Silva, & De Boeck, in press; Reeve & Bonaccio, 2008). The inconsistent findings might be due to differences in research design characteristics and in the psychometric methods used to test for measurement bias.

1.1. Definition of test anxiety

Test anxiety refers to the situation-specific anxiety experienced in evaluative situations (Putwain, 2008; Zeidner, 1998). Researchers have found it useful to differentiate between state- and trait test anxiety, and between the different components of test anxiety.

1.1.1. Components of test anxiety

Factor analytic research (e.g. Benson & Bandalos, 1992; Englert, Bertrams, & Dickhäuser, 2011; Hodapp & Benson, 1997; Keith, Hodapp, Schermelleh-Engel, & Moosbrugger, 2003; Sarason, 1984; Wacker, Jaunzeme, & Jaksztat, 2008) revealed that test anxiety consists of cognitive components (worry and task-irrelevant thinking) and affective components (emotionality and bodily symptoms). The cognitive component worry refers to concerns about the outcome and consequences of an assessment and is characterized by distorting negative thoughts (cf. Putwain, Connors, & Symes, 2010). Task-irrelevant thinking, on the other hand, denotes interfering thoughts unrelated to the content and outcome of the assessment and has been linked to avoidance coping (Schutz, Di Stefano, Benson, & Davis, 2004). The affective component comprises physiological reactions (bodily symptoms) and the feeling of being nervous and tense (emotionality). Research indicated that the cognitive and affective components of test anxiety differ in their relation to test performance. One meta-analysis (Hembree, 1988) and several independent studies (e.g. Cassady & Johnson, 2002; Hong, 1998; Hong & Karstensson, 2001; McCarthy & Goffin, 2005; Meijer & Oostdam, 2011; Oostdam & Meijer, 2003) showed that the cognitive components were more strongly correlated with test performance than the affective components. Furthermore, the correlation between the affective component and test performance decreased after controlling for the cognitive components, while the correlation between the cognitive components and test performance remained essentially unchanged after controlling for the affective component (Hembree, 1988). Thus, the cognitive components of test anxiety drive the test anxiety-test performance relationship.

1.1.2. Trait- versus state test anxiety

Researchers usually also make a distinction between state- and trait test anxiety. In general, trait test anxiety refers to the proneness to experience anxiety in different kinds of assessment situations, while state test anxiety denotes a fluctuating emotional state experienced in a particular assessment situation (Spielberger & Vagg, 1995; Zeidner, 1998). Measures of trait test anxiety have been shown to be stable across time-points of
measurement and comprised little variance attributable to situation-specific factors (Hong, 1998; Keith et al., 2003). By contrast, state test anxiety questionnaires turned out to be more variable and were affected by characteristics of the assessment situation (e.g. Hong, 1998; Meijer & Oostdam, 2011). The distinction between state- and trait test anxiety is important because the two have been shown to be clearly separable (e.g. Hong, 1998, 1999; Hong & Karstensson, 2001; Meijer & Oostdam, 2011; Paulman & Kennelly, 1984) and most theoretical explanations of the test anxiety-test performance relationship focus on state test anxiety (for an overview: Zeidner, 1998).

1.2. Factors influencing individual differences in state test anxiety

State test anxiety constitutes the result of a cognitive appraisal process which is influenced by several individual and situational factors (Davis, DiStefano, & Schutz, 2008; Schutz et al., 2004). Some of these factors have been hypothesized to be more general, while others are hypothesized to be more specific to the individual tests administered. For instance, goal relevance and goal congruence constitute more general factors that have been shown to affect respondents' level of state test anxiety (cf. Nie, Lau, & Liau, 2011; Reeve, Bonaccio, & Charles, 2008; Schutz et al., 2004). A similar argument can be made regarding achievement avoidance (e.g. Pekrun, Elliot, & Maier, 2009; Putwain & Symes, 2012), trait test anxiety (e.g. Hong, 1998, 1999; Hong & Karstensson, 2001; Meijer & Oostdam, 2011; Paulman & Kennelly, 1984) and psychometric g (Goetz, Preckel, Pekrun, & Hall, 2007), which have also been shown to affect respondents' level of state test anxiety. Other factors influencing respondents' cognitive appraisal of the test situation are more specific to the cognitive ability domain assessed. For instance, testing problem efficiency (e.g. Davis et al., 2008; Lang & Lang, 2010; Nie et al., 2011), defined as the judgement respondents make of their ability to manage problems arising during test-taking, is likely to be more specific to the cognitive ability domain measured. Thus, individual differences in state test anxiety experienced throughout an admission test can be decomposed into variance components specific to the individual subtests and variance components that are more general in nature.

1.3. Factors influencing the test anxiety-test performance relationship

Research also indicated that the size of the correlation between test anxiety and test performance depends on several situational factors and test characteristics (for an overview: Hembree, 1988; Zeidner, 1998).

1.3.1. Effect of test characteristics on the test anxiety-test performance relation

One meta-analysis (Hembree, 1988) and several independent studies (e.g. Chen, 2012; Hong, 1999; Kim & Rocklin, 1994) indicated that test anxiety is more closely linked to test performance for more difficult tests (meta-analytic mean r = −.45) than for easier tests (meta-analytic mean r = −.07). Furthermore, the cognitive ability domain measured has also been shown to affect the test anxiety-test
performance relationship. A meta-analysis conducted by Ackerman and Heggestad (1997) revealed that more g-saturated cognitive ability domains were more strongly correlated with test anxiety than less g-saturated cognitive ability domains. This finding is consistent with results indicating that (test) anxiety appears to be correlated primarily with individual differences in psychometric g (Reeve & Bonaccio, 2008).

1.3.2. Effect of situational factors on the test anxiety-test performance relation

The test anxiety-test performance relationship has also been shown to depend on the time-point at which test anxiety is measured (cf. Klinger, 1984; Strohbeck-Kühner, 1999; Zeidner, 1991). Measuring test anxiety upon completion of the entire admission test yielded higher test anxiety-test performance correlations than measuring test anxiety prior to completing the test items. A possible explanation for this finding is that state anxiety experienced during test-taking primes emotion-congruent memories, which affects respondents' answers to the post-test test anxiety questionnaire (cf. Zeidner, 1995, 1998). Alternatively, less competent respondents may simply elevate their reported level of state test anxiety in the post-test questionnaire to maintain their self-worth (cf. Smith, Snyder, & Handelsman, 1982). Note that both explanations attribute the observed increase in the test anxiety-test performance relationship to variance specific to the post-test test anxiety measures.
2. Explaining the test anxiety-test performance relation

Two competing models have been advanced to explain the test anxiety-test performance relationship: the deficit hypothesis and the interference hypothesis (Hembree, 1988; Reeve & Bonaccio, 2008; Wicherts & Scholten, 2010; Zeidner, 1998).

2.1. Deficit hypothesis

Under the deficit hypothesis, respondents become increasingly aware of their deficits during test-taking and thus report higher levels of state/trait test anxiety (e.g. Klinger, 1984; Paulman & Kennelly, 1984; Strohbeck-Kühner, 1999; Tobias, 1985; Zeidner, 1991, 1998). Test anxiety is thus hypothesized to constitute an effect of respondents' deficits and to have no causal effect on test performance. Therefore, test anxiety should not induce measurement bias in cognitive ability tests. Furthermore, since respondents become more and more aware of their deficits, the deficit model also predicts that post-test test anxiety measures should be more strongly correlated with test performance than pre-test test anxiety measures.

2.2. Interference hypothesis

The interference hypothesis posits that test anxiety prevents test anxious respondents from performing at their true level of ability. This implies that test anxiety has a causal effect on test performance and should therefore induce measurement bias (cf. Halpin et al., in press; Reeve & Bonaccio, 2008; Wicherts & Scholten, 2010). The main idea is that more and less test anxious respondents differ in the type of thoughts to which they attend during test-taking. While less test anxious respondents focus on the task at hand, more test anxious respondents also engage in worrisome cognitions and task-irrelevant thoughts. The main difference between the various interference models resides in the kind of test items hypothesized to be most prone to measurement bias. Wine (1971) and Sarason (1984) hypothesized that worrisome cognitions and task-irrelevant thoughts shift the focus of attention from the test item to the self, which leads to a decrease in test performance as items become more complex. Under this model, all items of intermediate difficulty should be vulnerable to measurement bias. Processing efficiency theory (Eysenck & Calvo, 1992) hypothesized that worry and task-irrelevant thinking consume storage and processing resources of the central executive, which leads to a reduction in processing efficiency. As tasks become more difficult, the reduced processing efficiency translates into a decrease in test performance. The processing efficiency model therefore predicts that measurement bias is confined to items of intermediate difficulty that require storage and processing resources to be solved. Eysenck and associates (Eysenck & Derakshan, 2011; Eysenck, Derakshan, Santos, & Calvo, 2007) later revised their model and postulated that worry and task-irrelevant thinking mainly affect inhibition and shifting functions. Thus, under this model, measurement bias due to test anxiety should be confined to items of intermediate difficulty requiring inhibition and shifting functions to be solved.

3. Methods used to test predictions from competing models

The main difference between the deficit and the interference hypotheses is whether measurement bias due to test anxiety is assumed to exist. In general, measurement bias occurs when an item has different measurement properties across groups, irrespective of mean differences in the construct of interest (cf. Drasgow, 1987; Millsap, 1997; Mislevy et al., 2013; Raju et al., 2002). Methods used to examine measurement bias either approximate the latent trait of interest (here: cognitive ability) by observed sum scores, or model it as a latent variable. Similarly, the construct-irrelevant factor hypothesized to induce measurement bias may be represented either as an observed variable or as a latent variable.

3.1. Using item response theory to examine measurement bias

In item response theory analyses, observed sum scores are used to approximate the construct of interest on which respondents differing in test anxiety are matched. Furthermore, the construct-irrelevant factor assumed to induce measurement bias is represented as an observed categorical variable. Measurement bias is evaluated by testing the hypothesis that the item parameters are invariant across groups of respondents differing in test anxiety. This can be done using the Likelihood Ratio Test (LRT: Andersen, 1973). If the Likelihood Ratio statistic fails to reach significance, measurement invariance can be assumed. Otherwise, measurement bias exists and the researcher needs to examine item fit statistics (e.g. the Wald statistic) to identify the biased items. Using item response theory models such as the 1PL Rasch model (Rasch, 1980) requires that the test items are unidimensional. This approach is therefore restricted to analyzing measurement bias at the level of the subtests.
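As a compact illustration of this decision rule, the sketch below (ours; it assumes that conditional maximum likelihood log-likelihoods have already been obtained from an IRT program) computes Andersen's Likelihood Ratio statistic and its p-value:

```python
from scipy.stats import chi2

def andersen_lrt(loglik_total, logliks_groups, n_items):
    # LR statistic: -2 * (lnL of the pooled calibration minus the sum
    # of the lnLs of the separate subgroup calibrations)
    lr = -2.0 * (loglik_total - sum(logliks_groups))
    # with k items and g groups, (g - 1) * (k - 1) additional free
    # item parameters are estimated, which gives the degrees of freedom
    df = (len(logliks_groups) - 1) * (n_items - 1)
    return lr, df, chi2.sf(lr, df)  # non-significant p -> invariance
```

For instance, splitting an 18-item test into two groups at the median raw score yields df = 17, which matches the degrees of freedom reported for the numerical-inductive reasoning test in Table 2 below.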
The benefit of item response theory analyses is that these models can be extended to incorporate item design features hypothesized to be linked to the cognitive component processes needed to solve the test items. For instance, the Linear Logistic Test Model (LLTM: Fischer, 1995) decomposes the 1PL difficulty parameter of item i into a weighted sum of basic parameters, βi = Σk qik ηk + c, where the weight qik represents the load of a certain cognitive component process k on item i, the basic parameter ηk denotes its contribution to the item difficulty, and c is a normalization constant. If this model fits the data no worse than the 1PL Rasch model, researchers can examine whether the basic parameters remain invariant across respondents differing in test anxiety, and whether measurement bias can be explained in terms of differences in cognitive component processes (for a similar idea: Zeidner, 1998).

3.2. Using structural equation modeling to examine measurement bias

Structural equation models for evaluating measurement bias differ in whether the construct-irrelevant factor hypothesized to induce measurement bias constitutes an observed variable or a latent trait. In multiple-indicator multiple-cause models (MIMIC: Muthén, 1989; Woods, 2009) the construct-irrelevant factor constitutes an observed variable, which can be either categorical or continuous, and is commonly referred to as the causal indicator. In MIMIC models measurement bias can be examined by comparing the fit of a No-Bias model to the fit of a General Bias model. In the No-Bias model the latent trait(s) of interest are regressed on the causal indicator. This represents the hypothesis that test performance and individual differences in the causal indicator are merely correlated. The General Bias model extends the No-Bias model by also allowing the items to be regressed on the causal indicator. For identification purposes at least one item has to be selected to be free of measurement bias. The fit of the General Bias model and the No-Bias model can be examined using the following cut-off values for the goodness of fit indices: non-significant χ2 test, CFI ≥ .95, and RMSEA ≤ .06 (cf. Hu & Bentler, 1999; Marsh, Hau, & Wen, 2004). Furthermore, since the No-Bias model is nested within the General Bias model, researchers can also evaluate whether the No-Bias model fits the data significantly worse (significant Δχ2 statistic and ΔCFI > .01: Cheung & Rensvold, 2002) than the General Bias model. If so, measurement bias exists: the causal indicator predicts item responses even after controlling for mean differences in the latent trait(s) of interest.
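The nested-model decision rule can be summarized in a few lines. The sketch below is ours; the fit statistics would be taken from the SEM program's output, and with robust estimators such as WLSMV the Δχ2 must come from a proper scaled difference test rather than a raw subtraction:

```python
from scipy.stats import chi2

def indicates_bias(chisq_nobias, df_nobias, cfi_nobias,
                   chisq_bias, df_bias, cfi_bias, alpha=0.05):
    # the No-Bias model is the more constrained model and therefore
    # has the larger chi-square and the larger degrees of freedom
    d_chisq = chisq_nobias - chisq_bias
    d_df = df_nobias - df_bias
    p = chi2.sf(d_chisq, d_df)
    d_cfi = cfi_bias - cfi_nobias
    # bias is inferred when the No-Bias model fits significantly
    # worse and the CFI difference exceeds .01
    return p < alpha and d_cfi > .01
```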
Halpin et al. (in press) recently outlined a structural equation model which allows both the cognitive ability trait(s) and the construct-irrelevant factor to be modeled as latent variables. Their model bears several similarities to the model outlined in Wicherts and Scholten (2010). In the No-Bias model the latent trait(s) of interest and the construct-irrelevant trait hypothesized to induce measurement bias are assumed to be correlated. This model thus reflects the hypothesis that test anxiety and test performance are merely related to each other (=deficit hypothesis). For ease of reference, the latent factor hypothesized to induce measurement bias will be called the causal latent factor. In the General Bias model, direct paths from the causal latent factor to the indicators of the latent trait(s) of interest are added to the model. This reflects the hypothesis that the causal latent factor induces measurement bias. Akin to MIMIC analyses, a General Bias model specifying paths to all indicator variables is not identified. Unfortunately, simply setting one of these direct paths to zero does not yield interpretable parameters, because different configurations of model parameters exist which yield identical global goodness of fit indices. Therefore, non-arbitrary restrictions on the interference effects need to be imposed to obtain interference effect parameters and a latent correlation between the latent trait(s) of interest and the causal latent factor (=deficit effect) that can be readily interpreted. This can be done by imposing equality constraints on selected interference parameters (Halpin et al., in press). Thus, testing measurement bias within this model boils down to comparing nested models. The (restricted) General Bias model has to be compared to the No-Bias model to determine whether measurement bias exists. If the (restricted) General Bias model fits the data better than the No-Bias model, measurement bias exists and researchers can examine the interference effect parameter estimates of the restricted General Bias model to determine which items or item parcels exhibit measurement bias.

4. Formulation of the problem

The competing explanations of the test anxiety-test performance relationship predict different patterns of findings with regard to the measurement invariance of cognitive ability tests. Despite the relevance of this topic, research on the measurement invariance of cognitive ability tests across different levels of state/trait test anxiety has been sparse. The two studies conducted thus far yielded mixed evidence. While one of the two studies obtained evidence of measurement bias due to test anxiety (Halpin et al., in press), the other (Reeve & Bonaccio, 2008) lends support to the deficit model. Unfortunately, comparing results across these two studies is complicated by differences in research design characteristics. For instance, the two studies differed in terms of the cognitive ability traits measured. While Reeve and Bonaccio (2008) administered a test battery consisting of measures of fluid intelligence, verbal comprehension, visualization, numerical ability and processing speed, Halpin et al. (in press) only assessed language skills and natural science knowledge. The two studies also differed in their measures of trait test anxiety and the specific structural equation modeling approaches used to evaluate measurement bias due to test anxiety. Whereas Reeve and Bonaccio (2008) modeled their test anxiety scale, reflecting individual differences in trait worry, as an observed variable in MIMIC analyses, Halpin et al. (in press) modeled trait task-irrelevant thinking as a latent variable in their measurement bias analyses. For these reasons it is hard to tell whether the conflicting findings are attributable to research design differences or reflect a failure to replicate prior studies. Furthermore, neither of these studies evaluated whether the results obtained for trait test anxiety also hold for state test anxiety measures.
Therefore, the goal of the present study was to investigate the effect of state- and trait test anxiety on the measurement invariance of four cognitive ability tests. In addition, we also aimed to examine whether the relation between test anxiety and test performance differs across cognitive ability domains and across state test anxiety components due to more general or more situation-specific factors.

4.1. Hypotheses regarding measurement invariance across state/trait test anxiety

Under the deficit model, state/trait test anxiety has no causal effect on test performance and should therefore induce no measurement bias (Halpin et al., in press; Reeve & Bonaccio, 2008; Wicherts & Scholten, 2010). By contrast, the interference hypothesis predicts that individual differences in state and/or trait test anxiety induce measurement bias. The various versions of the interference hypothesis differ regarding the kind of items assumed to be affected the most. Thus, measurement bias in the test items should be attributable to measurement bias in rather specific cognitive component processes.

4.2. Hypothesis on the structure of state test anxiety and its relation to test performance

Based on the theoretical considerations outlined above, we hypothesized that individual differences in state test anxiety can be decomposed into occasion-unspecific factors and factors specific to the eight time-points of measurement. One means to test this hypothesis is to examine the fit of latent state-trait models (LST: Eid & Diener, 1999; Geiser & Lockhart, 2012). In the case of categorical indicator variables, the indicator-specific latent trait LST model has been recommended to separate variance due to occasion-unspecific factors from occasion-specific variance components and measurement error. In this model an occasion-unspecific latent factor is specified, which loads on a specific indicator administered at all time-points of measurement. This indicator-specific occasion-unspecific latent factor therefore represents respondents' general tendency to endorse one of the response categories. The occasion-specific factors are orthogonal to the indicator-specific occasion-unspecific latent factors and are marked by the indicator variables administered at a certain time-point of measurement. These factors thus reflect residual variance components in the indicator variables specific to the respective measurement occasion. As is common practice, the factor loadings are constrained to reflect the assumption of tau-equivalent measurement. Schmukle and Egloff (2005) suggested extending this model by specifying a second-order factor that subsumes the indicator-specific occasion-unspecific latent factors. In the present application this second-order factor can be interpreted as the occasion-unspecific level of state test anxiety experienced throughout the assessment process. The resulting occasion-unspecific and occasion-specific factors can be used in subsequent analyses to test several predictions made by the deficit and interference hypotheses. For instance, these factors can serve as causal latent factors in measurement bias analyses and can also be used to evaluate competing hypotheses on the structural relation between these causal latent factors and the latent ability traits of interest.
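The logic of this decomposition can be made explicit with the usual latent state-trait variance components. The following sketch is ours; the variance estimates would come from the fitted LST model, and the coefficient names follow common LST terminology:

```python
def lst_coefficients(var_trait, var_occasion, var_error):
    # observed indicator variance = occasion-unspecific (trait) variance
    # + occasion-specific variance + measurement error variance
    var_obs = var_trait + var_occasion + var_error
    consistency = var_trait / var_obs      # occasion-unspecific share
    specificity = var_occasion / var_obs   # occasion-specific share
    reliability = (var_trait + var_occasion) / var_obs
    return consistency, specificity, reliability
```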
Although both competing models predict that the occasion-unspecific second-order state test anxiety factor is related to the cognitive ability trait factors, the deficit model further assumes that the occasion-specific factors of the post-test test anxiety measures are more strongly related to the cognitive ability trait factors than the occasion-specific factors of the pre-test test anxiety measures. This hypothesis can be tested by imposing equality constraints on the respective latent factor correlations and comparing the fit of this model to its less restrictive precursor.

5. Method

5.1. Sample

The sample consisted of 171 (41.6%) male and 240 (58.4%) female psychology students aged 19 to 47 years (mean: 22.38; SD: 3.485). 345 (83.9%) respondents had graduated from college but did not yet hold a Bachelor degree in Psychology. The remaining respondents had already earned a Bachelor degree but had not completed the Masters program in Psychology. Respondents were recruited via public announcements at the local university, social media and word of mouth. Due to the instructions used in our study (cf. Section 5.3), care was taken to only recruit Bachelor and Master degree students who planned to complete the Masters or PhD program at their current university.

5.2. Measures

5.2.1. Cognitive ability tests

The cognitive ability test battery consisted of the following four subtests: numerical-inductive reasoning (NID), visualization (PC), word fluency (VF) and algebra word problem solving (AR). These subtests were selected on the basis of the following rationale: (1) the four cognitive ability tests cover a broader spectrum of stratum-two cognitive abilities; (2) the 1PL Rasch model has been shown to exhibit a good fit to the data in previous studies, indicating that all four tests are unidimensional; and (3) the item design features have already been shown to account for the 1PL item parameters in previous LLTM analyses. Table 1 contains brief descriptions of the four cognitive tests, including the item design features and the cognitive component processes hypothesized to be affected by these item design features. The item order within each subtest was determined as follows: the first three items constituted an easier, an intermediately difficult and a more difficult item, respectively. The order of the remaining items of each subtest was randomized and the resulting item order was used for all respondents. The first three items were selected to be representative of the remaining items of the subtest based on their item parameters estimated in previous studies. Post-hoc analyses indicated that the item difficulties of the first three items did not differ from those of the remaining test items (NID: T[14] = −.133, p = .896; PC: T[16] = −.241, p = .813; VF: T[16] = .259, p = .799; AR: T[10] = −.944, p = .368). Thus, differences between pre- and post-test state test anxiety can be unambiguously attributed to the time-point of measurement.
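The post-hoc comparison reported above amounts to a simple two-sample t test on previously calibrated item difficulty parameters. The sketch below is ours, and the arrays are placeholders rather than the actual parameter estimates:

```python
from scipy.stats import ttest_ind

# placeholder 1PL difficulty estimates from earlier calibrations
b_first_three = [-1.2, 0.1, 1.3]
b_remaining = [-1.5, -0.8, 0.0, 0.4, 0.9, 1.6]

t, p = ttest_ind(b_first_three, b_remaining)
# a non-significant p indicates that the first three items are
# representative of the remaining items in terms of difficulty
```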
Table 1
Name of the psychometric measures, task description, number of items per subtest (k), measurement precision (Cronbach α), item design features used to generate the items, the cognitive component process each item design feature has been linked to, and reference.

Numerical-inductive reasoning (NID). k = 18, Cronbach α = .82. Reference: Arendasy and Sommer (2012).
Task description: The items represent number series problems; the task of the respondent was to type in the number which completed the number series.
Item design features (cognitive component process): number of rules (cognitive control: goal management); rule complexity (abstraction requirement); periodicity (abstraction: mental shifting); embedded rules (abstraction: mental shifting); recursive rules (abstraction: mental shifting).

Visualization (PC). k = 20, Cronbach α = .81. Reference: Arendasy and Sommer (2013b).
Task description: The items resemble classic picture completion tasks; the task of the respondent was to choose the figure among a set of four alternatives that can be formed by rearranging the parts.
Item design features (cognitive component process): number of parts (cognitive control and storage); figural complexity of the parts (encoding and storage demands); angular complexity (encoding and storage demands); rotation requirement (transformation complexity); displacement of the parts (transformation complexity); off-set of the ensemble object (confirmation complexity).

Word fluency (VF). k = 20, Cronbach α = .74. Reference: Arendasy, Sommer, and Mayr (2012).
Task description: Respondents were administered a jumbled sequence of letters arranged in a row; their task was to rearrange the letters to form a noun by clicking them in the correct sequence.
Item design features (cognitive component process): word frequency (retrieval ease: base activation); number of letters (cognitive control: goal management); common letter combinations (cognitive control: goal management); letter swaps (cognitive control: mental shifting); number of similar words (cognitive control: inhibition).

Algebra word problems (AR). k = 14, Cronbach α = .78. Reference: Arendasy and Sommer (2007).
Task description: Respondents were administered several algebra word problems; their task was to find the numerical value of the unknown and type it into the answer box.
Item design features (cognitive component process): typicality of the cover story (problem schema retrieval ease); number of unknown elements (algebraic complexity); number of partial equations (mathematization complexity); percentage of partial equations not explicitly mentioned (mathematization complexity).
5.2.2. State and trait test anxiety questionnaires

The short form of the Test Anxiety Inventory — German version was used to measure trait test anxiety (Wacker et al., 2008). The questionnaire consists of 12 items, which measure three components of test anxiety: worry (5 items; Guttman's λ2 = .877), task-irrelevant thinking (3 items; Guttman's λ2 = .752) and emotionality (4 items; Guttman's λ2 = .880). Responses were given on a four-point rating scale. A modified version of the German STAI short version (Englert et al., 2011) was used to assess state test anxiety. Two items measured worry and two items assessed emotionality. These four items were taken directly from the German STAI short version. In addition, two items measuring task-irrelevant thinking were constructed for the purpose of the present study, since the STAI contains no items measuring task-irrelevant thinking. Responses were given on a four-point rating scale. The pre- and post-test state anxiety measures were almost identical and differed merely in the use of present tense in the pre-test state test anxiety questionnaire and past tense in the post-test state test anxiety questionnaire.
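Guttman's λ2 can be computed directly from the item covariance matrix. The following sketch is ours and shows the computation for an arbitrary respondents-by-items score matrix:

```python
import numpy as np

def guttman_lambda2(X):
    # X: respondents x items matrix of item scores
    C = np.cov(X, rowvar=False)
    k = C.shape[0]
    var_total = C.sum()          # variance of the sum score
    var_items = np.trace(C)      # sum of the item variances
    off_sq = (C - np.diag(np.diag(C))) ** 2
    return (var_total - var_items +
            np.sqrt(k / (k - 1) * off_sq.sum())) / var_total
```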
5.3. Procedure

Respondents were told that their local university was considering including cognitive ability tests in its Masters and/or PhD admission test. Furthermore, respondents received the instruction that the present research project aimed to evaluate social reactions to admission testing, and that the tests used in this project would be similar to those used in the actual admission test because prior research had already indicated that they predict academic success and are not susceptible to coaching. Since the authors of this article have been involved in the development of admission tests in the past, this instruction was believable. Next, general information on the order of the tests administered was provided. At first, respondents were asked to complete a reaction-to-tests questionnaire, which included our trait test anxiety questionnaire. Next, the cognitive ability tests were administered in the following order: (1) numerical-inductive reasoning (NID), (2) word fluency (VF), (3) visualization (PC) and (4) algebra word problems (AR). After receiving the instruction for each subtest and upon completing the first three test items, state test anxiety was assessed. For ease of reference these state test anxiety measures will be called pre-test state anxiety measures. Once the state test anxiety questionnaire had been completed, the respondents took the remaining items of the respective subtest. After all items of a subtest had been completed, a second state test anxiety questionnaire was administered, which will be referred to as the post-test state anxiety measure. This procedure was identical for all four cognitive ability tests, thus yielding a total of four pre-test and four post-test state anxiety measures. Once the data were collected, test administrators explained the real aim of the study to the respondents and told them that the university in fact never considered including these tests in its admission test battery.
5.4. Tested models

5.4.1. Evaluating measurement invariance by means of item response theory

At first, we evaluated the fit of the 1PL Rasch model (Rasch, 1980) and the LLTM (Fischer, 1995). This was done by means of a Likelihood Ratio test (LRT: Andersen, 1973). In the case of the 1PL Rasch model, the partitioning criterion "median raw score" was used, since this model fit test has been shown to be sensitive to misfit of the 1PL Rasch model in simulation studies (Suárez-Falcon & Glas, 2003). For the LLTM, we compared the fit of the LLTM to the fit of the 1PL Rasch model. If both model fit statistics fail to reach significance, the data can be assumed to be unidimensional and the item design features hypothesized to be linked to the cognitive component processes involved in test solving can be assumed to account for the item parameters. Next, we examined the invariance of the 1PL Rasch model item parameters and the LLTM basic parameter estimates across respondents differing in state/trait test anxiety. To do so, factor scores were calculated for the state and trait test anxiety measures and a median split was conducted to separate respondents into groups of low and high test anxiety. The resulting causal variables were used as partitioning criteria in subsequent Likelihood Ratio Tests, which examined the invariance of the item and/or basic parameter estimates across groups of high vs. low test anxious respondents. All calculations were conducted using the software LPCM-Win (Fischer & Ponocny-Seliger, 1999), and conditional maximum likelihood (CML) estimation was used to estimate the model parameters.

5.4.2. Evaluating measurement bias by means of structural equation modeling

We used the model proposed by Halpin et al. (in press) to examine measurement invariance. To reduce the complexity of the model, separate No-Bias and General Bias models were calculated for the three trait test anxiety components. Furthermore, the items of the cognitive ability tests were combined into three clusters that were homogeneous in terms of item difficulty; such item clusters are commonly called item parcels. In all analyses a nested factor measurement model (cf. Gustafsson & Balke, 1993) was used for the cognitive ability tests. In this model all indicators were assumed to load on psychometric g. Furthermore, the indicators for visualization (PC), word fluency (VF) and algebra word problem solving (AR) also loaded on a narrower cognitive ability factor specified to be orthogonal to psychometric g. The specified model is in line with previous studies, which indicated that most cognitive ability tests measure psychometric g in addition to narrower cognitive abilities, while measures of inductive reasoning merely load on psychometric g (e.g. Arendasy, Hergovich, & Sommer, 2008; Gustafsson & Balke, 1993; Kvist & Gustafsson, 2008; Reeve & Bonaccio, 2008). In the No-Bias model the resulting cognitive ability factors and the latent trait test anxiety factor were allowed to be correlated (=deficit effect). In the unrestricted General Bias models, direct paths from the trait test anxiety factor to the indicator variables of the cognitive ability tests were specified in addition to the deficit effects. These direct paths will be referred to as interference effects (cf. Halpin et al., in press).
In order to achieve model identification, the interference effects to the easiest item parcel of each subtest were set to zero. Since the resulting interference and deficit parameters of the unrestricted General Bias model cannot be unambiguously interpreted, restricted General Bias models were specified as follows: the interference effect parameters to the easiest item parcels of all subtests were constrained to be equal. Furthermore, we also constrained the interference effect parameters to the intermediately difficult and difficult item clusters to be equal across all four subtests. In the restricted General Bias model both the deficit and interference effects can be readily interpreted. The procedure used to evaluate and compare the fit of these three different kinds of models was identical to the standard procedure described in Section 3.2. Testing for measurement bias due to state test anxiety proceeded in a similar manner. However, in these analyses the respective latent state-trait measurement factor(s) served as causal latent factors. The analyses were again carried out separately for the three state test anxiety components measured. In all analyses the No-Bias models assumed that the occasion-unspecific second-order factor and the latent cognitive ability factors were correlated. In addition, the pre- and post-test occasion-specific factors were allowed to correlate with their respective narrower cognitive ability factor. In the restricted and unrestricted General Bias models, direct paths from the causal latent factors to the cognitive ability test items were added. In the unrestricted General Bias models, interference effect parameters to the easiest item parcel of each subtest were set to zero. In the restricted General Bias models, equality constraints for the interference effect parameters from each causal factor to the three indicator variables of each subtest were imposed as follows: interference effect parameters from the occasion-unspecific second-order state anxiety factor to the easiest item parcels were constrained to be equal. In addition, the same equality constraints were used to restrict the interference effect parameters to the remaining two item parcels. The constraints used for the interference effect parameters from the pre- and post-test occasion-specific factors were identical to the ones described above. All calculations were carried out using the software Mplus (Muthén & Muthén, 2010), and robust weighted least squares estimation for categorical data (WLSMV) was used to estimate the model parameters, based on recommendations from simulation studies (cf. Beauducel & Herzberg, 2006; Rhemtulla, Brosseau-Liard, & Savalei, 2012).
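The difficulty-homogeneous item parcels described above can be formed as in the following sketch (ours; `b` stands for the previously calibrated 1PL difficulty parameters and `X` for the scored response matrix, both hypothetical here; the exact parceling procedure used in the study is not spelled out beyond difficulty homogeneity):

```python
import numpy as np

def difficulty_parcels(X, b, n_parcels=3):
    # sort items by 1PL difficulty and split them into clusters of
    # (roughly) equal size that are homogeneous in difficulty
    order = np.argsort(b)
    parcels = np.array_split(order, n_parcels)
    # parcel score = number of solved items within a cluster
    return np.column_stack([X[:, idx].sum(axis=1) for idx in parcels])
```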
6. Results

6.1. Results on the fit of the various measurement models

First, we examined the fit of the measurement models for the cognitive ability tests and the state and trait test anxiety measures. The cognitive ability nested factor model fitted the data well (χ2[45] = 61.544, p = .051, CFI = .967, RMSEA = .039). Similar results were obtained for the model specified for the trait test anxiety components (χ2[49] = 65.458, p = .058, CFI = .978, RMSEA = .047) and the state test anxiety components at each time-point of measurement (T1: χ2[6] = 8.162, p = .226, CFI = .994, RMSEA = .035; T2: χ2[6] = 10.710, p = .098, CFI = .987, RMSEA = .052; T3: χ2[6] = 11.239, p = .081, CFI = .989, RMSEA = .055; T4: χ2[6] = 9.767, p = .135, CFI = .991, RMSEA = .047; T5: χ2[6] = 11.926, p = .064, CFI = .986, RMSEA = .058; T6: χ2[6] = 10.151, p = .118, CFI = .990, RMSEA = .049; T7: χ2[6] = 11.751, p = .068, CFI = .986, RMSEA = .058; T8: χ2[6] = 9.250, p = .160, CFI = .992, RMSEA = .043). All parameters turned out to be significant and of the expected magnitude in all measurement models examined. However, in the case of the trait test anxiety model we had to allow the residuals of two indicator items of the trait worry component to be correlated in order to achieve an acceptable model fit. The correlated residuals between these two items were attributable to similarities in their item wording. The standardized solution for the cognitive ability measurement model and the trait test anxiety model can be seen in Fig. 1.

Next, the variance in the state test anxiety measures was decomposed into occasion-unspecific components, occasion-specific components and measurement error using the extended indicator-specific latent trait LST model outlined in Section 4.2 (Eid & Diener, 1999; Geiser & Lockhart, 2012). While the state worry model (χ2[109] = 136.578, p = .038, CFI = .983, RMSEA = .025) and the state task-irrelevant thinking model (χ2[109] = 135.783, p = .042, CFI = .984, RMSEA = .024) fitted the data well, the LST model for state emotionality failed to do so. This model resulted in negative variance estimates for eight of the occasion-specific factors. We thus tested an alternative model in which we omitted the occasion-specific factors. This model exhibited a good fit to the data (χ2[118] = 156.788, p = .010, CFI = .978, RMSEA = .028) and was therefore retained as the final measurement model.

Finally, we evaluated whether the second-order occasion-unspecific latent state factors represent trait test anxiety. To this end, each occasion-unspecific latent state factor was regressed on its respective trait test anxiety component factor. The resulting SEM models fitted the data well (worry: χ2[192] = 239.687, p = .011, CFI = .976, RMSEA = .032; task-irrelevant thinking: χ2[156] = 201.730, p = .008, CFI = .953, RMSEA = .047; emotionality: χ2[183] = 233.352, p = .007, CFI = .977, RMSEA = .034). The standardized regression paths from the trait test anxiety components to the second-order occasion-unspecific latent state factors were .71 for the worry component, .60 for task-irrelevant thinking, and .61 for emotionality. The results thus indicated that even though each trait test anxiety component significantly predicted its respective occasion-unspecific latent state test anxiety component, the two latent traits are clearly separable.

6.2. Results on measurement bias using item response theory

The results of the item response theory measurement invariance analyses are summarized in Table 2. The 1PL Rasch model and the LLTM fitted the data obtained with the four cognitive ability tests reasonably well. Furthermore, the LLTM basic parameter estimates were identical to those obtained in previous studies. Most interestingly, none of the trait and/or state test anxiety components was shown to induce measurement bias. Thus, the item parameters and the basic parameter estimates turned out to be invariant across different levels of trait/state test anxiety. This result is in line with the deficit hypothesis and contradicts predictions made on the basis of the different interference models.
6.3. Results on measurement bias using structural equation modeling

The model fit statistics of the No-Bias models and the two General Bias models for the three trait test anxiety components are summarized in Table 3. In neither case did the General Bias models fit the data better than the No-Bias models. This indicates that neither trait worry, nor trait task-irrelevant thinking, nor trait emotionality induces measurement bias. This finding is consistent with the results obtained in the item response theory analyses and also argues for the deficit hypothesis. In the No-Bias models trait worry correlated significantly (p < .05) with psychometric g (r = −.41) and algebra word problem solving (r = −.32), while the deficit effects for verbal fluency (r = −.05) and visualization (r = −.05) failed to reach significance. In the case of trait task-irrelevant thinking, significant deficit effects were observed for psychometric g (r = −.42) and verbal fluency (r = −.34), while the remaining deficit effects failed to reach significance (visualization: r = −.01; algebra word problem solving: r = −.06). Trait emotionality, on the other hand, was only significantly correlated with psychometric g (r = −.29). All other deficit effects for trait emotionality failed to reach significance (visualization: r = −.04; verbal fluency: r = −.01; algebra word problem solving: r = −.08). In general, these findings are consistent with prior studies indicating that trait test anxiety does not induce measurement bias (Reeve & Bonaccio, 2008). Furthermore, the magnitude of the deficit effects is in line with previous studies, even after accounting for the fact that the deficit effect parameters reported in this article are free of measurement error (cf. Reeve, Heggestad, & Lievens, 2009).

Similar results were obtained when the state test anxiety components were used as causal factors in the measurement bias analyses. The model fit statistics of the No-Bias and General Bias models for the three state test anxiety components are summarized in Table 4. Again, the No-Bias models fitted the data no worse than either the unrestricted or the restricted General Bias models. This indicates that neither the occasion-unspecific nor the occasion-specific pre- and post-test state test anxiety factors induce measurement bias. Results on the deficit effect parameter estimates for the occasion-unspecific and occasion-specific pre- and post-test factors of state worry and state task-irrelevant thinking are discussed in the next section. In the case of state emotionality, the occasion-unspecific second-order factor was significantly correlated with psychometric g (r = −.20) but with none of the narrow cognitive ability factors (visualization: r = −.07; verbal fluency: r = .00; algebra word problem solving: r = −.03).

6.4. Results on the relation between test performance and pre- and post-test state test anxiety

In a final step we analyzed the magnitude of the deficit effects obtained in the No-Bias models for state worry and state task-irrelevant thinking. In the case of state worry, the occasion-unspecific second-order factor was significantly (p < .05) correlated with psychometric g (r = −.27) and the narrower algebra
Fig. 1. Standardized solution for the cognitive ability measurement model (above) and the trait test anxiety measurement model (below). [Figure not reproduced; panel titles: "Cognitive ability model" and "Trait test anxiety model".]
Table 2
Measurement invariance of the item parameter (top) and basic parameter (bottom) estimates of the four cognitive ability tests. Entries are χ2, df and p of the respective Likelihood Ratio Test.

Measurement invariance of the item parameter estimates (1PL Rasch model)

Partitioning criterion | Numerical-inductive reasoning (χ2, df, p) | Visualization (χ2, df, p) | Word fluency (χ2, df, p) | Algebra word problems (χ2, df, p)
1PL model fit | 25.245, 17, .089 | 23.327, 19, .223 | 28.731, 19, .070 | 18.967, 13, .124
Trait worry | 14.533, 17, .629 | 16.085, 19, .652 | 24.710, 19, .170 | 8.090, 13, .838
Trait task-irrelevant thinking | 27.007, 17, .058 | 15.733, 19, .657 | 18.790, 19, .470 | 17.773, 13, .166
Trait emotionality | 16.815, 17, .467 | 16.374, 19, .632 | 20.006, 19, .394 | 16.487, 13, .224
Pre-test state worry | 15.014, 17, .594 | 24.196, 19, .189 | 9.450, 19, .965 | 12.069, 13, .522
Pre-test state task-irrelevant th. | 17.446, 17, .425 | 23.514, 19, .215 | 14.128, 19, .776 | 13.892, 13, .381
Pre-test state emotionality | 18.763, 17, .342 | 21.731, 19, .298 | 9.943, 19, .954 | 10.554, 13, .648
Post-test state worry | 18.754, 17, .343 | 20.829, 19, .346 | 10.766, 19, .931 | 16.050, 13, .246
Post-test state task-irrelevant th. | 11.948, 17, .803 | 12.428, 19, .866 | 10.152, 19, .949 | 18.771, 13, .130
Post-test state emotionality | 15.999, 17, .524 | 15.453, 19, .693 | 13.689, 19, .802 | 10.423, 13, .659

Measurement invariance of the basic parameter estimates (LLTM)

Partitioning criterion | Numerical-inductive reasoning (χ2, df, p) | Visualization (χ2, df, p) | Word fluency (χ2, df, p) | Algebra word problems (χ2, df, p)
LLTM model fit | 23.898, 12, .021 | 19.850, 13, .010 | 23.613, 14, .051 | 21.089, 9, .012
Trait worry | 14.036, 11, .231 | 15.605, 12, .210 | 19.443, 13, .110 | 6.082, 8, .638
Trait task-irrelevant thinking | 17.348, 11, .098 | 13.555, 12, .330 | 13.969, 13, .376 | 12.881, 8, .116
Trait emotionality | 12.266, 11, .344 | 12.299, 12, .422 | 16.584, 13, .219 | 11.048, 8, .199
Pre-test state worry | 12.349, 11, .338 | 14.167, 12, .290 | 9.170, 13, .760 | 6.396, 8, .603
Pre-test state task-irrelevant th. | 14.354, 11, .214 | 16.989, 12, .150 | 11.959, 13, .531 | 7.517, 8, .482
Pre-test state emotionality | 14.693, 11, .197 | 13.266, 12, .350 | 8.494, 13, .810 | 6.468, 8, .595
Post-test state worry | 11.517, 11, .401 | 17.860, 12, .120 | 14.769, 13, .322 | 9.511, 8, .301
Post-test state task-irrelevant th. | 13.067, 11, .289 | 12.796, 12, .384 | 12.253, 13, .507 | 13.330, 8, .101
Post-test state emotionality | 15.144, 11, .176 | 12.071, 12, .440 | 13.128, 13, .438 | 5.437, 8, .710
Table 3
Goodness of fit of the No-Bias model and the unrestricted and restricted General Bias models for trait worry (top), trait task-irrelevant thinking (middle) and trait emotionality (bottom).

Tested model | χ2 | df | p | CFI | RMSEA | Δχ2 | Δdf | p | ΔCFI

Results obtained for the causal latent trait worry factor
No-bias | 124.169 | 109 | .152 | .979 | .018 | – | – | – | –
General bias | 119.350 | 101 | .103 | .974 | .021 | 4.819 | 8 | .777 | .005
Restricted general bias | 123.284 | 106 | .120 | .976 | .020 | .885 | 3 | .829 | .003

Results obtained for the causal latent trait task-irrelevant thinking factor
No-bias | 96.151 | 81 | .120 | .968 | .021 | – | – | – | –
General bias | 92.137 | 73 | .065 | .959 | .025 | 4.014 | 8 | .856 | .009
Restricted general bias | 95.362 | 78 | .088 | .963 | .023 | .789 | 3 | .852 | .005

Results obtained for the causal latent trait emotionality factor
No-bias | 106.296 | 99 | .290 | .989 | .013 | – | – | – | –
General bias | 102.790 | 91 | .187 | .982 | .018 | 3.506 | 8 | .899 | .007
Restricted general bias | 105.965 | 96 | .229 | .985 | .016 | .331 | 3 | .954 | .004
word problem solving factor (r = −.26). All other deficit effect parameters failed to reach significance (visualization: r = −.04; verbal fluency: r = −.07). Most interestingly, all deficit effects for the occasion-specific factors representing the pre-test state worry measures were nonsignificant (psychometric g: r = −.10; visualization: r = .00; verbal fluency: r = −.06; algebra word problem solving: r = −.10). By contrast, the occasion-specific factors of the post-test state worry measures were significantly correlated with psychometric g (r = −.23), verbal fluency (r = −.20) and algebra word problem solving (r = −.21). Similar results were observed for state task-irrelevant thinking. The occasion-unspecific component of task-irrelevant thinking was significantly correlated with psychometric g (r = −.27) and the verbal fluency factor (r = −.21), while the remaining deficit effects failed to reach significance (visualization: r = −.09; algebra word problem solving: r = −.10). Again, all deficit effects for the occasion-specific pre-test component were nonsignificant (psychometric g: r = .01; visualization: r = .05; verbal fluency: r = −.09; algebra word problem solving: r = −.06), while the occasion-specific post-test component was significantly correlated with psychometric g (r = −.23), the verbal fluency factor (r = −.27) and the algebra word problem solving factor (r = −.21). Only the
deficit effect between the post-test occasion-specific component and the narrower visualization factor failed to reach significance (r = −.12). These findings are in line with the hypothesis that the occasion-specific factors of the post-test state anxiety measures account for the higher correlation between test anxiety and test performance observed in studies in which test anxiety was measured after respondents had completed the entire test battery.

7. Discussion

In selection situations it is usually desirable that cognitive test items measure the same latent trait and that the item parameters are invariant across different groups of test-takers. This is a necessary prerequisite for interpreting mean test score differences at face value and for rank-ordering respondents according to their test results, as is common practice in selection situations (for an overview: Mislevy et al., 2013). In general, measurement fairness can be compromised by various sources of construct-irrelevant variance. Based on correlational evidence, some researchers hypothesized that test anxiety may induce measurement bias, which would lead to a systematic underestimation of the true cognitive abilities of respondents
Table 4
Goodness of fit of the No-Bias model and the unrestricted and restricted General Bias models for state worry (top), state task-irrelevant thinking (middle) and state emotionality (bottom).

Tested model | χ2 | df | p | CFI | RMSEA | Δχ2 | Δdf | p | ΔCFI

Results obtained for the causal latent state worry factors
No-bias | 401.409 | 338 | .010 | .969 | .021 | – | – | – | –
General bias | 384.310 | 314 | .328 | .966 | .023 | 17.099 | 24 | .844 | .003
Restricted general bias | 396.148 | 329 | .349 | .967 | .022 | 5.261 | 9 | .811 | .002

Results obtained for the causal latent state task-irrelevant thinking factors
No-bias | 413.812 | 338 | .003 | .963 | .023 | – | – | – | –
General bias | 395.608 | 312 | .001 | .960 | .025 | 18.204 | 24 | .793 | .003
Restricted general bias | 408.421 | 315 | .001 | .962 | .024 | 5.391 | 9 | .799 | .001

Results obtained for the causal latent state emotionality factor
No-bias | 433.995 | 350 | .001 | .964 | .024 | – | – | – | –
General bias | 431.661 | 342 | <.001 | .961 | .025 | 2.334 | 8 | .969 | .003
Restricted general bias | 433.129 | 347 | .001 | .963 | .025 | .731 | 3 | .866 | .001
high in trait and/or state test anxiety (Haladyna & Downing, 2004; Hembree, 1988). However, the finding that test anxiety and test performance are moderately correlated does not necessarily imply that test anxiety has a detrimental causal effect on test performance (cf. Reeve & Bonaccio, 2008). The observed correlation could be attributable to an interference effect, a deficit effect, or both. In purely correlational data these two kinds of effects are naturally confounded. Several researchers (e.g. Halpin et al., in press; Reeve & Bonaccio, 2008) have already outlined how item response theory analyses and structural equation modeling can be used to disentangle these two effects when evaluating competing theoretical explanations of the test anxiety-test performance relationship. In the present article we extended these prior studies by outlining how measurement bias analyses can be used to evaluate competing predictions deduced from different versions of the interference hypothesis, deriving precise predictions on the kind of items assumed to be most vulnerable to the interference effect. Furthermore, in contrast to previous studies, which mainly focused on a single cognitive component of trait test anxiety, we took into account three different components of trait- and state test anxiety.

In line with a previous study (Reeve & Bonaccio, 2008), we failed to obtain evidence in favor of an interference effect for both trait and state test anxiety. Thus, contrary to predictions made on the basis of different versions of the interference hypothesis, our cognitive ability tests exhibited measurement invariance across different levels of state- and trait test anxiety. This finding is in line with the deficit hypothesis, which states that trait and/or state test anxiety arises from repeated and/or current experiences of one's own cognitive limitations during test-taking. However, our findings contradict one previous finding, which indicated that the correlation between trait task-irrelevant thinking and scholastic achievement test performance is entirely due to measurement bias operating at the item level (Halpin et al., in press). A possible explanation for these conflicting results is that the effect of trait/state test anxiety differs between cognitive ability and scholastic achievement tests. Similar arguments have already been advanced to account for divergent findings regarding retest effects in cognitive ability and scholastic achievement tests (for an overview: Arendasy & Sommer, 2013a).

A possible limitation of the present study is that our research has been conducted in a low-stakes setting. Although research findings in low- and high-stakes settings have generally been consistent, high-stakes settings have been shown to induce higher levels of test anxiety and also tend to increase the correlation between test anxiety and test performance (cf. Nie et al., 2011; Reeve et al., 2008). We tried to get closer to an actual high-stakes setting by increasing goal relevance and goal congruence by means of a general instruction that made respondents believe that the tests they were working on would eventually be used to select students for the Masters and PhD programs at their own university.
Although the correlations between the test anxiety measures and the cognitive ability traits were within the range reported for high-stakes situations (Reeve et al., 2009), we cannot rule out that the results might have differed had the present study been conducted in an actual high-stakes setting. Future studies should therefore seek to replicate our findings in such a setting. If these studies yield divergent findings regarding measurement invariance across settings, current models of the test anxiety–test performance relationship will need to be revised by incorporating a threshold at which anxiety starts to exert a detrimental causal effect on test performance. At present, none of the available models postulates such a threshold mechanism. Future studies that systematically vary the stakes of the assessment setting could thus provide valuable information for refining current theoretical models in case empirical results indicate a lack of measurement invariance across different levels of state- and/or trait test anxiety in high- but not in low-stakes settings.

Further support for the deficit hypothesis was obtained in our analyses of the latent correlations between the cognitive ability factors and the occasion-unspecific and occasion-specific state worry and state task-irrelevant thinking factors. The occasion-unspecific factors were correlated primarily with psychometric g, which corroborates prior studies (cf. Ackerman & Heggestad, 1997; Reeve & Bonaccio, 2008; Salthouse, 2012). Furthermore, the post-test occasion-specific state test anxiety factors were more strongly correlated with their respective cognitive ability factors than the pre-test occasion-specific state test anxiety factors. This finding confirms predictions made on the basis of the deficit model and contradicts those deduced from recent versions of the interference hypothesis. In addition, the finding that post-test occasion-specific factors of test anxiety are correlated with test performance is consistent with previous reports that post-test state/trait test anxiety measures are more strongly correlated with test performance than pre-test state/trait test anxiety measures (Klinger, 1984; Strohbeck-Kühner, 1999; Zeidner, 1991). It extends these previous findings by providing evidence for the hypothesis that this effect is due to factors operating at the post-test test anxiety measurement (cf. Zeidner, 1995, 1998). At present we can only speculate about the potential causes of individual differences in the post-test occasion-specific factors. Future studies should therefore examine whether individual differences in occasion-specific state test anxiety factors can be explained in terms of individual differences in situational and/or dispositional determinants of the cognitive appraisal process.
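The following toy simulation (illustrative only, with arbitrary effect sizes; it is not the latent state-trait model estimated in the study) shows why a deficit account predicts exactly this asymmetry: if the post-test occasion-specific worry component is partly fed by the cognitive limitations just experienced, it correlates with ability, whereas the pre-test occasion-specific component does not:

```python
# Toy simulation of the deficit account's prediction for occasion-specific
# state worry components; all coefficients are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

g = rng.normal(size=n)                        # psychometric g
trait_worry = -0.4 * g + rng.normal(size=n)   # occasion-unspecific (trait-like) worry
occ_pre = rng.normal(size=n)                  # pre-test occasion-specific component
occ_post = -0.3 * g + rng.normal(size=n)      # post-test component fed by perceived failure

state_pre = trait_worry + occ_pre             # observed pre-test state worry
state_post = trait_worry + occ_post           # observed post-test state worry

for name, x in [("pre-test state worry", state_pre),
                ("post-test state worry", state_post),
                ("pre occasion-specific", occ_pre),
                ("post occasion-specific", occ_post)]:
    print(f"r(g, {name}) = {np.corrcoef(g, x)[0, 1]:+.2f}")
```

Both observed state measures correlate with g through the shared trait component, but only the post-test occasion-specific residual carries an additional ability correlation, which is the pattern reported above.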
References

Ackerman, P. L., & Heggestad, E. D. (1997). Intelligence, personality, and interests: Evidence for overlapping traits. Psychological Bulletin, 121, 219–245.
Andersen, E. B. (1973). A goodness of fit test for the Rasch model. Psychometrika, 38, 123–140.
Arendasy, M., Hergovich, A., & Sommer, M. (2008). Investigating the ‘g’ saturation of various stratum-two factors using automatic item generation. Intelligence, 36, 574–583.
Arendasy, M., & Sommer, M. (2007). Automatic generation of quantitative reasoning items: A schema-based isomorphic approach. Learning and Individual Differences, 17, 366–383.
Arendasy, M., & Sommer, M. (2012). Using automatic item generation to meet the increasing item demands of high-stakes assessment. Learning and Individual Differences, 22, 112–117.
Arendasy, M., & Sommer, M. (2013a). Quantitative differences in retest effects across different methods used to construct alternate test forms. Intelligence, 41, 181–192.
Arendasy, M., & Sommer, M. (2013b). Automatic generation and first evidence on the dimensionality, measurement fairness, and construct representation of a picture completion task. Unpublished research report. Graz: University of Graz.
Arendasy, M., Sommer, M., & Mayr, F. (2012). Using automatic item generation to simultaneously construct German and English versions of a word fluency test. Journal of Cross-Cultural Psychology, 43, 464–479.
Beauducel, A., & Herzberg, P. Y. (2006). On the performance of maximum likelihood versus means and variance adjusted weighted least squares estimation in CFA. Structural Equation Modeling: A Multidisciplinary Journal, 13, 186–203.
Benson, J., & Bandalos, D. L. (1992). Second-order confirmatory factor analysis of the Reactions to Tests scale with cross-validation. Multivariate Behavioral Research, 27, 459–487.
Cassady, J. C., & Johnson, R. E. (2002). Cognitive test anxiety and academic performance. Contemporary Educational Psychology, 27, 270–295.
Chen, H. (2012). The moderating effect of item order arrangement by difficulty on the relationship between test anxiety and test performance. Creative Education, 3, 328–333.
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9, 233–255.
Davis, H. A., DiStefano, Ch., & Schutz, P. A. (2008). Identifying patterns of appraising tests in first-year college students: Implications for anxiety and emotion regulation during test-taking. Journal of Educational Psychology, 100, 942–960.
Drasgow, F. (1987). Study of measurement bias of two standardized psychological tests. Journal of Applied Psychology, 72, 19–29.
Eid, M., & Diener, E. (1999). Intraindividual variability in affect: Reliability, validity, and personality correlates. Journal of Personality and Social Psychology, 76, 662–676.
Englert, C., Bertrams, A., & Dickhäuser, O. (2011). Entwicklung der Fünf-Item-Kurzskala STAI-SKD zur Messung von Zustandsangst. Zeitschrift für Gesundheitspsychologie, 19, 173–180.
Eysenck, M. W., & Calvo, M. G. (1992). Anxiety and performance: The processing efficiency theory. Cognition and Emotion, 6, 409–434.
Eysenck, M. W., & Derakshan, N. (2011). New perspectives in attentional control theory. Personality and Individual Differences, 50, 955–960.
Eysenck, M. W., Derakshan, N., Santos, R., & Calvo, M. G. (2007). Anxiety and cognitive performance: Attentional control theory. Emotion, 7, 336–353.
Fischer, G. H. (1995). The linear logistic test model. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models: Foundations, recent developments, and applications (pp. 157–180). New York: Springer.
Fischer, G. H., & Ponocny-Seliger, E. (1999). LPCM-Win software and manual. Assessment Software Corporation, MN.
Geiser, Ch., & Lockhart, G. (2012). A comparison of four approaches to account for method effects in latent state-trait analyses. Psychological Methods, 17, 255–283.
Goetz, T., Preckel, F., Pekrun, R., & Hall, N. C. (2007). Emotional experiences during test taking: Does cognitive ability make a difference? Learning and Individual Differences, 17, 3–16.
Gustafsson, J. E., & Balke, G. (1993). General and narrow abilities as predictors of school achievement. Multivariate Behavioral Research, 28, 407–434.
Haladyna, T. M., & Downing, S. M. (2004). Construct-irrelevant variance in high-stakes testing. Educational Measurement: Issues and Practice, 23, 17–27.
Halpin, P. F., da-Silva, C., & De Boeck, P. (in press). A confirmatory factor analysis approach to test anxiety. Structural Equation Modeling: A Multidisciplinary Journal.
Hembree, R. (1988). Correlates, causes, effects and treatment of test anxiety. Review of Educational Research, 58, 47–77.
Hodapp, V., & Benson, J. (1997). The multidimensionality of test anxiety: A test of different models. Anxiety, Stress and Coping: An International Journal, 10, 219–244.
Hong, E. (1998). Differential stability of individual differences in state and trait test anxiety. Learning and Individual Differences, 10, 51–69.
Hong, E. (1999). Test anxiety, perceived test difficulty, and test performance: Temporal patterns of their effects. Learning and Individual Differences, 11, 431–447.
Hong, E., & Karstensson, L. (2001). Antecedents of state test anxiety. Contemporary Educational Psychology, 27, 348–367.
Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6, 1–55.
Keith, N., Hodapp, V., Schermelleh-Engel, K., & Moosbrugger, H. (2003). Cross-sectional and longitudinal confirmatory factor models for the German Test Anxiety Inventory: A construct validation. Anxiety, Stress and Coping: An International Journal, 16, 251–270.
Kim, S. H., & Rocklin, T. (1994). The temporal patterns of worry and emotionality and their differential effects on test performance. Anxiety, Stress, and Coping: An International Journal, 7, 117–130.
Klinger, E. (1984). A consciousness-sampling analysis of test anxiety and performance. Journal of Personality and Social Psychology, 47, 1376–1390.
Kvist, A. V., & Gustafsson, J. E. (2008). The relationship between fluid intelligence and the general factor as a function of cultural background: A test of Cattell's investment theory. Intelligence, 36, 422–436.
Lang, J. W. B., & Lang, J. (2010). Priming competence diminishes the link between cognitive test anxiety and test performance: Implications for the interpretation of test scores. Psychological Science, 21, 811–819.
Lubke, G. H., Dolan, C. V., Kelderman, H., & Mellenbergh, G. J. (2003). On the relationship between sources of within- and between-group differences and measurement invariance in the common factor model. Intelligence, 31, 543–566.
Marsh, H. W., Hau, K. T., & Wen, Z. (2004). In search of golden rules: Comment on hypothesis-testing approaches to setting cutoff values for fit indexes and dangers in overgeneralizing Hu and Bentler's (1999) findings. Structural Equation Modeling, 11, 320–341.
McCarthy, J. M., & Goffin, R. D. (2005). Selection test anxiety: Exploring tension and fear of failure across the sexes in simulated selection scenarios. International Journal of Selection and Assessment, 13, 282–295.
Meijer, J., & Oostdam, R. (2011). Effects of instruction and stage-fright on intelligence testing. European Journal of Psychology of Education, 26, 143–161.
Millsap, R. E. (1997). Invariance in measurement and prediction: Their relationship in the single-factor case. Psychological Methods, 2, 248–260.
Mislevy, R. J., Haertel, G., Cheng, B. H., Ructtinger, L., DeBarger, A., Murray, E., et al. (2013). A “conditional” sense of fairness in assessment. Educational Research and Evaluation: An International Journal on Theory and Practice, 19, 121–140.
Muthén, B. O. (1989). Using item-specific instructional information in achievement modeling. Psychometrika, 54, 385–396.
Muthén, L. K., & Muthén, B. O. (2010). Mplus user's guide. Los Angeles, CA: Muthén & Muthén.
Nie, Y., Lau, S., & Liau, A. K. (2011). Role of academic self-efficacy in moderating the relation between task importance and test anxiety. Learning and Individual Differences, 21, 736–741.
Oostdam, R., & Meijer, J. (2003). Influence of test anxiety on measurement of intelligence. Psychological Reports, 92, 3–20.
Paulman, R., & Kennelly, K. (1984). Test anxiety and ineffective test taking: Different names, same constructs? Journal of Educational Psychology, 76, 279–288.
Pekrun, R., Elliot, A. J., & Maier, M. A. (2009). Achievement goals and achievement emotions: Testing a model of their joint relations with academic performance. Journal of Educational Psychology, 101, 115–135.
Putwain, D. W. (2008). Deconstructing test anxiety. Emotional and Behavioural Difficulties, 13, 141–155.
Putwain, D. W., Connors, L., & Symes, W. (2010). Do cognitive distortions mediate the test anxiety–examination performance relationship? Educational Psychology, 30, 11–26.
Putwain, D. W., & Symes, W. (2012). Achievement goals as mediators of the relationship between competency beliefs and test anxiety. British Journal of Educational Psychology, 82, 207–224.
Raju, N. S., Laffitte, L. J., & Byrne, B. M. (2002). Measurement equivalence: A comparison of methods based on confirmatory factor analysis and item response theory. Journal of Applied Psychology, 87, 517–529.
Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests. Chicago: The University of Chicago Press.
Reeve, Ch. L., & Bonaccio, S. (2008). Does test anxiety induce measurement bias in cognitive ability tests? Intelligence, 36, 526–538.
Reeve, Ch. L., Bonaccio, S., & Charles, J. E. (2008). A policy-capturing study of the contextual antecedents of test anxiety. Personality and Individual Differences, 45, 243–248.
Reeve, Ch. L., Heggestad, E. D., & Lievens, F. (2009). Modeling the impact of test anxiety and test familiarity on the criterion-related validity of cognitive ability tests. Intelligence, 37, 34–41.
Rhemtulla, M., Brosseau-Liard, P. É., & Savalei, V. (2012). When can categorical variables be treated as continuous? A comparison of robust continuous and categorical SEM estimation methods under suboptimal conditions. Psychological Methods, 17, 354–373.
Salthouse, T. A. (2012). How general are the effects of trait anxiety and depressive symptoms on cognitive functioning? Emotion, 12, 1075–1084.
Sarason, I. G. (1984). Stress, anxiety, and cognitive interference: Reactions to tests. Journal of Personality and Social Psychology, 46, 929–938.
Schmukle, S. C., & Egloff, B. (2005). A latent state-trait analysis of implicit and explicit personality measures. European Journal of Psychological Assessment, 21, 100–107.
Schutz, P. A., DiStefano, Ch., Benson, J., & Davis, H. A. (2004). The Emotional Regulation During Test-Taking scale. Anxiety, Stress, and Coping: An International Journal, 17, 253–269.
Smith, T. W., Snyder, C. R., & Handelsman, M. M. (1982). On the self-serving function of an academic wooden leg: Test anxiety as a self-handicapping strategy. Journal of Personality and Social Psychology, 42, 314–321.
Spielberger, C. D., & Vagg, P. R. (1995). Test anxiety: Theory, assessment and treatment. Bristol, UK: Taylor & Francis.
Strohbeck-Kühner, P. (1999). Testangst bei Fahreignungsbegutachtungen: Die Angst-Leistung-Relation. Zeitschrift für Differentielle und Diagnostische Psychologie, 20, 39–57.
Suárez-Falcón, J. C., & Glas, C. A. W. (2003). Evaluation of global testing procedures for item fit to the Rasch model. British Journal of Mathematical and Statistical Psychology, 56, 127–143.
Tobias, S. (1985). Test anxiety: Interference, defective skills and cognitive capacity. Educational Psychologist, 20, 135–142.
Wacker, A., Jaunzeme, J., & Jaksztat, S. (2008). Eine Kurzform des Prüfungsängstlichkeitsinventars TAI-G. Zeitschrift für Pädagogische Psychologie, 22, 73–81.
Wicherts, J. M., & Scholten, A. Z. (2010). Test anxiety and the validity of cognitive tests: A confirmatory factor analysis perspective and some empirical findings. Intelligence, 38, 169–178.
Wine, J. D. (1971). Test anxiety and direction of attention. Psychological Bulletin, 76, 92–104.
Woods, C. M. (2009). Evaluation of MIMIC-model methods for DIF testing with comparison to two-group analysis. Multivariate Behavioral Research, 44, 1–27.
Zeidner, M. (1991). Test anxiety and aptitude test performance in an actual college admissions testing situation: Temporal considerations. Personality and Individual Differences, 12, 101–109.
Zeidner, M. (1995). Personality correlates of intelligence. In D. Saklofske & M. Zeidner (Eds.), International handbook of personality and intelligence: Perspectives on individual differences (pp. 299–314). New York: Plenum Press.
Zeidner, M. (1998). Test anxiety: The state of the art. New York: Plenum.