On the benefits of ‘doing science’: Does integrative writing about scientific controversies foster epistemic beliefs?



Contemporary Educational Psychology 58 (2019) 85–101

Contents lists available at ScienceDirect

Contemporary Educational Psychology journal homepage: www.elsevier.com/locate/cedpsych


Tom Rosman a,⁎, Anne-Kathrin Mayer a, Samuel Merk b, Martin Kerwer a

a Leibniz Institute for Psychology Information (ZPID), Universitätsring 15, D-54296 Trier, Germany
b University of Tübingen, Geschwister-Scholl-Platz, D-72074 Tübingen, Germany

ARTICLE INFO

Keywords: Epistemic beliefs; Epistemic change; Multiple conflicting documents; Task instructions; Writing; Intervention

ABSTRACT

We examine the effects of writing tasks on epistemic change in the context of an intervention which aims at modifying psychology students’ epistemic beliefs (beliefs about the nature of knowledge and knowing). The intervention uses a multiple-texts approach. Participants first read multiple texts containing controversial evidence from 18 fictional studies on gender stereotyping. After reading, they are asked to write a balanced scientific essay explicitly focusing on the conditions under which boys or girls are discriminated against. We expected (1) that an intervention combining reading and writing has positive effects on epistemic beliefs, (2) that the beneficial effects of writing are reduced when the task instructions require writing a general overview or a one-sided argumentative text instead of a balanced essay, and (3) that reading and writing both uniquely contribute to epistemic change. Moreover, (4) we examined the effects of reading and writing on different levels of epistemic beliefs (topic-specific vs. domain-specific). Hypotheses were largely supported using data from two experimental studies and one correlational study (Study 1: N = 86; Study 2: N = 153; Study 3: N = 93). Implications of the results for research and practice are discussed.

1. Background

Epistemic beliefs (i.e., beliefs about the nature of scientific knowledge) have been shown to influence information processing (Kardash & Howell, 2000), critical thinking (Chan, Ho, & Ku, 2011), self-regulated learning (Muis et al., 2015; Trevors, Feyzi-Behnagh, Azevedo, & Bouchet, 2016), and science achievement (Bråten & Ferguson, 2014). Considering the ever-growing amount of scientific knowledge, interest in how to foster students’ epistemic beliefs has grown in recent years (Barzilai & Chinn, 2017). Over the years, the corresponding research has yielded a multitude of methods, such as inquiry tasks, experimentation, problem solving, discussions, confrontations with conflicting evidence, and reflective prompts (e.g., Brownlee, Ferguson, & Ryan, 2017; Kienhues, Bromme, & Stahl, 2008; Muis & Duffy, 2013). With a few exceptions (e.g., Barzilai & Ka’adan, 2017), writing tasks as a tool for epistemic change, however, have been neglected – at least up to now. This is puzzling for several reasons. First, well-designed writing tasks perfectly fit the idea of having students ‘do science’ – an idea that underlies many (or even most) interventions (Kienhues, Ferguson, & Stahl, 2016). Second, they can easily be manipulated on an experimental basis, which allows further conclusions regarding the process of epistemic change. Third, writing tasks have been shown to not only affect epistemic beliefs (Barzilai & Ka’adan, 2017), but also aspects such as intertextual comprehension (e.g., Bråten & Strømsø, 2010; Hagen, Braasch, & Bråten, 2014). Fourth, on a more practical level, writing tasks are easily administered and therefore may easily be integrated into courses or even textbooks.

In the following pages, we will present three studies that combine the presentation of conflicting scientific evidence with writing tasks in order to foster more advanced epistemic beliefs. Study 1 investigates whether a combination of reading and writing about scientific controversies is generally suited to foster students’ epistemic beliefs, and examines the effects of different writing task instructions on epistemic change. Study 2 analyzes whether changes in topic-specific epistemic beliefs are associated with changes in domain-specific epistemic beliefs (i.e., in terms of a spillover effect). Study 3, finally, analyzes the magnitude of effects that may be ascribed to reading and to writing tasks, respectively.

1.1. Epistemic beliefs

Epistemic beliefs (also: epistemological beliefs) are defined as individual conceptions about knowledge and the process of knowing (Hofer & Pintrich, 1997). Developmental frameworks posit different stages that individuals pass through in their epistemic belief

Corresponding author. E-mail address: [email protected] (T. Rosman).

https://doi.org/10.1016/j.cedpsych.2019.02.007

Available online 25 February 2019 0361-476X/ © 2019 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/BY/4.0/).


development. According to Kuhn and Weinstock (2002), development begins in the absolutism stage, where knowledge is conceptualized in dualistic, absolute contrasts (right-or-wrong, truth-or-untruth, etc.). Individuals who move on to the second stage, multiplism, stress the subjectivity of knowledge and view scientific knowledge as an accumulation of opinions rather than facts (extreme form: radical subjectivity; Hofer & Pintrich, 1997). The third and most advanced stage is called evaluativism. In this stage, individuals acknowledge that ‘truth’ strongly depends on the issue in question and on its context. Individuals holding evaluativistic beliefs thus weigh evidence and integrate conflicting viewpoints in order to reach their conclusions. Of note is that the developmental sequence postulated in such frameworks is not uncontested (e.g., Elby & Hammer, 2001; Rosman, Mayer, Kerwer, & Krampen, 2017). Therefore, more recent approaches drawing on the framework by Kuhn and Weinstock (2002) no longer classify individuals into distinct stages (e.g., ‘absolutist’ vs. ‘multiplist’ students), but measure the strength of their beliefs on separate scales (Barzilai & Weinstock, 2015; Merk, Rosman, Rueß, Syring, & Schneider, 2017; Peter, Rosman, Mayer, Leichner, & Krampen, 2016; Thomm, Barzilai, & Bromme, 2017). By this means, they acknowledge, for example, that one particular student might have low to moderate absolute beliefs – and, at the same time, high multiplistic beliefs (Peter et al., 2016). Beliefs about justification for knowing are closely related to the framework by Kuhn and Weinstock (2002). This type of beliefs is particularly relevant for multiple documents comprehension (Bråten, Britt, Strømsø, & Rouet, 2011; Bråten, Strømsø, & Ferguson, 2016), and constitutes, from a philosophical standpoint, the most central aspect of epistemic beliefs (Greene, Azevedo, & Torney-Purta, 2008). Building on prior work by Greene et al. 
(2008), Ferguson, Bråten, and Strømsø (2012), Ferguson, Bråten, Strømsø, and Anmarkrud (2013) differentiate three types of justification beliefs: justification by authority (knowledge claims are justified by referring to an external authoritative source), personal justification (claims are justified by one’s own personal opinion) and justification by multiple sources (claims are justified by “crosschecking, comparing, and corroborating across several sources of information”; Bråten et al., 2016, p. 69). Bråten et al. (2016) furthermore suggest that Kuhn and Weinstock's (2002) developmental stages “essentially describe the development of beliefs about justification for knowing” (p. 72). Justification by authority is closely connected to absolutism since both stress the importance of external authoritative sources (Greene, Torney-Purta, & Azevedo, 2010; Hofer & Pintrich, 1997). Personal justification, in turn, is associated with multiplism because both emphasize the importance of one’s personal opinion (Bråten et al., 2016). Justification by multiple sources, finally, is related to evaluativism since both address comparisons and evaluations of multiple sources (Bråten et al., 2016).

1.2. Diverging information and epistemic change

According to the Integrative Model of Personal Epistemology Development (Bendixen, 2002; Rule & Bendixen, 2010), epistemic doubt constitutes a driving force for epistemic change (Muis et al., 2015). Epistemic doubt is defined as a state of cognitive incongruity that arises when individuals recognize a dissonance between their existing beliefs and new experiences (Muis et al., 2015). Imagine Tara, a psychology student who believes that the current knowledge on a certain theory (‘Theory X’) is reliable and true (e.g., because it has been tested numerous times). When confronted with findings contradicting Theory X, epistemic doubt arises and inclines Tara to resolve the cognitive incongruity through epistemic change. For example, she might adopt the evaluativistic belief that Theory X works well under certain conditions and in certain contexts, but that further research is necessary in other contexts. In line with a view of epistemic doubt as a catalyst for epistemic change, many epistemic belief interventions focus – primarily in an effort to reduce absolute beliefs – on the presentation of diverging information. Diverging information refers “to all types of information that present different, apparently conflicting, viewpoints to the information consumer” (Kienhues et al., 2016, p. 319). It strengthens a learner’s awareness of the existence of differing positions towards certain issues and is thus well-suited to reduce absolutism and to foster views of scientific knowledge as tentative and evolving (Kienhues et al., 2016). In this context, Kienhues et al. (2016) differentiate between short-term and long-term interventions. While short-term interventions (e.g., Ferguson et al., 2013; Kienhues et al., 2008; Porsch & Bromme, 2011) usually focus on the exposure to (multiple) documents that point to inconsistencies between knowledge claims (e.g., internet documents or newspaper articles), long-term interventions (e.g., Brownlee, Petriwskyj, Thorpe, Stacey, & Gibson, 2011; Conley, Pintrich, Vekiri, & Harrison, 2004; Muis & Duffy, 2013) draw on “experiences of the knowledge building process, such as experiences within constructivist, hands-on science courses” (Kienhues et al., 2016, p. 323). A combination of both approaches, i.e., having students ‘do science’ in the context of an exposure to multiple conflicting documents, might be especially promising. Therefore, Rosman, Mayer, Peter, and Krampen (2016) had their participants read multiple conflicting texts (e.g., on methodological aspects of psychological studies), and complemented this reading task by “constructivist teaching techniques and direct instruction” (Rosman, Mayer et al., 2016, p. 408). A key aspect of the intervention by Rosman, Mayer et al. (2016) was that their texts specifically named the conditions that certain findings occurred in, which would then allow students to conclude that inconsistencies between the corresponding studies can be ascribed to specific contextual factors.

Theoretically, in terms of Bendixen's (2002) model, this idea of making controversies ‘resolvable’ is particularly promising since it tackles both absolutism and multiplism simultaneously: At first, the presentation of diverging evidence leads to epistemic doubt regarding absolutism (Ferguson et al., 2012; Kienhues et al., 2016). Moreover, if students recognize that contextual factors (moderators) may explain inconsistent findings (i.e., that controversies can be dealt with), this, in terms of Bendixen's (2002) model, evokes epistemic doubt regarding multiplism. Finally, adequately dealing with diverging information also implies integrating and weighing evidence, which is a central component of evaluativism. Based on these theoretical considerations, it can be expected that ‘resolvable controversies’ interventions lead to a reduction of absolutism and multiplism as well as to an increase in evaluativism.

1.3. The role of writing tasks

While Rosman, Mayer et al. (2016) experimentally showed that both absolute and multiplistic beliefs diminished as a consequence of the intervention, the multitude of methods employed makes it impossible to single out what exactly led to the observed changes in epistemic beliefs. In fact, changes might have been caused by the reading itself, the moderated discussions, the direct instruction, or by a combination of these components. Therefore, building upon Rosman et al.’s (2016) as well as on Rosman and Mayer’s (2018) ideas, Kerwer and Rosman (2018a) exposed their participants to conflicting study abstracts on gender discrimination in schools (‘resolvable controversies’), but replaced the direct instruction as well as the discussion parts by writing tasks that required participants to write a scientific essay based on these abstracts. More specifically, participants were invited to write a text focusing on the conditions under which gender discrimination comes about, which they view as “reflecting on diverging information” (Kerwer & Rosman, 2018a, p. 18). With regard to Bendixen's (2002) model, such writing tasks may address yet another important process-related aspect of epistemic change: To overcome their epistemic doubt through belief change, individuals need to refer to certain resolution strategies such as reflection or social interaction (Bendixen, 2002; Rule & Bendixen, 2010). Reflection might thus play a moderating role in the effects of diverging information on epistemic change. Therefore, paying close attention to


the interactions between textual materials and individual behavior is central when designing epistemic belief interventions. In fact, manipulating how students reflect on diverging information by means of different writing task instructions might not only boost epistemic change in a quick and efficient manner, it might even be suited to single out the importance of specific components of Bendixen's (2002) model. While writing tasks are not too common in research on epistemic change1, they are more frequently employed in the field of multiple text comprehension. For example, to assess if and how students spontaneously integrate information sources, Barzilai and Eshet-Alkalai (2015) used an integrative question that required participants to integrate multiple controversial texts about seawater desalination. Furthermore, Bråten and Strømsø (2009) showed that learning outcomes depend on the kind of writing task instruction that is given: By means of an instruction to construct justified arguments based on seven conflicting texts, their participants developed a deeper and more integrated understanding of text contents in contrast to an instruction to produce a general overview. Adapting this approach with regard to epistemic change, one might instruct students to write an integrated text that specifically focuses on identifying moderating variables by analyzing the context that a certain finding occurs in (resolution instruction). This instruction might then be contrasted with an instruction to produce a more general overview or with an instruction to write a one-sided (i.e., unbalanced) argumentative essay, in its effects on epistemic change. Such an approach might be especially promising when using resolvable controversies (as a special form of diverging information, cf. 
Kerwer & Rosman, 2018a) since an integrative instruction facilitates the resolution of the controversies, whereas an instruction to write a general overview or a one-sided argumentative essay should have opposing effects (Rosman & Mayer, 2018). We therefore expect stronger intervention effects when participants are subjected to a resolution instruction, and we also expect that discarding the writing task reduces the intervention’s effects.

1.4. Intervention effects on different levels of epistemic beliefs

On an outcome level, epistemic belief interventions might affect different aspects of epistemic beliefs. In fact, epistemic beliefs have been shown to vary with regard to their specificity. Earlier research (e.g., the work by Marlene Schommer) mostly adopted a domain-general approach, suggesting that individuals have similar beliefs across different domains (Schommer & Walker, 1995; Schommer, 1993). In an influential article from 2002, Buehl, Alexander, and Murphy challenged this assumption by arguing that knowledge structures may vary considerably between disciplines (e.g., between mathematics and history). They therefore suggest that epistemic beliefs are both domain-general and domain-specific and should thus be conceptualized as multilayered. In their Theory of Integrated Domains in Epistemology (TIDE), Muis and colleagues (e.g., Muis, Bendixen, & Haerle, 2006; Muis, Trevors, Duffy, Ranellucci, & Foy, 2016) further develop this idea. In particular, they suggest that (1) different levels of epistemic beliefs reciprocally influence each other, (2) academic and domain-specific epistemic beliefs become more dominant throughout students’ academic careers, and that (3) domain-specific beliefs are shaped by the students’ instructional environment. Recently, the TIDE framework has been extended with regard to topic-specific beliefs (Merk, Rosman, Muis, Kelava, & Bohl, 2018). This extension is based on the idea that “personal epistemology at different levels of specificity may have strongest impact on facets of academic learning at comparable levels of specificity” (Bråten & Strømsø, 2010, p. 640). Since the design of resolvable controversies requires researchers to focus on one specific topic, using a topic-specific measure is probably best suited to investigate instructional effects. However, topic-specific measures also have disadvantages. For example, Mayer and Rosman (2016) raise the concern that with increasingly specific conceptualizations of epistemic beliefs, the generalization of findings to other areas becomes problematic. We therefore argue that to adequately evaluate a topic-specific intervention, not only its effects on topic-specific beliefs, but also the generalization of these effects to domain-specific beliefs should be tested. In this regard, we expect that an intervention focusing on a topic that is representative of the discipline it is grounded in leads to a certain amount of spillover from topic-specific to domain-specific epistemic change. Only a few corresponding studies have been published so far. This is striking since the suggested approach also allows verifying the TIDE framework’s assumption of a reciprocal influence (Muis et al., 2006) between different levels of epistemic beliefs and the instructional context (in our case, an epistemic belief intervention). In line with the predictions of the TIDE framework and since positive associations between topic-specific and more general epistemic beliefs have been shown empirically (Merk, Schneider, Syring, & Bohl, 2016; Trautwein, Lüdtke, & Beyer, 2004), we suggest that changes in topic-specific and domain-specific epistemic beliefs occur, to a certain extent, in parallel.

2. Hypotheses

In line with our reasoning above, we suggest the following hypotheses:

Hypothesis 1. Epistemic belief interventions drawing on ‘resolvable controversies’ in combination with writing tasks lead to reductions in students’ absolute and multiplistic beliefs as well as to an increase in evaluativistic beliefs (H1a). Similarly, we expect decreases in justification by authority and personal justification as well as an increase in justification by multiple sources (H1b).

Hypothesis 2. Intervention effects will be stronger for students who write an integrative text (resolution instruction) than for students writing a summary or a one-sided (i.e., unbalanced) argumentative essay.

Hypothesis 3. Intervention-induced changes in topic-specific epistemic beliefs will be positively associated with changes in domain-specific beliefs.

Hypothesis 4. Intervention effects will be stronger for students who write an integrative text (resolution instruction) compared to a condition where no writing takes place.

3. Study 1

3.1. Method

3.1.1. Development of intervention materials

Psychology students run a particular risk of developing high multiplistic beliefs (Peter et al., 2016). Therefore, the present studies focus on gender stereotyping by secondary school teachers as intervention topic – a controversial socio-scientific issue from educational psychology of which psychology students usually do not have too much prior knowledge, but that they are likely to find particular interest in (see also Rosman & Mayer, 2018). The topic of gender stereotyping also fulfills a central requirement for the design of resolvable controversies: The respective literature contains a lot of competing claims of which several are well integrable when closely scrutinizing the studies in question (e.g., Rieske, 2011). For example, regarding study grades, subject matter constitutes an important moderator: Whereas teachers’ gender stereotypes lead to boys receiving poorer grades in languages

1 Notable exceptions are the study by Kerwer and Rosman (2018a), a study by Barzilai and Ka’adan (2017), and some earlier research by Brownlee and colleagues (e.g., Brownlee, 2003; Brownlee et al., 2001), who used learning journals to make students reflect on their epistemic beliefs.


Mertes et al. (2014) had 224 German teachers grade essays from secondary school students. Allegedly, the essays were written either by boys or girls. Even though all essays had been written by the researchers themselves, essays allegedly written by boys received significantly lower grades than those allegedly written by girls.

In a study by Meier et al. (2015), 250 physics teachers graded physics tests. Even though all tests had been completed by boys, half of the tests were tagged with girls’ names. Independently of teachers’ sex, tests allegedly completed by girls received significantly lower grades than tests allegedly completed by boys.

Feldmann et al. (2016) instructed 240 history teachers to grade history exams (secondary school level). All exams were originally written by girls. However, the researchers tagged half of the exams with boys’ names. Exams that were allegedly written by boys were neither graded better nor worse than those allegedly written by girls.

Fig. 1. Three sample texts (cues allowing to resolve the controversies are underlined; Rosman & Mayer, 2018).

and literature (e.g., German language education; Maaz, Baeriswyl, & Trautwein, 2011), girls are more likely to receive poorer grades in natural sciences (e.g., physics; Hofer, 2015). Other moderating factors are ability type (i.e., teachers underestimate girls’ intelligence and boys’ writing abilities; Maaz et al., 2011) and type of behaviour (i.e., teachers more frequently ignore girls with regard to help-seeking behaviour, whereas, concerning disruptive behaviour, they more harshly discipline boys; Beaman, Wheldall, & Kemp, 2006; Rieske, 2011). Reading task. Based on the idea that neither boys nor girls are generally disadvantaged by their teachers but that it depends on such moderating factors, we developed 18 short texts (see also Rosman & Mayer, 2018). For each of the three moderating factors, 6 texts describing fictitious studies were formulated, suggesting that either boys (2 texts), girls (2 texts), or neither of them (2 texts) are being discriminated against. Each text includes specific cues that allow students to resolve the controversies. For example, a study suggesting that girls receive poorer grades included a cue that physics grades were assessed (see Fig. 1). To avoid confusing our participants, the texts are designed in a simplified form: Only one cue per text is included, and other study aspects that may allow a resolution of controversies (e.g., variations in study quality) are kept constant. This is to ensure that the epistemic context remains as invariant as possible over the different texts, so that students’ recognition of the cues is not hampered by the complexity of the materials. In addition to the 18 texts presenting ‘resolvable controversies’, 9 texts containing information irrelevant to teachers’ gender stereotypes are presented (e.g., stating that in their adolescence, girls are more likely to suffer from depression than boys). 
These filler snippets aim at increasing students’ attention for the task since their inclusion requires them to differentiate between relevant and irrelevant studies for the topic in question (see also Rosman & Mayer, 2018). To foster deep processing, each text is followed by a short adjunct question (Hamaker, 1986). In a forced-choice format, students are required to rate, separately for each text, whether it generally suggests that (a) boys, (b) girls, or (c) neither of them are being discriminated against, or (d) that the text is irrelevant for the topic of gender discrimination. Writing task. We expect that these ‘resolvable controversies’ provide a fruitful training ground for the development of more advanced epistemic beliefs. However, as outlined above, it is equally important that students engage with such materials in a nuanced and reflected way. Following the reading task, students are therefore asked to write a 400-word essay. We developed three different writing task instructions. A resolution instruction requires students to write, based on the texts from

the reading task, a nuanced scientific essay. Students are explicitly invited to focus on the conditions under which boys respectively girls are discriminated against, which evokes reflection about the materials and facilitates resolving the controversies. A summary instruction, in contrast, requires writing a detailed overview over the different empirical findings. The instruction explicitly states that students should not evaluate the findings on whether boys or girls are discriminated against. Hence, students’ attention is focused onto the diverging findings as such and not on their resolution. Finally, a one-sided-argument instruction requires participants to choose a position (e.g., boys are being discriminated against), and then to write – based on the texts – an essay convincingly justifying their position. Students are instructed to only include findings supporting their position. Therefore, only a subsample of studies is to be selected and processed during the writing task (even though all 27 text snippets are to be read earlier), which obviously constrains the identification of moderators and the resolution of the controversies by integrating controversial evidence (see also Rosman & Mayer, 2018). Intervention procedure. The intervention uses a mix of online (survey software Unipark™) and paper-pencil materials. First, a short welcome page is presented on a computer screen. To reduce social desirability bias, participants are not made aware that they will participate in an intervention, and are only told that the study includes a variety of tasks and questionnaires. The next online page prompts participants to switch to the paper-pencil materials that had been distributed earlier. These materials include specific instructions on how to proceed as well as the text snippets and adjunct questions (‘reading task’). Upon reading each text snippet, the respective adjunct question is to be answered. 
After finishing this task, participants continue with the online materials, where they navigate to a page containing the task instructions. The ‘original’ intervention includes the resolution instruction presented above, which can, however, be replaced by a summary or a one-sided-argument instruction – depending on the research question. For the writing task, participants use a Microsoft Word template and are allowed to re-read the text snippets using the paper-pencil materials. A time limit of 45 min is imposed on the writing task. After finalizing the intervention materials, a field-experimental study was conducted to obtain an initial evaluation of the intervention’s effects on epistemic beliefs. In line with its strong focus on intervention efficacy, Study 1 aimed at testing Hypothesis 1 (overall intervention efficacy) and Hypothesis 2 (effects of the writing task instructions), while Hypothesis 3 (associations between changes in domain- and


topic-specific beliefs) and Hypothesis 4 (incremental effects of writing over reading) would only be tested in Study 2 and Study 3.

factors was 68% (see Table 2). Reliability was sufficient to good for all three scales, with ω = 0.74 for justification by authority, ω = 0.70 for personal justification, and ω = 0.82 for justification by multiple sources.
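As a rough illustration of how a coefficient such as ω relates to factor loadings, the following sketch computes omega for a single-factor scale from standardized loadings. The loadings are hypothetical values chosen for illustration (they are not taken from the study), and the helper name `mcdonald_omega` is an assumption; in practice, ω would be estimated with dedicated psychometric software.

```python
def mcdonald_omega(loadings):
    """Omega for a unidimensional scale with standardized loadings.

    omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of unique variances),
    where each unique variance is 1 - loading^2 (standardized items,
    uncorrelated errors).
    """
    if not loadings:
        raise ValueError("at least one loading is required")
    common = sum(loadings) ** 2
    unique = sum(1.0 - l ** 2 for l in loadings)
    return common / (common + unique)


# Hypothetical loadings for a three-item scale (illustration only).
example_loadings = [0.70, 0.75, 0.80]
omega = mcdonald_omega(example_loadings)
print(round(omega, 2))
```

Under this simplified formula, higher and more homogeneous loadings yield a larger ω, which is why short scales with modest loadings tend to fall near the lower end of the conventional thresholds.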

3.1.2. Participants and procedure

Study 1,² with its field-experimental nature, drew on a 3 (intervention groups: resolution instruction vs. summary instruction vs. one-sided-argument instruction) × 2 (measurement points) pre-post design. Participants were N = 86 psychology undergraduates in their sixth semester, who were M = 22.67 (SD = 2.21) years old. Gender distribution was significantly skewed (74 females and 12 males), which is nevertheless typical for German psychology students (Abele-Brehm, 2017). Participants were recruited through an existing mailing list since they had already participated in a longitudinal study³ carried out by our institution during their first three semesters (Rosman et al., 2017). Pretest data were collected in an at-home module (online format) over the two weeks prior to the intervention. The interventions and posttest data collections were conducted in a computer lab in groups of 2–17 participants. Posttest data were collected at the end of the session.

3.1.4. Statistical analyses

Since all three hypotheses relate to changes within individuals over time, we used latent difference score modeling (McArdle, 2009) for all analyses. The central idea of latent difference score modeling is to express a change of a variable y over two (or more) consecutive measurements as a constrained autoregression $y_{t2,i} = 1 \cdot y_{t1,i} + \zeta_i$ and then extend this autoregression to a specifically constrained structural equation model (see Fig. 2). In such a model, the freely estimated intercept of Δη becomes interpretable as the average change of y between the two time points, because the paths on y2 are constrained to 1 and the residual variance of this variable is constrained to 0 (McArdle, 2009).⁴ This allows a parsimonious and efficient testing of Hypothesis 1. Moreover, the proposed model of change is particularly useful for our purposes because it can be extended to test hypotheses which go beyond questions of pure change: To test Hypothesis 2, we analyze predictive effects of discrete variables (i.e., experimental condition) on Δη, which can be specified using dummy coding (see Fig. 3).
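The logic of predicting change from experimental condition can be illustrated with a simplified, observed-score analogue of the latent model. The sketch below is not the latent difference score model itself (which is estimated as a structural equation model and accounts for measurement error); it only shows how dummy coding turns group membership into an intercept (mean change in the reference group) and contrasts (differences in mean change relative to that group). All data values, condition labels, and function names are hypothetical.

```python
from statistics import mean


def change_scores(pre, post):
    """Observed difference scores: post minus pre for each participant."""
    return [t2 - t1 for t1, t2 in zip(pre, post)]


def dummy_coded_effects(deltas, groups, reference):
    """Intercept and dummy-coded effects for a single categorical predictor.

    With one categorical predictor, the least-squares solution is the
    reference group's mean change (intercept) plus, for every other group,
    its mean change minus the reference group's mean change.
    """
    by_group = {}
    for d, g in zip(deltas, groups):
        by_group.setdefault(g, []).append(d)
    intercept = mean(by_group[reference])
    effects = {g: mean(v) - intercept
               for g, v in by_group.items() if g != reference}
    return intercept, effects


# Hypothetical pre/post ratings on a 6-point scale (illustration only).
pre = [4.0, 3.5, 4.5, 4.0, 3.0, 4.0]
post = [3.0, 2.5, 4.0, 4.0, 3.5, 4.0]
groups = ["resolution", "resolution", "summary", "summary",
          "one_sided", "one_sided"]

deltas = change_scores(pre, post)
intercept, effects = dummy_coded_effects(deltas, groups, reference="resolution")
print(intercept, effects)
```

Choosing the resolution condition as the reference group makes each dummy coefficient directly readable as "how much less (or more) change than under the resolution instruction", which matches the direction of Hypothesis 2.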

3.1.3. Measures

Topic-specific epistemic beliefs were measured with an instrument derived from the German FREE questionnaire (Krettenauer, 2005; Trautwein & Lüdtke, 2008). The new questionnaire (FREE-GST) begins with the presentation of three controversial positions on the nature and extent of gender-stereotype discrimination in secondary schools. Subsequently, 15 statements that relate to either absolute, multiplistic, or evaluativistic beliefs regarding the controversy are to be rated on a 6-point Likert scale (see Table 1). For example, high agreement to ‘The future will show which position is definitely correct’ indicates absolutism. Dimensionality of the FREE-GST was investigated through principal component analysis with oblimin rotation (pretest stage; N = 86; k = 15). Scree plot examinations and parallel analysis (Horn, 1965) of resampled as well as simulated data revealed a three-factor solution to best fit the data. The total amount of variance explained by these factors was 54% (see Table 1). Reliability was assessed using McDonald’s ω, which is a better estimate than Cronbach’s alpha (Revelle & Zinbarg, 2009), but nevertheless can be interpreted similarly (i.e., using common rules of thumb; Kline, 1999). For the absolutism scale, reliability was good with ω = 0.82; for multiplism, it was rather low with ω = 0.66, and for evaluativism, it was sufficient with ω = 0.75. Domain-specific justification beliefs were assessed by a domain-specific adaptation of a German questionnaire (Klopp & Stark, 2016) that is based on prior measures (e.g., Ferguson et al., 2013). Each justification dimension was measured by three items using a 6-point Likert scale (e.g., ‘To be able to trust knowledge claims in psychology, various knowledge sources have to be checked’ indicates justification by multiple sources). Dimensionality of the justification belief inventory was again investigated through principal component analysis.
Scree plot investigations and parallel analysis revealed a three-factor solution to best fit the data. The total amount of variance explained by the three

3.2. Results Table 3 shows means, standard deviations, and intercorrelations of all variables analyzed in Study 1. 3.2.1. Manipulation check Text lengths ranged from 255 to 777 words, averaging at M = 461.05 (SD = 106.10). We found no outliers applying a criterion of z = 3.29 (Tabachnick & Fidell, 2000) and thus considered all texts for further analyses. By means of a one-way analysis of variance (independent variable: experimental condition; dependent variable: text length), we found no statistically significant group differences with regard to text length. To test instruction compliance, we investigated students’ in-text citations of the reading task studies and the terminology used in their essays. All text snippets included unique references to researchers who allegedly conducted the studies (e.g., “Feldmann et al., 2016”; see Fig. 1). Therefore, in a first step, a student assistant coded, for each of the 86 texts, the number of citations of studies suggesting that either boys or girls or neither of them would be discriminated against. An indication of the name of the respective researcher was necessary to increase the citation count; a mere reference to specific research results was not counted as a citation. Thirty texts were coded by a second student assistant. Both assistants were blind to all three hypotheses by the time of coding. Interrater reliability was excellent with an average Cohen’s Kappa of κ = 0.92. We therefore used the ratings of the first coder for all further calculations. An analysis of variance showed that the number of citations (excluding studies irrelevant to teachers’ gender stereotypes) significantly differed among the experimental conditions (F(2, 83) = 59.63; p < .001). Tukey’s HSD post-hoc tests revealed that the total number of citations was highest in the summary instruction condition (M = 13.00; SD = 4.05), which conformed to our expectations since students were requested to write a comprehensive overview of findings.
A significantly lower number of citations emerged in the resolution instruction condition (M = 9.93; SD = 2.70), and the lowest number of citations was found in the one-sided-argument instruction condition (M = 4.93; SD = 1.03). This again is not surprising since our

2 Study 1 uses the same data as another study from our research group (Rosman et al., 2017). The overlap between both manuscripts is that both (1) use the FREE-GST as dependent measure, and (2) employ the same experimental procedure. The scopes of both manuscripts, however, differ significantly, since Rosman et al. (2017) investigate the effects of epistemic beliefs on epistemic emotions depending on how students approach conflicting evidence, whereas the present article is about epistemic change. 3 In the denoted longitudinal study, participants’ psychology-specific epistemic beliefs (absolutism and multiplism) were assessed four times using an instrument by Peter et al. (2016). The study’s nature was purely descriptive in the sense that no experimental manipulations or interventions were included. Hence, the only overlap between both studies is the fact that students’ epistemic beliefs were assessed. Moreover, the longitudinal study ended one year before the present data collection, which is why we expect no effects on data quality.

4 Since the change score models assume time-invariant measurement models of the variable in question and because of our reduced sample size, we used sum scores (y1, y2) instead of factor scores or a fully latent model, so that the term ‘latent’ only applies to the change score variable (Δη).


Table 1
Three-factor solution for topic-specific epistemic beliefs (FREE-GST).

| Item | Wording | Absolutism | Multiplism | Evaluativism |
|---|---|---|---|---|
| FREE-GST-A01 | Either there is gender discrimination or there is not. In future, researchers should unequivocally clarify whether or not students are discriminated against because of gender stereotypes | 0.75 | −0.01 | 0.12 |
| FREE-GST-A02 | After so much research, it should actually be clear to decide which of the three views is correct | 0.71 | −0.05 | −0.31 |
| FREE-GST-A03 | If one would collect all the facts and let independent experts investigate them, it would surely be possible to decide which of the three views is correct | 0.76 | −0.16 | 0.09 |
| FREE-GST-A04 | The future will show which position is definitely correct. | 0.74 | 0.04 | 0.10 |
| FREE-GST-A05 | Obviously, some researchers know whether or not there is gender specific discrimination in German schools, while others do not | 0.71 | 0.11 | −0.16 |
| FREE-GST-M01 | All three views are shaped by personal opinions. Depending on their own attitude towards gender specific discrimination, researchers will support either one or the other view | −0.06 | 0.69 | 0.13 |
| FREE-GST-M02 | Scientists interpret their findings based on their personal opinion. Actually, nobody can know for sure whether or not there is gender stereotype discrimination in German schools | −0.27 | 0.72 | −0.17 |
| FREE-GST-M03 | Such statements are merely assumptions. Gender specific discrimination is determined by a multitude of aspects which cannot be compared to each other | −0.21 | 0.50 | 0.29 |
| FREE-GST-M04 | Such discrepancies are a good example for the fact that researchers interpret their data to match their own opinion | 0.32 | 0.68 | 0.05 |
| FREE-GST-M05 | Most scientific studies can be interpreted in very different ways | 0.18 | 0.51 | 0.06 |
| FREE-GST-E01 | Different researchers may have completely different opinions. Nevertheless, they can help us to better understand the effects of gender stereotypes in German schools | 0.01 | −0.18 | 0.72 |
| FREE-GST-E02 | Even though one may never say which position is definitely correct, some researchers can have better explanations than others | −0.05 | 0.23 | 0.70 |
| FREE-GST-E03 | There are pros and cons for all three views, but that does not mean that they are all equally well-founded | 0.24 | 0.13 | 0.78 |
| FREE-GST-E04 | The three views are likely referring to different aspects of gender specific discrimination. Depending on the aspect in question, one or the other view is applicable | −0.27 | −0.19 | 0.48 |
| FREE-GST-E05 | Gender specific discrimination can be diverse. Accordingly, depending on certain contextual factors, rather one or the other view is correct | −0.27 | −0.06 | 0.69 |
| Explained variance (%) | | 21 | 14 | 18 |
| McDonald’s ω [95% CI] | | 0.82 [0.75, 0.88] | 0.66 [0.53, 0.79] | 0.75 [0.67, 0.83] |

Note. Oblimin rotation. Boldface signals factor membership.

Table 2
Three-factor solution for domain-specific epistemic beliefs (justification beliefs).

| Item | Justification by authority | Personal justification | Justification by multiple sources |
|---|---|---|---|
| JUST-JA_01 | 0.86 | −0.09 | 0.07 |
| JUST-JA_02 | 0.83 | 0.08 | 0.08 |
| JUST-JA_03 | 0.69 | −0.02 | −0.32 |
| JUST-PJ_01 | −0.02 | 0.78 | 0.05 |
| JUST-PJ_02 | 0.07 | 0.79 | 0.03 |
| JUST-PJ_03 | −0.11 | 0.79 | −0.08 |
| JUST-MS_01 | −0.07 | −0.09 | 0.90 |
| JUST-MS_02 | 0.00 | 0.07 | 0.86 |
| JUST-MS_03 | 0.12 | 0.05 | 0.78 |
| Explained variance (%) | 22 | 21 | 25 |
| McDonald’s ω [95% CI] | 0.74 [0.64, 0.84] | 0.70 [0.59, 0.82] | 0.82 [0.75, 0.89] |

Note. Oblimin rotation. Boldface signals factor membership.

Fig. 2. Latent difference score model including the average change (Δη) between the two time points y1 and y2.

participants, who were required to choose a specific position, would only cite studies that are in line with this position. Moreover, in the one-sided-argument instruction condition, students cited considerably more studies that conformed to their position (M = 4.93; SD = 1.03) than studies that contradicted their position (M = 0.07; SD = 0.26; t(28) = 23.92; p < .001). In sum, these analyses provide some initial evidence for the suitability of our experimental manipulation. To further test instruction compliance, we evaluated whether students had integrated the controversial information from the snippets. To do so, we investigated three aspects of the terminology students used in their essays: (1) the number of words indicating a focus on contextual factors (e.g., ‘conditions’), (2) the number of explicitly named moderators (e.g., ‘subject matter’), and (3) the number of explicitly mentioned cues related to these moderators (e.g., ‘physics’). Again, a student assistant coded all texts, and a second student assistant additionally coded one third of the texts to compute interrater reliability. With an average Cohen’s Kappa of κ = 0.72, interrater reliability was substantial, which is why we again used ratings of the first coder for all further calculations. Together with Tukey’s HSD post-hoc tests, analyses of variance revealed that the terminology used by our participants indeed varied in a statistically significant way: The resolution instruction group used significantly more words relating to contextual factors (F(2, 83) = 17.64; p < .001) and named significantly more moderators (F(2, 83) = 6.16; p < .01) as well as cues pertaining to these moderators (F(2, 83) = 7.65; p < .001). These results show that participants receiving a resolution instruction indeed processed the text contents in a more integrative manner, providing further support for the suitability of our experimental manipulation (see also Rosman & Mayer, 2018).
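The interrater reliability reported in the manipulation check can be illustrated with a minimal Python sketch of Cohen's kappa for two coders. This is an assumption-laden illustration: the function `cohens_kappa` and the example codes are hypothetical, each distinct code is treated as a nominal category, and the paper does not state which software or kappa variant was actually used.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters coding the same units."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must code the same non-empty set of units")
    n = len(rater_a)
    # Observed proportion of exact agreement
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from the raters' marginal frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    if expected == 1.0:
        return 1.0  # degenerate case: both raters used one identical category
    return (observed - expected) / (1.0 - expected)

# Hypothetical citation counts coded by two assistants for six essays
coder_1 = [3, 5, 0, 2, 4, 4]
coder_2 = [3, 5, 1, 2, 4, 4]
print(cohens_kappa(coder_1, coder_2))
```

An "average" kappa as reported above would then be the mean of such values across the separately coded variables.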


(FREE-GST) as well as, on a domain-specific level, regarding justification by multiple sources. 3.2.3. Hypothesis 2 Hypothesis 2 predicts that the intervention effects will be stronger in the resolution instruction condition than in the two other conditions. We tested this prediction using our model with dummy coded group membership indicator variables (see Section 4.2.2 and Fig. 3). However, no differences between the intervention groups were found, neither for absolutism, multiplism, or evaluativism, nor for the justification variables. Hypothesis 2 is not supported (see Fig. 5 and Table 5). 3.3. Discussion 3.3.1. Hypothesis 1 Regarding topic-specific epistemic beliefs, Study 1 revealed that topic-specific absolute beliefs decreased over the course of the intervention. This replicates previous findings on the presentation of diverging information: If students are confronted with conflicting and inconsistent information on a certain topic, they will see knowledge on this topic as less fixed and absolute (Kienhues et al., 2008; Porsch & Bromme, 2011). Regarding topic-specific multiplism, however, no significant effects were found. While this might be somehow related to the smaller sample size and our rather specific sample, another explanation is that some of our participants might have endorsed multiplism since they were confronted with diverging information. Other participants might have realized more quickly that the inconsistencies can be resolved, entailing a reduction in multiplism. In other words, the presentation of resolvable controversies might have had opposing effects on multiplism, which, in sum, cancel each other out. Finally, topic-specific evaluativism significantly increased from pre- to posttest. This shows that the combination of resolvable controversies and writing tasks indeed has an effect on students’ epistemic beliefs that goes beyond a reduction in absolutism and a possible increase in multiplism.
In our opinion, this is a key finding since it shows that efforts to foster evaluativism do not necessarily imply a simultaneous increase in multiplism – which is good when bearing in mind that multiplism has recently been shown to impair learning (Barzilai & Eshet-Alkalai, 2015; Bråten, Ferguson, Strømsø, & Anmarkrud, 2013; Rosman, Peter, Mayer, & Krampen, 2016). Regarding domain-specific justification beliefs, results were less consistent. While justification by multiple sources increased as expected, no effects regarding justification by authority or personal justification were found. One possible explanation for this absence of effects is that, just like we argued in the paragraph above, the intervention might have had opposing effects on personal justification. Furthermore, since justification beliefs were measured on a rather

Fig. 3. Latent difference score model including the average change (Δη) between the two time points y1 and y2 as well as dummy-coded group membership.

3.2.2. Hypothesis 1 As outlined above, we used latent difference score modeling to investigate differences in students’ epistemic beliefs between pre- and posttest (Hypothesis 1). Conforming to our expectations, we found a significant decrease (α2 = −0.43; 95% CI [−0.58, −0.27]; p < .001; for nomenclature see Fig. 2) on the absolutism scale of the FREE-GST (topic-specific epistemic beliefs), whereas evaluativistic beliefs significantly increased from pre- to posttest (α2 = 0.37; 95% CI [0.26, 0.47]; p < .001). No statistically significant effects were found with regard to topic-specific multiplism (α2 = −0.05; 95% CI [−0.18, 0.08]; p = ns). Concerning domain-specific justification beliefs, we found a statistically significant increase on justification by multiple sources (α2 = 0.19; 95% CI [0.04, 0.33]; p < .05), whereas we found no effects on the two other justification scales (personal justification: α2 = 0.05; 95% CI [−0.10, 0.19]; p = ns; justification by authority: α2 = 0.05; 95% CI [−0.11, 0.22]; p = ns). An overview of the findings is depicted in Fig. 4 (findings regarding Study 1 are in dark grey); details can be found in Table 4. In sum, Hypothesis 1 is supported regarding topic-specific absolutism and topic-specific evaluativism

Table 3
Intercorrelations and reliabilities of all study variables (Study 1).

| | Variable | M | SD | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Topic-specific absolute beliefs (pretest) | 2.79 | 0.91 | – | | | | | | | | | | | |
| 2 | Topic-specific absolute beliefs (posttest) | 2.37 | 0.97 | 0.70** | – | | | | | | | | | | |
| 3 | Topic-specific multiplistic beliefs (pretest) | 3.18 | 0.65 | −0.09 | −0.08 | – | | | | | | | | | |
| 4 | Topic-specific multiplistic beliefs (posttest) | 3.14 | 0.66 | −0.01 | 0.06 | 0.55** | – | | | | | | | | |
| 5 | Topic-specific evaluativistic beliefs (pretest) | 4.74 | 0.70 | −0.26* | −0.43** | 0.25* | 0.03 | – | | | | | | | |
| 6 | Topic-specific evaluativistic beliefs (posttest) | 5.11 | 0.62 | −0.24* | −0.41** | 0.09 | −0.04 | 0.72** | – | | | | | | |
| 7 | Justification by authority (pretest) | 3.05 | 0.96 | 0.56** | 0.39** | −0.13 | −0.07 | −0.11 | −0.16 | – | | | | | |
| 8 | Justification by authority (posttest) | 3.11 | 0.84 | 0.34** | 0.19 | −0.07 | −0.05 | 0.02 | −0.04 | 0.62** | – | | | | |
| 9 | Personal justification (pretest) | 2.66 | 0.97 | −0.03 | 0.05 | 0.52** | 0.56** | −0.07 | −0.12 | −0.14 | −0.20 | – | | | |
| 10 | Personal justification (posttest) | 2.70 | 0.90 | 0.06 | 0.07 | 0.46** | 0.59** | −0.05 | −0.12 | −0.08 | −0.09 | 0.73** | – | | |
| 11 | Justification by multiple sources (pretest) | 4.94 | 0.79 | −0.11 | −0.00 | 0.15 | 0.08 | 0.32** | 0.28** | −0.09 | −0.15 | 0.14 | 0.13 | – | |
| 12 | Justification by multiple sources (posttest) | 5.12 | 0.69 | −0.22* | −0.21 | 0.07 | 0.03 | 0.26** | 0.38** | −0.17 | −0.21* | −0.08 | −0.10 | 0.58** | – |

Note. N = 86; M = arithmetic mean; SD = standard deviation. * p < .05. ** p < .01.


Fig. 4. Visualization of latent difference score model parameters for Hypothesis 1.

3.3.2. Hypothesis 2 Regarding Hypothesis 2, our idea was that with a resolution instruction, students would be inclined to deeper processing and construct more differentiated and integrated mental models of the texts. This, in turn, would allow an easier identification of moderators and thus boost the intervention’s effects on evaluativism. Empirically, our manipulation check provides ample evidence that the instructions worked as expected. However, our expectation that resolution instructions would lead to stronger intervention effects compared to summary or one-sided-argument instructions was not supported by our data. One possible explanation for this lack of significant effects is that reflective writing (i.e., the resolution instruction) might have smaller effects than expected, which is why our study might have been underpowered to detect differences between the experimental conditions. In fact, on a descriptive level, all effects are in the hypothesized direction, and while it is not possible to draw any conclusions from these non-significant findings, further investigating the issue in a larger sample might prove worthwhile. A second explanation relates to the study’s timeline: Students did not receive the task instructions until after reading all snippets. This was done to keep the experimental conditions as comparable as possible, i.e., to ensure that all participants would read the text snippets using the same ‘lens’. However, a disadvantage of this approach is that students from the control groups might have already reflected on the diverging information and resolved some of the controversies prior to the writing task. In this case, it would not be surprising that the effects of the task instructions produce lower effect sizes than one would generally expect. Taken together, both points (reduced sample size and the timeline of the study) might thus well explain the non-significance of Hypothesis 2, and call for closer investigation in a follow-up study.

Table 4
Changes in epistemic beliefs from pre- to posttest. Latent difference scores (α2 / α2 + β1).

| | Study I | Study II | Study III |
|---|---|---|---|
| Absolute beliefs (topic-specific) | −0.43*** [−0.58, −0.27] | −0.52*** [−0.64, −0.40] | −0.61*** [−0.79, −0.42] |
| Multiplistic beliefs (topic-specific) | −0.05 [−0.18, 0.08] | −0.39*** [−0.51, −0.28] | −0.16 [−0.37, 0.04] |
| Evaluativistic beliefs (topic-specific) | 0.37*** [0.26, 0.47] | 0.30*** [0.20, 0.39] | 0.26** [0.09, 0.43] |
| Justification by authority (domain-specific) | 0.05 [−0.11, 0.22] | −0.16* [−0.29, −0.02] | −0.28* [−0.50, −0.06] |
| Personal justification (domain-specific) | 0.05 [−0.10, 0.19] | 0.00 [−0.11, 0.12] | −0.17 [−0.38, 0.03] |
| Justification by multiple sources (domain-specific) | 0.19* [0.04, 0.33] | 0.13* [0.02, 0.25] | 0.25** [0.09, 0.42] |

Note. Values in parentheses are 95% confidence intervals. * p < .05. ** p < .01. *** p < .001.

abstract level (i.e., domain-specific), students might have felt inclined to take a side when confronted with a concrete and authentic dilemma (Barzilai & Zohar, 2012), which is why they might not have related their more ‘general’, domain-specific beliefs to the topic in question. In line with this, but on a more technical level, it is to be expected that intervention effects regarding a broader, domain-level construct (i.e., domain-specific beliefs) are smaller since the relationship between dependent and independent variables is strongest when they are operationalized at comparable levels of specificity (Bråten & Strømsø, 2010). Nevertheless, we expected a certain amount of spillover from changes in topic-specific to changes in domain-specific beliefs, which will be discussed later on (cf. Hypothesis 3).

4. Study 2 To further investigate the intervention’s effects, a second study was conducted around six months after Study 1. With regard to the growing

Fig. 5. Visualization of latent difference score model parameters for Hypothesis 2.

Table 5
Differences in epistemic change between the experimental groups. Predictive effect of dummy variable on ηy.

| | Argument instruction | Summary instruction |
|---|---|---|
| Absolute beliefs (topic-specific) | 0.15 [−0.22, 0.51] | 0.18 [−0.19, 0.55] |
| Multiplistic beliefs (topic-specific) | −0.01 [−0.34, 0.32] | 0.18 [−0.12, 0.47] |
| Evaluativistic beliefs (topic-specific) | −0.27* [−0.53, −0.01] | −0.28* [−0.54, −0.01] |
| Justification by authority (domain-specific) | 0.20 [−0.19, 0.58] | 0.23 [−0.09, 0.54] |
| Personal justification (domain-specific) | −0.10 [−0.43, 0.23] | 0.08 [−0.25, 0.42] |
| Justification by multiple sources (domain-specific) | −0.11 [−0.38, 0.17] | 0.09 [−0.24, 0.42] |

Note. Reference category = resolution instruction condition; values in parentheses are 95% confidence intervals. * p < .05.

in a computer lab in groups of 4–28 participants.⁵ The intervention procedure was identical to Study 1. Posttest data were collected at the end of the session.

need for replication studies in educational research (Makel & Plucker, 2014), Study 2 aimed at testing Hypothesis 1 again in a larger and less specific sample. Moreover, Study 2 included an additional domain-specific version of the FREE questionnaire, which allowed testing Hypothesis 3 with two conceptually equivalent measures.

4.1.2. Measures To allow a complete replication of Hypothesis 1, Study 2 included the same set of measures as Study 1 (topic-specific beliefs: FREE-GST; domain-specific justification beliefs: justification inventory). For the FREE-GST, reliability was rather low (absolutism: ω = 0.63; multiplism: ω = 0.67; evaluativism: ω = 0.49), and for the justification inventory, it was satisfactory except for the justification by multiple sources dimension (justification by authority: ω = 0.76; personal justification: ω = 0.75; justification by multiple sources: ω = 0.55). Additionally, to test Hypothesis 3, a newly designed domain-specific adaptation of the FREE questionnaire (Krettenauer, 2005), the FREE-EDPSY, was employed. This new questionnaire again uses a scenario-based format. In contrast to the FREE-GST, however, it does not begin with presenting three controversial positions on one specific topic (e.g.,

4.1. Method 4.1.1. Participants and procedure Since we expected the strongest intervention effects in the resolution instruction condition and to gain a sufficiently large sample for testing Hypothesis 3, we used a simple pre-post design with all students receiving the same writing task instruction (resolution instruction; see Section 2.2). Participants were N = 153 psychology undergraduates from all semesters. Eight participants had either missed pre- or posttest or had already participated in Study 1 and were therefore discarded from all further analyses. The remaining N = 145 participants were M = 21.56 (SD = 2.92) years old. Again, gender distribution was skewed (131 females and 14 males). Participants were recruited by means of flyers and a mailing list. Just like in Study 1, pretest data were collected in an at-home module over the two weeks prior to the intervention. The interventions and posttest data collections were conducted

5 In one instance, data were collected in an individual session (one single participant) because the other students from the group did not show up.


Table 6
Three-factor solution for domain-specific epistemic beliefs (FREE-EDPSY).

| Item | Wording | Absolutism | Multiplism | Evaluativism |
|---|---|---|---|---|
| FREE-EDPSY-A01 | Either the method facilitates learning or not. In future, researchers should unequivocally clarify which researcher is right | 0.78 | −0.10 | −0.04 |
| FREE-EDPSY-A02 | After so much research, it should actually be clear to decide which researcher is correct | 0.84 | −0.06 | −0.01 |
| FREE-EDPSY-A03 | If one would collect all the facts and let independent experts investigate them, it would surely be possible to decide which researcher is correct | 0.72 | 0.18 | 0.08 |
| FREE-EDPSY-A04 | The future will show which researcher is right | 0.64 | −0.06 | 0.05 |
| FREE-EDPSY-A05 | Obviously, one researcher knows what he is talking about, the other does not | 0.35 | 0.29 | −0.39 |
| FREE-EDPSY-M01 | In educational research, the interpretation of research results is shaped by personal opinions. Depending on their own attitude, researchers will support either one or the other view | −0.08 | 0.74 | 0.10 |
| FREE-EDPSY-M02 | In educational research, scientists interpret their findings based on their personal opinion. Actually, nobody can know for sure whether specific methods are beneficial for learning or not | −0.06 | 0.77 | −0.12 |
| FREE-EDPSY-M03 | Such statements are merely assumptions. Psychological phenomena are determined by a multitude of aspects which cannot be compared to each other | −0.12 | 0.52 | 0.13 |
| FREE-EDPSY-M04 | Such discrepancies are a good example for the fact that in educational research, scientists interpret their data to match their own opinion | 0.06 | 0.78 | −0.04 |
| FREE-EDPSY-M05 | In educational psychology, most studies can be interpreted in very different ways | 0.04 | 0.71 | 0.05 |
| FREE-EDPSY-E01 | In educational psychology, different researchers may have completely different opinions. Nevertheless, they can help us to better understand student learning | −0.10 | −0.03 | 0.68 |
| FREE-EDPSY-E02 | Even though one may never say which position is definitely correct, in educational psychology, some researchers can have better explanations than others | 0.16 | 0.07 | 0.74 |
| FREE-EDPSY-E03 | In educational psychology, there are often pros and cons for different views, but that does not mean that these are all equally well-founded | 0.08 | 0.10 | 0.72 |
| FREE-EDPSY-E04 | The three views are likely to be referring to different aspects of the method in question. Some aspects of the method might be beneficial for learning, while others impede learning | −0.02 | −0.08 | 0.72 |
| FREE-EDPSY-E05 | Psychological methods are influenced by the context they are employed in. Accordingly, depending on certain contextual factors, the method might sometimes be beneficial and sometimes not | −0.16 | −0.10 | 0.58 |
| Explained variance (%) | | 17 | 18 | 17 |
| McDonald’s ω [95% CI] | | 0.76 [0.70, 0.83] | 0.75 [0.67, 0.82] | 0.74 [0.67, 0.80] |

Note. Oblimin rotation. Boldface signals factor membership.

on gender stereotyping), but on educational psychology in general (hence, allowing a domain-specific measurement). More specifically, the scenario vignette states that at a congress, researchers are arguing whether a certain – unspecified – method from educational psychology has beneficial effects on learning or not. Analogous to the FREE-GST, students are then required to rate 15 statements that relate to either absolute, multiplistic, or evaluativistic beliefs regarding the controversy on a 6-point Likert scale (for a sample item see Section 4.1.2). The wording of some statements was slightly adapted to fit the new scenario (i.e., by replacing content pertaining to the topic of gender stereotyping). To ensure comparability between topic-specific and domain-specific beliefs, we nevertheless took care that all items remained conceptually identical with regard to the FREE-GST. Dimensionality of the FREE-EDPSY was again investigated through principal component analysis with oblimin rotation (N = 145; k = 15). Scree plot examinations and parallel analysis (Horn, 1965) of resampled as well as simulated data revealed a three-factor solution to best fit the data. The total amount of variance explained by these factors was 52% (see Table 6). However, one item from the absolutism scale (FREE-EDPSY-A05; see Table 6) loaded simultaneously on both absolutism and multiplism. While we were tempted to exclude this item from further analyses, we decided not to do so since a rigorous test of Hypothesis 3 requires both inventories to be conceptually equivalent. Reliability was sufficient for all three scales, with ω = 0.76 for absolutism, ω = 0.75 for multiplism, and ω = 0.74 for evaluativism.
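The parallel analysis mentioned above (Horn, 1965) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' procedure: it compares observed correlation-matrix eigenvalues only against simulated normal noise (the resampling variant is omitted), uses the 95th percentile as threshold, and runs on simulated three-factor data. The function name `parallel_analysis` and all data are hypothetical.

```python
import numpy as np

def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Horn's parallel analysis on the eigenvalues of the correlation matrix."""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    # Observed eigenvalues, sorted from largest to smallest
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    rng = np.random.default_rng(seed)
    simulated = np.empty((n_sims, k))
    for s in range(n_sims):
        noise = rng.standard_normal((n, k))  # uncorrelated data of the same shape
        simulated[s] = np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    threshold = np.percentile(simulated, percentile, axis=0)
    exceeds = observed > threshold
    # Retain components up to (not including) the first that fails the threshold
    n_retain = k if exceeds.all() else int(np.argmin(exceeds))
    return n_retain, observed, threshold

# Simulated questionnaire: 145 respondents, 15 items, 5 items per factor
rng = np.random.default_rng(1)
factors = rng.standard_normal((145, 3))
loadings = np.kron(np.eye(3), np.full((1, 5), 0.7))  # 3 x 15 simple structure
items = factors @ loadings + 0.7 * rng.standard_normal((145, 15))

n_retain, observed, threshold = parallel_analysis(items)
print("components to retain:", n_retain)
```

Components are retained only while their eigenvalue exceeds what random data of the same size would produce, which is the rationale behind preferring parallel analysis over the scree plot alone.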

4.2. Results Table 7 shows means, standard deviations, and intercorrelations of all variables analyzed in Study 2. 4.2.1. Hypothesis 1 Regarding Hypothesis 1, all significant effects from Study 1 were replicated in Study 2. In addition (and in line with Hypothesis 1), significant decreases in topic-specific multiplism (α2 = −0.39; 95% CI [−0.51, −0.28]; p < .001) and domain-specific justification by authority were also found (α2 = −0.16; 95% CI [−0.29, −0.02]; p < .05; for an overview see Fig. 4; for details see Table 4). 4.2.2. Hypothesis 3 Hypothesis 2 was not tested in Study 2 due to its different design. Using data from Study 2, we however tested Hypothesis 3 by predicting the change scores of domain-specific epistemic beliefs (FREE-EDPSY) with the respective change scores of the topic-specific ones (FREE-GST; see Fig. 6). This resulted in three significant regression paths (absolutism: β = 0.42, p < .001, R² = 0.153; multiplism: β = 0.46, p < .001, R² = 0.176; evaluativism: β = 0.29, p < .001, R² = 0.077) of moderate to large effect sizes (Cohen, 1988). 4.3. Discussion 4.3.1. Hypothesis 1 All findings regarding Hypothesis 1 were successfully replicated in Study 2. Given that it can be challenging to replicate psychological findings (Open Science Collaboration, 2015), this provides convincing evidence for the robustness of our intervention effects. In addition to the replication of all findings from Study 1, we also found significant effects – in the expected direction – on domain-specific justification by authority and on topic-specific multiplism. This supports our reasoning from above that Study 1 might have been underpowered to find smaller effects.

4.1.3. Statistical analyses Just as in Study 1, data from Study 2 were analyzed by means of latent difference score modeling (McArdle, 2009). Hypothesis 1 was retested using exactly the same code as in Study 1. Hypothesis 3, which investigates predictive effects of domain- and topic-specific epistemic change, was modeled by specifying the two changes simultaneously as a bivariate latent difference score model (e.g., Grimm, An, McArdle, Zonderman, & Resnick, 2012) and adding a latent regression path from one latent change score variable to the other (see Fig. 6).
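The logic of regressing one change score on another can be approximated with observed difference scores, as in the following Python sketch. It is a deliberately simplified stand-in for the bivariate latent difference score model: it uses ordinary least squares on simulated data, so it would not be expected to reproduce the structural equation estimates. The function `change_regression` and all variable names are hypothetical.

```python
import numpy as np

def change_regression(x_pre, x_post, z_pre, z_post):
    """Regress the change in z (e.g., domain-specific) on the change in x (topic-specific)."""
    dx = np.asarray(x_post, dtype=float) - np.asarray(x_pre, dtype=float)
    dz = np.asarray(z_post, dtype=float) - np.asarray(z_pre, dtype=float)
    slope, intercept = np.polyfit(dx, dz, deg=1)  # unstandardized regression path
    r = np.corrcoef(dx, dz)[0, 1]  # with one predictor, the standardized slope equals r
    return {"slope": slope, "intercept": intercept, "beta_std": r, "r2": r ** 2}

# Simulated pre/post scores for 145 hypothetical participants
rng = np.random.default_rng(2)
x_pre = rng.normal(2.7, 0.8, size=145)
x_post = x_pre + rng.normal(-0.5, 0.7, size=145)
z_pre = rng.normal(2.7, 0.8, size=145)
z_post = z_pre + 0.4 * (x_post - x_pre) + rng.normal(-0.2, 0.6, size=145)

print(change_regression(x_pre, x_post, z_pre, z_post))
```

A positive slope in such a model would indicate that participants whose topic-specific beliefs changed more also tended to show larger domain-specific changes in the same direction.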


5. Study 3 To explore the writing task’s efficacy in a more direct manner and to separate its effects from those of the reading task, we furthermore reanalyzed data that were recently collected by our working group (Kerwer & Rosman, 2018a). The full dataset is freely available as Kerwer and Rosman (2018b). Using a subset of these data, we aimed to (1) test Hypothesis 4 by investigating the intervention components’ efficacy from a different angle, and (2) replicate findings of Study 1 and Study 2 concerning Hypothesis 1. Additionally, (3) we revisited Hypothesis 3 once more in order to replicate the effects found in Study 2. 5.1. Method 5.1.1. Participants and procedure The study by Kerwer and Rosman (2018a) employed a pre-post design with four intervention groups (N = 201). As two groups differed with regard to the administered reading task, we discarded these groups for Study 3 and, thus, proceeded only with N = 93 participants who had received the same reading task as participants in Study 1 and Study 2. One of the two remaining intervention groups received exactly the same intervention as in Study 2 (i.e., the reading task and the ‘resolution instruction’ writing task). For the other group, in contrast, the ‘resolution instruction’ writing task was discarded and posttest data were gathered directly after the reading task. Study 3 participants were psychology students (minor and major) from all semesters who were on average M = 23.27 (SD = 3.54) years old. The gender distribution (84 females and 9 males) resembled those observed in Study 1 and Study 2. Again, and just like in Study 2, we recruited participants by means of flyers and a mailing list, while all other study procedures did not differ from Study 1 and Study 2. Group sizes in the computer lab ranged from 1 to 29 participants in Study 3 (for more details, see Kerwer & Rosman, 2018a).

5.1.2. Measures In Study 3, we used exactly the same set of measures as in Study 2 (topic-specific beliefs: FREE-GST; domain-specific justification beliefs: justification inventory; domain-specific beliefs: FREE-EDPSY). For justification beliefs and the FREE-EDPSY, reliability was sufficient (absolutism: ω = 0.76; multiplism: ω = 0.69; evaluativism: ω = 0.70; justification by authority: ω = 0.76; personal justification: ω = 0.70; justification by multiple sources: ω = 0.71), while, for the FREE-GST, it was rather low (absolutism: ω = 0.66; multiplism: ω = 0.58; evaluativism: ω = 0.74) – at least for the absolutism and multiplism dimensions.

Fig. 6. Bivariate latent difference score model including the average change between the two time points for both topic-specific (Δηy) and domain-specific (Δηz) epistemic beliefs.

4.3.2. Hypothesis 3 While testing Hypothesis 1, effect sizes were considerably higher for topic-specific beliefs (i.e., the FREE-GST) than for domain-specific justification beliefs – both in Study 1 and in Study 2. Given that the complex and multilayered state of evidence in gender stereotyping might well be representative for psychological research as a whole, it nevertheless is not surprising that a certain amount of spillover from topic-specific to domain-specific belief changes occurred. In line with this, our results on Hypothesis 3 reveal that changes in topic- and domain-specific epistemic beliefs (on the FREE questionnaire) occur in parallel: If students’ topic-specific beliefs develop in a specific direction, their domain-specific beliefs undergo a similar development. This spillover effect is particularly interesting because it suggests a bottom-up fashion in which students generalize changes in topic-specific beliefs onto their beliefs about psychological knowledge in general. Furthermore, justifying our choice of this topic, it indicates that students do not view gender stereotyping as an ‘exception to the rule’. Finally, it also provides evidence for the TIDE framework’s prediction that different levels of epistemic beliefs reciprocally influence each other (Merk et al., 2018; Muis et al., 2006, 2016).

5.1.3. Statistical analyses

In line with Study 1 and Study 2, we used latent difference score modeling as a statistical technique for testing our hypotheses (McArdle, 2009). To disentangle the effects of the reading and writing tasks (Hypothesis 4), we modified the model we had employed for testing Hypothesis 2 (see Study 1 and Fig. 3) as follows: We specified the group that received only the reading task as the reference category (instead of the resolution instruction in Study 1) and predicted the latent change score by a dummy-coded variable which indicated whether the writing task was administered or not (instead of indicating the kind of task instruction subjects received in Study 1). Hence, the intercept of the latent change score (α2) can be interpreted as the overall effect of the reading task. Furthermore, the regression coefficient of the dummy-coded variable (β1) corresponds to the writing task’s incremental effect on epistemic change (Hypothesis 4), while the total effect in this group (Hypothesis 1) is the sum of both (α2 + β1). Finally, Hypothesis 3 was tested using exactly the same model used in Study 2 (see Fig. 6).
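
The parameterization described above can be retraced with simple arithmetic. The following sketch is not part of the original analyses: it treats the model-implied latent change as α2 + β1 · D (D = 0 for reading only, D = 1 for reading plus writing) and plugs in the rounded unstandardized estimates reported for Study 3 in Table 9, together with the observed pretest standard deviations from Table 8. The dictionary layout and function names are illustrative assumptions. Because the inputs are rounded to two decimals, results can differ from the values reported in the text in the second decimal.

```python
# Rounded estimates from Table 9 (Study 3): (alpha2, beta1).
# alpha2 = change in the reading-only group, beta1 = incremental effect of writing.
ESTIMATES = {
    "topic-specific absolutism": (-0.32, -0.29),
    "topic-specific multiplism": (-0.29, 0.12),
    "justification by authority": (0.05, -0.33),
}

# Observed pretest standard deviations from Table 8 (Study 3).
PRETEST_SD = {
    "topic-specific absolutism": 0.71,
    "justification by authority": 0.86,
}


def expected_change(alpha2, beta1, writing):
    """Model-implied latent change: alpha2 + beta1 * D, with D = 1 if writing was administered."""
    return alpha2 + beta1 * (1 if writing else 0)


def standardized_increment(beta1, pretest_sd):
    """Absolute incremental effect of writing, expressed in pretest SD units."""
    return abs(beta1) / pretest_sd


for name, (alpha2, beta1) in ESTIMATES.items():
    reading = expected_change(alpha2, beta1, writing=False)
    both = expected_change(alpha2, beta1, writing=True)
    line = f"{name}: reading only {reading:+.2f}, reading + writing {both:+.2f}"
    if name in PRETEST_SD:
        size = standardized_increment(beta1, PRETEST_SD[name])
        line += f", standardized increment {size:.2f}"
    print(line)
```

The design choice here is deliberately simple: the dummy coding makes the reading-only group the baseline, so the writing effect is read directly off β1 rather than computed as a difference between two group means.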

Table 7 Intercorrelations and reliabilities of all study variables (Study 2).

No. Variable                                           M     SD
1   Topic-specific absolute beliefs (pretest)          2.71  0.68
2   Topic-specific absolute beliefs (posttest)         2.19  0.76
3   Topic-specific multiplistic beliefs (pretest)      3.26  0.72
4   Topic-specific multiplistic beliefs (posttest)     2.86  0.85
5   Topic-specific evaluativistic beliefs (pretest)    4.76  0.55
6   Topic-specific evaluativistic beliefs (posttest)   5.05  0.52
7   Justification by authority (pretest)               3.43  0.90
8   Justification by authority (posttest)              3.27  0.97
9   Personal justification (pretest)                   2.35  0.86
10  Personal justification (posttest)                  2.35  0.90
11  Justification by multiple sources (pretest)        4.96  0.68
12  Justification by multiple sources (posttest)       5.09  0.68
13  Domain-specific absolute beliefs (pretest)         2.66  0.71
14  Domain-specific absolute beliefs (posttest)        2.25  0.75
15  Domain-specific multiplistic beliefs (pretest)     3.03  0.74
16  Domain-specific multiplistic beliefs (posttest)    2.89  0.81
17  Domain-specific evaluativistic beliefs (pretest)   4.86  0.61
18  Domain-specific evaluativistic beliefs (posttest)  5.07  0.62

Intercorrelations (variables numbered as above):

No. 1        2        3        4        5        6        7        8        9        10       11       12       13       14       15       16       17       18
1   –
2   0.49**   –
3   0.05     0.03     –
4   0.11     0.25**   0.58**   –
5   −0.23**  −0.28**  0.03     −0.21*   –
6   0.05     −0.27**  −0.10    −0.19*   0.42**   –
7   0.17*    0.23**   −0.01    −0.03    −0.12    −0.05    –
8   0.12     0.16*    −0.07    −0.053   −0.033   0.045    0.62**   –
9   0.22**   0.13     0.52**   0.52**   −0.13    −0.19*   0.11     −0.04    –
10  0.12     0.21*    0.42**   0.72**   −0.20*   −0.28**  0.10     −0.03    0.67**   –
11  0.01     0.00     0.16     −0.06    0.22**   0.08     −0.13    −0.22**  0.19*    0.05     –
12  0.03     −0.07    0.13     0.03     0.10     0.23**   −0.22**  −0.18*   0.22**   0.16     0.43**   –
13  0.58**   0.54**   0.07     0.14     −0.21*   −0.04    0.22**   0.15     0.17*    0.13     0.06     0.12     –
14  0.41**   0.77**   −0.02    0.18*    −0.16    −0.14    0.17*    0.13     0.05     0.08     −0.01    0.05     0.60**   –
15  0.06     0.10     0.64**   0.59**   −0.02    −0.10    −0.09    −0.09    0.60**   0.57**   0.08     0.09     0.06     0.03     –
16  0.13     0.21*    0.52**   0.81**   −0.12    −0.17*   −0.03    −0.05    0.55**   0.75**   0.04     0.15     0.11     0.14     0.68**   –
17  −0.16    −0.17*   −0.04    −0.22**  0.58**   0.53**   −0.02    0.11     −0.13    −0.19*   0.19*    0.18*    −0.19*   −0.09    −0.09    −0.09    –
18  −0.03    −0.24**  −0.25**  −0.37**  0.42**   0.74**   0.01     0.13     −0.20*   −0.34**  0.16     0.27**   −0.08    −0.15    −0.23**  −0.24**  0.63**   –

Note. N = 86; M = arithmetic mean; SD = standard deviation. * p < .05. ** p < .01.

5.2. Results

Table 8 shows means, standard deviations, and intercorrelations of all variables analyzed in Study 3.

Table 8 Means, standard deviations, and intercorrelations of all study variables (Study 3).

No. Variable                                           M     SD
1   Topic-specific absolute beliefs (pretest)          2.83  0.71
2   Topic-specific absolute beliefs (posttest)         2.37  0.81
3   Topic-specific multiplistic beliefs (pretest)      3.11  0.66
4   Topic-specific multiplistic beliefs (posttest)     2.88  0.80
5   Topic-specific evaluativistic beliefs (pretest)    4.78  0.69
6   Topic-specific evaluativistic beliefs (posttest)   4.98  0.63
7   Personal justification (pretest)                   2.66  0.87
8   Personal justification (posttest)                  2.49  0.93
9   Justification by authority (pretest)               3.48  0.86
10  Justification by authority (posttest)              3.37  0.98
11  Justification by multiple sources (pretest)        4.99  0.77
12  Justification by multiple sources (posttest)       5.23  0.60
13  Domain-specific absolute beliefs (pretest)         2.88  0.75
14  Domain-specific absolute beliefs (posttest)        2.57  0.82
15  Domain-specific multiplistic beliefs (pretest)     3.19  0.72
16  Domain-specific multiplistic beliefs (posttest)    2.96  0.81
17  Domain-specific evaluativistic beliefs (pretest)   4.90  0.63
18  Domain-specific evaluativistic beliefs (posttest)  5.05  0.56

Intercorrelations (variables numbered as above):

No. 1        2        3        4        5        6        7        8        9        10       11       12       13       14       15       16       17       18
1   –
2   0.64**   –
3   −0.01    0.03     –
4   0.19     0.15     0.54**   –
5   −0.24*   −0.20    −0.13    −0.09    –
6   0.01     −0.02    −0.05    0.20     0.53**   –
7   0.12     0.06     0.64**   0.50**   −0.22*   −0.02    –
8   0.14     0.13     0.54**   0.65**   −0.27**  0.03     0.67**   –
9   0.18     0.10     −0.04    0.02     −0.03    0.02     0.00     −0.11    –
10  −0.07    0.04     −0.09    −0.06    0.08     −0.01    −0.18    −0.14    0.66**   –
11  −0.01    0.03     0.08     0.07     0.26*    0.26*    0.05     0.09     −0.16    −0.12    –
12  0.07     −0.03    0.16     0.20     0.17     0.35**   0.07     0.19     −0.21*   −0.34**  0.49**   –
13  0.69**   0.56**   −0.01    0.10     −0.22*   0.03     0.17     0.12     0.30**   −0.05    0.02     −0.01    –
14  0.59**   0.77**   0.08     0.13     −0.23*   −0.10    0.19     0.20     0.26*    0.17     0.01     −0.06    0.58**   –
15  0.03     0.01     0.66**   0.60**   −0.13    0.08     0.72**   0.71**   −0.22*   −0.20    0.11     0.14     −0.05    0.08     –
16  0.12     0.05     0.50**   0.81**   −0.12    0.22*    0.54**   0.71**   0.02     −0.03    0.00     0.20     0.03     0.16     0.66**   –
17  −0.18    −0.19    −0.05    −0.02    0.77**   0.54**   −0.21*   −0.24*   −0.05    0.05     0.22*    0.31**   −0.27**  −0.17    −0.07    0.01     –
18  −0.06    −0.11    −0.22*   0.06     0.51**   0.78**   −0.17    −0.15    0.08     0.11     0.05     0.21*    −0.06    −0.16    −0.06    0.14     0.58**   –

Note. N = 93; M = arithmetic mean; SD = standard deviation. * p < .05. ** p < .01.

5.2.1. Hypothesis 1

All but one significant effect found in Study 1 and Study 2 were replicated in the group that was subjected to both reading and writing (see Table 4 for details). The only exception was the effect on topic-specific multiplism, which was significant in Study 2 but failed to reach statistical significance in Study 3 (and in Study 1).

5.2.2. Hypothesis 3

Just like in Study 2, Hypothesis 2 was not tested in Study 3 as the writing task instructions were not experimentally varied. Regarding Hypothesis 3, all significant regression paths were replicated (absolutism: β = 0.43, p < .001, R² = 0.160; multiplism: β = 0.46, p < .001, R² = 0.266; evaluativism: β = 0.38, p < .001, R² = 0.223).

5.2.3. Hypothesis 4

When testing Hypothesis 4, we found significantly stronger intervention effects in the group that received the writing task for topic-specific absolutism (β1 = −0.29; 95% CI [−0.53, −0.04]; p < .05) and domain-specific justification by authority (β1 = −0.33; 95% CI [−0.63, −0.04]; p < .05; for an overview see Fig. 7; for details see Table 9). To facilitate the interpretation of these effects, we divided them by the corresponding variables’ observed pre-intervention standard deviations, thereby obtaining small to moderate standardized effects for topic-specific absolutism (0.40) and for domain-specific justification by authority (0.38). Interpretation of results concerning Hypothesis 4 is however complicated by the fact that we observed more differences regarding the significance of overall effects in the respective groups, which yet failed to yield significant β1 regression coefficients. For example, overall effects for topic-specific evaluativism and domain-specific beliefs in justification by authority were not significant in the ‘reading task only’ group (see Table 9) while we observed significant overall effects when participants were subjected to both reading and writing (see also our results concerning Hypothesis 1 and Table 4). On the other hand, topic-specific multiplism did not decrease significantly for participants that were subjected to both reading and writing (α2 + β1 = −0.16; 95% CI [−0.37, 0.04]; p = ns) while this decrease surprisingly reached significance in the group that received the reading task only (α2 = −0.29; 95% CI [−0.48, −0.10]; p < .01).

5.3. Discussion

5.3.1. Hypothesis 1

Regarding Hypothesis 1, all findings from Study 1 and Study 2 were successfully replicated – with the exception of the effect on topic-specific multiplism, which was only significant in Study 2. As we argued before, this might be due to sample size issues in Study 1 and Study 3, which might have been underpowered to find – arguably smaller – effects on multiplism. In fact, Barzilai and Ka’adan (2017) showed that a combination of reading and writing alone might not be sufficient to reduce multiplism, and that further intervention components are necessary (e.g., additional scaffolds). In sum, nevertheless, the successful replications of almost all aspects of Hypothesis 1 can be regarded as convincing evidence for the positive effects of a combination of reading and writing about scientific controversies on epistemic change.

5.3.2. Hypothesis 3

We again found that changes in topic-specific epistemic beliefs are positively associated with changes in domain-specific beliefs. Just like in Study 2, these effects were significant for absolutism, multiplism, and evaluativism, suggesting that the denoted spillover effect is robust and generalizable across the different stages of Kuhn and Weinstock’s (2002) framework.

5.3.3. Hypothesis 4

Our findings regarding Hypothesis 4 revealed that reading and writing about scientific controversies leads to significantly stronger intervention effects compared to reading alone – but only for absolute beliefs (topic-specific absolutism and domain-specific justification by authority). No effects were found for multiplism and evaluativism, for which two explanations can be found. First, obviously, the non-significance of effects might be due to issues of statistical power. Second – and even though we acknowledge that non-significant findings are hard to interpret – this pattern of results seems to be in line with prior research: In their 2017 study, Barzilai and Ka’adan presented ninth graders with diverging information on health and nutrition and had them, among others, engage in an argument writing task. While their intervention led to a small but significant decline in absolutism, they found that a reduction in multiplism requires additional intervention components. In fact, multiplism only decreased in the groups receiving additional scaffolds, such as, for example, on source evaluation, but not in the group that solely engaged in the writing task (Barzilai & Ka’adan, 2017). This idea that multiplism is harder to tackle might well explain⁶ our non-significant findings, and also provides an explanation for our inconsistent findings regarding Hypothesis 1 (see above). With regard to evaluativism, finally, Barzilai and Ka’adan’s (2017) intervention revealed no effects at all. This finding, together with the fact that we found significant effects on evaluativism when reading and writing are combined (Hypothesis 1) but no significant additional effects of writing over reading (Hypothesis 4), suggests that effects on evaluativism might be due to the genuine effect of the ‘resolvability’ of our controversies. We concede that this interpretation is heavily speculative due to the different samples and since interpreting non-significant findings is difficult, and emphasize that there is a need for further research in this area. However, for now, Study 3 adds to the literature by providing evidence that writing indeed has an incremental effect over reading alone and by quantifying this effect’s size.

⁶ What speaks against this interpretation, however, is that, in our study, multiplism significantly decreased in the group that received only the reading task when conducting the analyses separately for both groups (see above).

Fig. 7. Visualization of latent difference score model parameters for Hypothesis 4.

Table 9 Differences in epistemic change between the experimental groups (Study 3).

Regression coefficients on ηy                        Intercept (reading only; α2)   Writing task (β1)
Absolute beliefs (topic-specific)                    −0.32*** [−0.48, −0.15]        −0.29* [−0.53, −0.04]
Multiplistic beliefs (topic-specific)                −0.29** [−0.48, −0.10]         0.12 [−0.15, 0.40]
Evaluativistic beliefs (topic-specific)              0.14 [−0.03, 0.30]             0.12 [−0.09, 0.34]
Justification by authority (domain-specific)         0.05 [−0.15, 0.25]             −0.33* [−0.63, −0.04]
Personal justification (domain-specific)             −0.16 [−0.36, 0.03]            −0.01 [−0.29, 0.27]
Justification by multiple sources (domain-specific)  0.23* [0.04, 0.42]             0.02 [−0.19, 0.23]

Note. Reference category = reading task only; values in brackets are 95% confidence intervals. * p < .05. ** p < .01. *** p < .001.

6. General discussion

Fostering evaluativism has been a challenge ever since Deanna Kuhn coined the term in 1991. Against this background, the current article introduces three studies that combine the presentation of conflicting scientific evidence with writing tasks in order to positively influence the development of students’ beliefs about the nature of
knowledge and knowing. The intervention draws on reading about and reflecting on diverging information (i.e., scientific controversies; Kienhues et al., 2016) by using a multiple-texts approach. In contrast to ‘traditional’ interventions, however, it uses a special form of diverging evidence since the apparent contradictions can be resolved by identifying moderating factors (in short: ‘resolvable controversies’). After reading, participants are asked to write a balanced scientific essay explicitly focusing on the conditions under which boys or girls are discriminated against. The present article examines the effects of writing tasks in the context of epistemic change in three separate studies: Does a combination of reading and writing about scientific controversies affect intervention efficacy? To what extent do different writing task instructions influence epistemic change? What are the effects of reading and writing on different levels of epistemic beliefs (topic-specific vs. domain-specific)?

With regard to Hypothesis 1, we showed that the combination of reading and writing in the context of an epistemic belief intervention reduces absolutism and increases evaluativism – in all three studies. For multiplism, results were more inconsistent. On a theoretical level, this overall picture inclines us to believe that multiplistic beliefs might be more resistant to change in an intervention employing resolvable controversies. In fact, it is easy to process resolvable controversies in a way that is not intended at all – by simply ignoring their resolvability. In line with a general human tendency to safeguard prior beliefs from external disruptions through biased processing (Chinn & Brewer, 1993; Hatano & Inagaki, 2003), individuals holding high multiplistic beliefs might thus process the textual materials in a typically multiplistic way – by only focusing on the controversies themselves and ignoring their resolvability. Even if a reduction of multiplism worked well in individuals holding lower multiplistic beliefs (who are not so likely to solely focus on the controversies), positive and negative effects would likely cancel each other out, resulting in inconsistent findings such as ours.

As for Hypothesis 2, our analyses did not yield significant results regarding the effects of different writing task instructions on epistemic change (see Section 4.3.2 for possible explanations of this unexpected result). When testing Hypothesis 4, however, we found that writing has a small to moderate incremental effect over reading – at least with regard to absolutism. Taken together, our findings regarding Hypotheses 2 and 4 therefore suggest that writing about scientific controversies indeed has beneficial effects on epistemic change, whereas it remains unclear whether one may ‘boost’ these effects through specific types of task instructions.

With regard to Hypothesis 3, finally, we showed that the combination of reading and writing about scientific controversies – all with regard to one specific topic – leads to spillover effects from topic-specific epistemic beliefs to domain-specific epistemic beliefs. This effect, which was also replicated across two independent studies, supports the TIDE framework’s assumption that different levels of epistemic beliefs reciprocally influence each other (Merk et al., 2018; Muis et al., 2006). Moreover, from a practical perspective, it suggests that topic-specific interventions (e.g., short-term interventions; Kienhues et al., 2008) may have even more positive effects since the associated belief changes seem to generalize to higher-order levels of specificity.

Nevertheless, our approach also has some limitations. First, we acknowledge that singling out specific aspects of our intervention that might have caused the observed changes is only possible to a limited extent. While our analyses on the writing task (Hypotheses 2 and 4) can be regarded as a step in the right direction, further research should investigate the contribution of specific text features or the adjunct questions to epistemic change.

Furthermore, one may criticize our operationalization of domain-specific beliefs by means of a justification belief measure – at least in Study 1. In fact, there is still some debate on which justification dimension relates to which developmental stage. Empirically, however, our pretest data from all three studies show substantial correlations between the FREE questionnaires (absolutism, multiplism, evaluativism) and justification beliefs (justification by authority, personal justification, justification by multiple sources) from the respective stages (see Tables 3, 7, and 8). These correlations were even higher for the domain-specific FREE-EDPSY compared to the topic-specific FREE-GST, which is not surprising since justification beliefs and the FREE-EDPSY assess epistemic beliefs at a similar level of specificity.

Moreover, epistemic beliefs were measured by self-report inventories only and a few reliability indices were on the lower bound of what is generally considered acceptable. While this is a common problem in the field (DeBacker, Crowson, Beesley, Thoma, & Hestevold, 2008) that might have to do with the complex nature of the concept in question, it might also be caused by a reduced amount of variance in our rather specific sample. In fact, discipline-specific socialization processes (Trautwein & Lüdtke, 2007) make it likely that students from one discipline (in our case, psychology students) adopt a more homogeneous set of beliefs over time, thus reducing variance in the respective measures, which, in turn, entails lower reliability estimates (Thompson, 2003). What is also noteworthy is that most reliability problems occurred with the multiplism subscale of the FREE-GST. This might be due to the ‘ambivalent’ nature of the concept of multiplism: Moderate scores on a multiplism scale indicate that someone simply recognizes that there is a certain amount of tentativeness in science, whereas more extreme scores imply a devaluation of science as a whole (radical subjectivity; Hofer & Pintrich, 1997). Since these are two rather different stances, students with varying degrees of multiplism might have different response patterns with regard to multiplism items, thus reducing reliability. To address such issues, future research might strive to complement self-report inventories by in-depth qualitative interviews, for example.

As another limitation, since we did not include a delayed post-test, one may question the longevity of our findings. Speaking in favor of long-term effects, Ferguson et al. (2012) argue, referring to Vygotsky (1978), that short-term interventions might accelerate or ‘compress’ developmental processes that normally require longer time periods (see also Kerwer & Rosman, 2018a). However, it is also important to point out that our efforts aimed at developing a parsimonious method to efficiently influence epistemic beliefs in the short term. Adopting a long-term intervention concept, future research should examine the longevity of the effects of our approach, for example by embedding it into courses or curricula, or by providing refresher sessions.

Finally, we concede that our intervention approach has, up to now, only been tested in one single (and rather exotic) sample, namely psychology undergraduates. Since psychology undergraduates are used to working with diverging information (Rosman et al., 2017), they might differ from other populations regarding their approach to contradictory findings. While we, nevertheless, expect our ideas to be transferable to all disciplines in which empirical studies are a central instrument for justifying claims (i.e., due to the theoretical foundation of our approach), investigating other populations might provide additional insights. This is especially true since the unequal gender balance in psychology students might have introduced processing bias due to the intervention’s focus on a gender-sensitive issue. In our view, two directions of bias are conceivable. First, since women are still treated unequally today (e.g., there still exists a significant gender pay gap in many countries), they might be more interested in the topic of gender discrimination, which would increase their intervention compliance and motivation (e.g., Sinatra & Mason, 2013). However, due to this increased interest, they might also have stronger and less malleable prior beliefs on the topic in question, which might make it harder for them to integrate the conflicting findings. In sum, these two processing mechanisms would have opposing effects on intervention efficacy – it remains up to future research to clarify whether the intervention works better or worse in men compared to women.

Several practical implications can be derived from our study. First and foremost, lecturers should facilitate learners’ knowledge integration (e.g., Songer & Linn, 1991). This implies, for example, presenting learning contents in a well-structured manner, discussing possible inconsistencies, or naming moderating factors. Writing tasks as well as inviting students to ‘do science’ instead of just reading about it might also be helpful (e.g., Barzilai & Ka’adan, 2017; Stacey, Brownlee, Thorpe, & Reeves, 2005). Secondly, curriculum designers should place emphasis on research methods since weighing and integrating evidence often requires differentiated evaluations of study quality. Finally, our findings also have implications for the design of textbooks and course materials. Especially inconsistent knowledge claims (e.g., controversial theories) should be presented in an organized and well-structured manner, allowing students (1) to evaluate the different claims regarding their scientific quality, and (2) to integrate conflicting positions through investigating contextual factors. We concede that some of these suggestions require considerable effort, but with regard to the crucial role of epistemic beliefs in learning and instruction, we expect it to be worth it.

Acknowledgment

We thank Gesa Freyer and Kerstin Schmitt for their assistance in text coding. Moreover, we thank Cornelia Naumann for translating the study materials and Hanna Drucks, Magdalena Hornung, and Lena Happel for proofreading the article.

Disclosure statement

The authors declare that they have no conflict of interest.

Funding

Research was funded by the German Joint Initiative for Research and Innovation with a grant acquired in the Leibniz Competition 2013 (grant number SAW-2013-ZPID-1 195), and by the German Research Foundation (DFG; grant number 392753377). The funders were not involved in study design; in the collection, analysis and interpretation of data; in the writing of the report; and in the decision to submit the article for publication.

References

Abele-Brehm, A. (2017). Zur Lage der Psychologie [On the state of Psychology]. Psychologische Rundschau, 68(1), 1–19. https://doi.org/10.1026/0033-3042/a000346.
Barzilai, S., & Chinn, C. A. (2017). On the goals of epistemic education: Promoting apt epistemic performance. Journal of the Learning Sciences, 1, 1–37. https://doi.org/10.1080/10508406.2017.1392968.
Barzilai, S., & Eshet-Alkalai, Y. (2015). The role of epistemic perspectives in comprehension of multiple author viewpoints. Learning and Instruction, 36, 86–103. https://doi.org/10.1016/j.learninstruc.2014.12.003.
Barzilai, S., & Ka’adan, I. (2017). Learning to integrate divergent information sources: The interplay of epistemic cognition and epistemic metacognition. Metacognition and Learning, 12(2), 193–232. https://doi.org/10.1007/s11409-016-9165-7.
Barzilai, S., & Weinstock, M. P. (2015). Measuring epistemic thinking within and across topics: A scenario-based approach. Contemporary Educational Psychology, 42, 141–158. https://doi.org/10.1016/j.cedpsych.2015.06.006.
Barzilai, S., & Zohar, A. (2012). Epistemic thinking in action: Evaluating and integrating online sources. Cognition and Instruction, 30(1), 39–85. https://doi.org/10.1080/07370008.2011.636495.
Beaman, R., Wheldall, K., & Kemp, C. (2006). Differential teacher attention to boys and girls in the classroom. Educational Review, 58(3), 339–366. https://doi.org/10.1080/00131910600748406.
Bendixen, L. D. (2002). A process model of epistemic belief change. In B. K. Hofer, & P. R. Pintrich (Eds.). Personal epistemology: The psychology of beliefs about knowledge and knowing (pp. 191–208). Mahwah, NJ: Lawrence Erlbaum Associates.
Bråten, I., Britt, M. A., Strømsø, H. I., & Rouet, J.-F. (2011). The role of epistemic beliefs in the comprehension of multiple expository texts: Toward an integrated model. Educational Psychologist, 46(1), 48–70. https://doi.org/10.1080/00461520.2011.538647.
Bråten, I., & Ferguson, L. E. (2014). Investigating cognitive capacity, personality, and epistemic beliefs in relation to science achievement. Learning and Individual Differences, 36, 124–130. https://doi.org/10.1016/j.lindif.2014.10.003.
Bråten, I., Ferguson, L. E., Strømsø, H. I., & Anmarkrud, Ø. (2013). Justification beliefs and multiple-documents comprehension. European Journal of Psychology of Education, 28(3), 879–902. https://doi.org/10.1007/s10212-012-0145-2.
Bråten, I., & Strømsø, H. I. (2009). Effects of task instruction and personal epistemology on the understanding of multiple texts about climate change. Discourse Processes, 47(1), 1–31. https://doi.org/10.1080/01638530902959646.
Bråten, I., & Strømsø, H. I. (2010). When law students read multiple documents about global warming: Examining the role of topic-specific beliefs about the nature of knowledge and knowing. Instructional Science, 38(6), 635–657. https://doi.org/10.1007/s11251-008-9091-4.
Bråten, I., Strømsø, H. I., & Ferguson, L. E. (2016). The role of epistemic beliefs in the comprehension of single and multiple texts. In P. Afflerbach (Ed.). Handbook of individual differences in reading: Reader, text, and context (pp. 67–79). New York: Routledge.
Brownlee, J. L. (2003). Changes in primary school teachers' beliefs about knowing: A longitudinal study. Asia-Pacific Journal of Teacher Education, 31(1), 87–98. https://doi.org/10.1080/13598660301621.
Brownlee, J. L., Ferguson, L. E., & Ryan, M. (2017). Changing teachers' epistemic cognition: A new conceptual framework for epistemic reflexivity. Educational Psychologist, 29, 1–11. https://doi.org/10.1080/00461520.2017.1333430.
Brownlee, J., Petriwskyj, A., Thorpe, K., Stacey, P., & Gibson, M. (2011). Changing personal epistemologies in early childhood pre-service teachers using an integrated teaching program. Higher Education Research & Development, 30(4), 477–490. https://doi.org/10.1080/07294360.2010.518952.
Brownlee, J. L., Purdie, N., & Boulton-Lewis, G. (2001). Changing epistemological beliefs in pre-service teacher education students. Teaching in Higher Education, 6(2), 247–268.
Buehl, M. M., Alexander, P. A., & Murphy, K. P. (2002). Beliefs about schooled knowledge: Domain specific or domain general? Contemporary Educational Psychology, 27(3), 415–449. https://doi.org/10.1006/ceps.2001.1103.
Chan, N.-M., Ho, I. T., & Ku, K. Y. L. (2011). Epistemic beliefs and critical thinking of Chinese students. Learning and Individual Differences, 21(1), 67–77. https://doi.org/10.1016/j.lindif.2010.11.001.
Chinn, C. A., & Brewer, W. F. (1993). The role of anomalous data in knowledge acquisition: A theoretical framework and implications for science instruction. Review of Educational Research, 63(1), 1–49. https://doi.org/10.3102/00346543063001001.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, N.J.: L. Erlbaum Associates.
Conley, A. M., Pintrich, P. R., Vekiri, I., & Harrison, D. (2004). Changes in epistemological beliefs in elementary science students. Contemporary Educational Psychology, 29(2), 186–204. https://doi.org/10.1016/j.cedpsych.2004.01.004.
DeBacker, T. K., Crowson, H. M., Beesley, A. D., Thoma, S. J., & Hestevold, N. L. (2008). The Challenge of measuring epistemic beliefs: An analysis of three self-report instruments. Journal of Experimental Education, 76(3), 281–312. https://doi.org/10.3200/JEXE.76.3.281-314.
Elby, A., & Hammer, D. (2001). On the substance of a sophisticated epistemology. Science Education, 85(5), 554–567. https://doi.org/10.1002/sce.1023.
Ferguson, L. E., Bråten, I., & Strømsø, H. I. (2012). Epistemic cognition when students read multiple documents containing conflicting scientific evidence: A think-aloud study. Learning and Instruction, 22(2), 103–120. https://doi.org/10.1016/j.learninstruc.2011.08.002.
Ferguson, L. E., Bråten, I., Strømsø, H. I., & Anmarkrud, Ø. (2013). Epistemic beliefs and comprehension in the context of reading multiple documents: Examining the role of conflict. International Journal of Educational Research, 62, 100–114. https://doi.org/10.1016/j.ijer.2013.07.001.
Greene, J. A., Azevedo, R., & Torney-Purta, J. (2008). Modeling epistemic and ontological cognition: Philosophical perspectives and methodological directions. Educational Psychologist, 43(3), 142–160. https://doi.org/10.1080/00461520802178458.
Greene, J. A., Torney-Purta, J., & Azevedo, R. (2010). Empirical evidence regarding relations among a model of epistemic and ontological cognition, academic performance, and educational level. Journal of Educational Psychology, 102(1), 234–255. https://doi.org/10.1037/a0017998.
Grimm, K. J., An, Y., McArdle, J. J., Zonderman, A. B., & Resnick, S. M. (2012). Recent changes leading to subsequent changes: Extensions of multivariate latent difference score models. Structural Equation Modeling: A Multidisciplinary Journal, 19(2), 268–292. https://doi.org/10.1080/10705511.2012.659627.
Hagen, Å. M., Braasch, J. L. G., & Bråten, I. (2014). Relationships between spontaneous note-taking, self-reported strategies and comprehension when reading multiple texts in different task conditions. Journal of Research in Reading, 37(S1), S141–S157. https://doi.org/10.1111/j.1467-9817.2012.01536.x.
Hamaker, C. (1986). The effects of adjunct questions on prose learning. Review of Educational Research, 56(2), 212–242. https://doi.org/10.3102/00346543056002212.
Hatano, G., & Inagaki, K. (2003). When is conceptual change intended? A cognitive-sociocultural view. In G. M. Sinatra, & P. R. Pintrich (Eds.). Intentional conceptual change. L. Erlbaum.
Hofer, S. I. (2015). Studying Gender Bias in Physics Grading: The role of teaching experience and country. International Journal of Science Education, 37(17), 2879–2905. https://doi.org/10.1080/09500693.2015.1114190.
Hofer, B. K., & Pintrich, P. R. (1997). The development of epistemological theories: Beliefs about knowledge and knowing and their relation to learning. Review of Educational Research, 67(1), 88–140. https://doi.org/10.3102/00346543067001088.
Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2), 179–185. https://doi.org/10.1007/BF02289447.
Kardash, C. M., & Howell, K. L. (2000). Effects of epistemological beliefs and topic-specific beliefs on undergraduates' cognitive and strategic processing of dual-positional text. Journal of Educational Psychology, 92(3), 524–535. https://doi.org/10.1037//0022-0663.92.3.524.


Kerwer, M., & Rosman, T. (2018b). Mechanisms of epistemic change: Under which circumstances does diverging information support epistemic development? [Dataset]. Retrieved from https://www.psycharchives.org/handle/20.500.12034/738.
Kerwer, M., & Rosman, T. (2018a). Mechanisms of epistemic change: Under which circumstances does diverging information support epistemic development? Frontiers in Psychology, 9, 1060. https://doi.org/10.3389/fpsyg.2018.02278.
Kienhues, D., Bromme, R., & Stahl, E. (2008). Changing epistemological beliefs: The unexpected impact of a short-term intervention. British Journal of Educational Psychology, 78(4), 545–565. https://doi.org/10.1348/000709907X268589.
Kienhues, D., Ferguson, L. E., & Stahl, E. (2016). Diverging information and epistemic change. In J. A. Greene, W. A. Sandoval, & I. Bråten (Eds.). Handbook of epistemic cognition (pp. 318–330). London: Routledge.
Kline, P. (1999). Handbook of psychological testing (2nd ed.). London: Routledge.
Klopp, E., & Stark, R. (2016). Entwicklung eines Fragebogens zur Erfassung domänenübergreifender epistemologischer Überzeugungen [Development of a domain-general epistemological beliefs questionnaire], Unpublished manuscript, Department of Educational Science, Saarland University, Saarbrücken, Germany.
Krettenauer, T. (2005). Die Erfassung des Entwicklungsniveaus epistemologischer Überzeugungen und das Problem der Übertragbarkeit von Interviewverfahren in standardisierte Fragebogenmethoden [Measuring the developmental level of epistemological beliefs and the problem of transferring interview procedures to standardized questionnaire methods]. Zeitschrift Für Entwicklungspsychologie Und Pädagogische Psychologie, 37(2), 69–79. https://doi.org/10.1026/0049-8637.37.2.69.
Kuhn, D., & Weinstock, M. (2002). What is epistemological thinking and why does it matter? In B. K. Hofer, & P. R. Pintrich (Eds.). Personal epistemology: The psychology of beliefs about knowledge and knowing (pp. 121–144). Mahwah, NJ: Lawrence Erlbaum Associates.
Maaz, K., Baeriswyl, F., & Trautwein, U. (2011). Herkunft zensiert? Leistungsdiagnostik und soziale Ungleichheiten in der Schule. Eine Studie im Auftrag der Vodafone Stiftung Deutschland: Vodafone Stiftung Deutschland.
Makel, M. C., & Plucker, J. A. (2014). Facts are more important than novelty: Replication in the education sciences. Educational Researcher, 43(6), 304–316. https://doi.org/10.3102/0013189X14545513.
Mayer, A.-K., & Rosman, T. (2016). Epistemologische Überzeugungen und Wissenserwerb in akademischen Kontexten. In A.-K. Mayer, & T. Rosman (Eds.). Denken über Wissen und Wissenschaft - Epistemologische Überzeugungen (pp. 7–23). Lengerich, Germany: Pabst Science Publishers.
McArdle, J. J. (2009). Latent variable modeling of differences and changes with longitudinal data. Annual Review of Psychology, 60, 577–605. https://doi.org/10.1146/annurev.psych.60.110707.163612.
Merk, S., Rosman, T., Muis, K. R., Kelava, A., & Bohl, T. (2018). Topic specific epistemic beliefs: Extending the theory of integrated domains in personal epistemology. Learning and Instruction, 56, 84–97. https://doi.org/10.1016/j.learninstruc.2018.04.008.
Merk, S., Rosman, T., Rueß, J., Syring, M., & Schneider, J. (2017). Pre-service teachers' perceived value of general pedagogical knowledge for practice: Relations with epistemic beliefs and source beliefs. PloS ONE, 12(9), e0184971. https://doi.org/10.1371/journal.pone.0184971.
Merk, S., Schneider, J., Syring, M., & Bohl, T. (2016). Pädagogisches Kaffeekränzchen oder harte empirische Fakten? Domänen- und theorienspezifische epistemologische Überzeugungen Lehramtsstudierender bezüglich allgemeinen pädagogischen Wissens. In A.-K. Mayer, & T. Rosman (Eds.). Denken über Wissen und Wissenschaft - Epistemologische Überzeugungen (pp. 71–100). Lengerich, Germany: Pabst Science Publishers.
Muis, K. R., Bendixen, L. D., & Haerle, F. C. (2006). Domain-generality and domain-specificity in personal epistemology research: Philosophical and empirical reflections in the development of a theoretical framework. Educational Psychology Review, 18(1), 3–54. https://doi.org/10.1007/s10648-006-9003-6.
Muis, K. R., & Duffy, M. C. (2013). Epistemic climate and epistemic change: Instruction designed to change students' beliefs and learning strategies and improve achievement. Journal of Educational Psychology, 105(1), 213–225. https://doi.org/10.1037/a0029690.
Muis, K. R., Pekrun, R., Sinatra, G. M., Azevedo, R., Trevors, G. J., Meier, E., & Heddy, B. C. (2015). The curious case of climate change: Testing a theoretical model of epistemic beliefs, epistemic emotions, and complex learning. Learning and Instruction, 39, 168–183. https://doi.org/10.1016/j.learninstruc.2015.06.003.
Muis, K. R., Trevors, G. J., Duffy, M., Ranellucci, J., & Foy, M. J. (2016). Testing the TIDE: Examining the nature of students’ epistemic beliefs using a multiple methods approach. Journal of Experimental Education, 84(2), 264–288. https://doi.org/10.1080/00220973.2015.1048843.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716.
Peter, J., Rosman, T., Mayer, A.-K., Leichner, N., & Krampen, G. (2016). Assessing epistemic sophistication by considering domain-specific absolute and multiplicistic beliefs separately. British Journal of Educational Psychology, 86(2), 204–221. https://doi.org/10.1111/bjep.12098.
Porsch, T., & Bromme, R. (2011). Effects of epistemological sensitization on source choices. Instructional Science, 39(6), 805–819. https://doi.org/10.1007/s11251-010-9155-0.
Revelle, W., & Zinbarg, R. E. (2009). Coefficients Alpha, Beta, Omega, and the glb: Comments on Sijtsma. Psychometrika, 74(1), 145–154. https://doi.org/10.1007/s11336-008-9102-z.
Rieske, V. (2011). Bildung von Geschlecht: Zur Diskussion um Jungenbenachteiligung und Feminisierung in deutschen Bildungsinstitutionen. Eine Studie im Auftrag der Max-Traeger-Stiftung. Frankfurt: Gewerkschaft Erziehung und Wissenschaft.
Rosman, T., & Mayer, A.-K. (2018). Epistemic beliefs as predictors of epistemic emotions: Extending a theoretical model. British Journal of Educational Psychology, 88(3), 410–427. https://doi.org/10.1111/bjep.12191.
Rosman, T., Mayer, A.-K., Kerwer, M., & Krampen, G. (2017). The differential development of epistemic beliefs in psychology and computer science students: A four-wave longitudinal study. Learning and Instruction, 49, 166–177. https://doi.org/10.1016/j.learninstruc.2017.01.006.
Rosman, T., Mayer, A.-K., Peter, J., & Krampen, G. (2016). Need for cognitive closure may impede the effectiveness of epistemic belief instruction. Learning and Individual Differences, 49, 406–413. https://doi.org/10.1016/j.lindif.2016.05.017.
Rosman, T., Peter, J., Mayer, A.-K., & Krampen, G. (2016). Conceptions of scientific knowledge influence learning of academic skills: Epistemic beliefs and the efficacy of information literacy instruction. Studies in Higher Education, 43(1), 96–113. https://doi.org/10.1080/03075079.2016.1156666.
Rule, D. C., & Bendixen, L. D. (2010). The integrative model of personal epistemology development: Theoretical underpinnings and implications for education. In L. D. Bendixen, & F. C. Feucht (Eds.). Personal epistemology in the classroom: Theory, research, and implications for practice (pp. 94–123). Cambridge, UK: Cambridge University Press. https://doi.org/10.1017/cbo9780511691904.004.
Schommer, M. (1993). Epistemological development and academic performance among secondary students. Journal of Educational Psychology, 85(3), 406–411. https://doi.org/10.1037/0022-0663.85.3.406.
Schommer, M., & Walker, K. (1995). Are epistemological beliefs similar across domains? Journal of Educational Psychology, 87(3), 424–432. https://doi.org/10.1037/0022-0663.87.3.424.
Sinatra, G. M., & Mason, L. (2013). Beyond knowledge: Learner characteristics influencing conceptual change. In S. Vosniadou (Ed.). Educational psychology handbook. International handbook of research on conceptual change (pp. 377–394). (2nd ed.). Hoboken, NY: Taylor and Francis.
Songer, N. B., & Linn, M. C. (1991). How do students' views of science influence knowledge integration? Journal of Research in Science Teaching, 28(9), 761–784. https://doi.org/10.1002/tea.3660280905.
Stacey, P., Brownlee, J., Thorpe, K., & Reeves, D. (2005). Measuring and manipulating epistemological beliefs in early childhood education students. International Journal of Pedagogies and Learning, 1(1), 6–17. https://doi.org/10.5172/ijpl.1.1.6.
Tabachnick, B. G., & Fidell, L. S. (2000). Using multivariate statistics (4th ed.). Boston, MA: Allyn and Bacon.
Thomm, E., Barzilai, S., & Bromme, R. (2017). Why do experts disagree? The role of conflict topics and epistemic perspectives in conflict explanations. Learning and Instruction, 52, 15–26. https://doi.org/10.1016/j.learninstruc.2017.03.008.
Thompson, B. (2003). Guidelines for authors reporting score reliability estimates. In B. Thompson (Ed.). Score reliability: Contemporary thinking on reliability issues (pp. 91–102). Thousand Oaks, CA: SAGE Publications.
Trautwein, U., & Lüdtke, O. (2007). Epistemological beliefs, school achievement, and college major: A large-scale longitudinal study on the impact of certainty beliefs. Contemporary Educational Psychology, 32(3), 348–366. https://doi.org/10.1016/j.cedpsych.2005.11.003.
Trautwein, U., & Lüdtke, O. (2008). Die Erfassung wissenschaftsbezogener Überzeugungen in der gymnasialen Oberstufe und im Studium. Zeitschrift Für Pädagogische Psychologie, 22(34), 277–291. https://doi.org/10.1024/1010-0652.22.34.277.
Trautwein, U., Lüdtke, O., & Beyer, B. (2004). Rauchen ist tödlich, Computerspiele machen aggressiv? Zeitschrift Für Pädagogische Psychologie, 18(3/4), 187–199. https://doi.org/10.1024/1010-0652.18.34.187.
Trevors, G., Feyzi-Behnagh, R., Azevedo, R., & Bouchet, F. (2016). Self-regulated learning processes vary as a function of epistemic beliefs and contexts: Mixed method evidence from eye tracking and concurrent and retrospective reports. Learning and Instruction, 42, 31–46. https://doi.org/10.1016/j.learninstruc.2015.11.003.
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Cambridge, MA: Harvard University Press.
