Enhancing evidence-based argumentation in a Mainland China middle school


Contemporary Educational Psychology 59 (2019) 101809

Contents lists available at ScienceDirect

Contemporary Educational Psychology journal homepage: www.elsevier.com/locate/cedpsych

Enhancing evidence-based argumentation in a Mainland China middle school☆


Yuchen Shi Institute of Curriculum and Instruction, Faculty of Education, East China Normal University, Shanghai, China

ARTICLE INFO

ABSTRACT

Keywords: Argumentation Dialog Discourse Evidence Writing China

While discourse-based educational approaches have become an object of increasing interest in Western countries, they are largely unknown in countries such as China that are characterized by a strong centralized government with limited encouragement of dissent. In the present study, 54 11–12 year-old Chinese students participated in an extended discourse-based curriculum that has been found successful in Western countries in developing skills of both dialogic and individual written argument. Although the curriculum involves activities unfamiliar to Chinese students, they easily became engaged and showed significant gains in post-intervention essays on both the curriculum topics and new ones. An additional component newly added to the curriculum involved explicit reflection on the relations between a claim and evidence and proved effective in enhancing gains relative to a comparison group not experiencing this addition. Underlying mechanisms in the transfer from dialogic to individual skill are considered, and issues with respect to culture and education are addressed.

1. Introduction Argumentation has come to be recognized as a key component of science instruction (Driver, Newton, & Osborne, 2000; Duschl & Osborne, 2002; Ford, 2012; Jiménez-Aleixandre, Rodríguez, & Duschl, 2000; Sandoval, 2014), as well as of education in other disciplinary fields (Felton, Crowell, & Liu, 2015; Nussbaum, 2008; Wiley & Voss, 1999; Wissinger & Paz, 2016; Wolfe, 2011). The increasing focus on argumentation as a core higher-order cognitive skill across the curriculum has led to investigations of how students understand and use evidence as a crucial component of argumentation (Duncan, Chinn, & Barzilai, 2018; Iordanou & Constantinou, 2015; Kelly & Takao, 2002; Iordanou, Kuhn, Matos, Shi, & Hemberger, 2019; Kuhn & Moore, 2015; Macagno, 2016; Manz & Renga, 2017; McNeill & Berland, 2016; Rinehart, Duncan, & Chinn, 2014; Sandoval & Millwood, 2005; Villarroel, Felton, & Garcia-Mila, 2016). Central to argumentation is the construction of evidence-based claims, requiring the coordination of a claim with evidence bearing on it. Yet weaknesses in students’ ability to coordinate claim with evidence have been consistently documented (Berland & Reiser, 2011), including failure to differentiate empirical evidence from theoretical accounts of mechanism (Brem & Rips, 2000; Kuhn, 1991), failure to justify the relation between evidence and claim (Clark & Sampson, 2007; Erduran, Simon, & Osborne, 2004; Jiménez-Aleixandre et al., 2000; Kelly, Druker, & Chen, 1998; Sandoval & Millwood, 2005), inability to use evidence to weaken a claim (Kuhn & Moore, 2015; Mayweg-Paus & Macagno, 2016), and failure to address evidence inconsistent with one’s own position (Hemberger, Kuhn, Matos, & Shi, 2017; Shi, Matos, & Kuhn, 2019; Iordanou & Constantinou, 2015). These are all serious weaknesses that undermine a student’s ability to construct a comprehensive evidence-based argument. Although extended engagement in the discourse-based curriculum we use here has been shown to produce gains in these respects (Hemberger et al., 2017; Shi, Matos, & Kuhn, 2019; Kuhn, Hemberger, & Khait, 2016a; Kuhn & Moore, 2015), the use of evidence to weaken claims and attention to evidence incongruent with one’s own position remain significant weaknesses in students’ argumentation skills. Hence, for the present study a new component was developed and added to the curriculum, focused specifically on identifying and heightening reflective awareness of relations between claims and evidence. We assess whether this additional component enhances gains relative to a comparison group not experiencing this addition.

A further novel feature of the present study is that we implemented this discourse-based argument curriculum in a middle school in a large city in Mainland China. We purposefully chose to carry out the study in a country traditionally limited in valuing and encouraging dissent in public discourse. Political aspects aside, Chinese educational philosophy is built on a value system founded on Confucianism (Dong, Anderson, Kim, & Li, 2008; Tu, 1985). These values include strict discipline (Miller, Wiley, Fung, & Liang, 1997), hierarchical relations between teacher and student (Hofstede, 1986), discouragement of direct questioning and confrontation (Kim-Goh, 1995), and unproblematic accumulation of knowledge transmitted by teachers to students (Tweed & Lehman, 2002). Would Chinese students, with most of their school experience consisting of teacher-controlled instruction (Dong et al., 2008) and rote memorization (Aoki, 2008), respond to this student-centered discourse-based curriculum? A positive outcome in this challenging context, we believed, would further support the claim that dialog serves as an effective bridge to individual argumentive thinking, writing and speaking (Hemberger et al., 2017; Kuhn & Crowell, 2011; Kuhn, Hemberger, & Khait, 2016a, 2016b).

☆ The work reported here is based on a dissertation by the author submitted in partial fulfillment of the requirements for the Ph.D. degree at Teachers College, Columbia University. E-mail address: [email protected].

https://doi.org/10.1016/j.cedpsych.2019.101809

Available online 09 October 2019 0361-476X/ © 2019 Elsevier Inc. All rights reserved.

1.1. The role of evidence in argument Numerous theorists point to evidence-based claims as a core component of argument. Baker (2003, 2009) defined argumentation as an activity that involves establishing specific types of relations between the claims being discussed and other sources of knowledge, including evidence. Walton and Zhang (2013) pointed out that knowledge is evidence-based argumentation and that the propositions supported by evidence are accepted as true, rather than being believed to be true. According to Toulmin’s (1958) argumentation model, the legitimacy of any particular claim made about data depends on the warrants offered to justify the relationship between data and claim. Similarly, the Next Generation Science Standards specify that “Data aren’t evidence until used in the process of supporting a claim” (NGSS Appendix F, p. 7). However, students are reported to fail to cite sufficient data to support claims (Sandoval & Millwood, 2005). Even if they do recognize the need to refer to data, they frequently do not include adequate warrants or backings to specify how specific data relate to particular claims (Bell & Linn, 2000; Jiménez-Aleixandre et al., 2000; Ryu & Sandoval, 2012; Sandoval & Millwood, 2005). This frequently observed difficulty in coordinating claim with evidence is connected to a long-standing line of research reporting on individuals’ preference for explanation (of mechanism) over evidence in justifying claims (Kuhn, 1991). In the present study, we draw a distinction between information and evidence, with factual information that has been accepted as correct becoming evidence only when it is employed in relation to a claim. Making this distinction, we believe, is useful in instructional settings with beginning students.

Students’ failure to recognize evidence as distinct from claim and bearing on it precludes adhering to a theory-evidence coordination model in which multiple alternatives are considered, and a claim that has the most consistent and least inconsistent evidence associated with it is the alternative that is chosen. A confirmation bias in individuals’ evaluation of evidence both compatible and incompatible with their prior beliefs has been widely documented (Chinn & Brewer, 1993; Edwards & Smith, 1996; Kahneman, 2011; Klaczynski & Gordon, 1996; Lord, Ross, & Lepper, 1979). Such confirmation bias is manifested in individuals applying differential standards in judging belief-consistent and belief-inconsistent information, with belief-consistent information readily accepted at face value while belief-inconsistent information is subjected to longer and hypercritical scrutiny. As a consequence, individuals demonstrate “my-side bias” (Stanovich & West, 1997; Wolfe & Britt, 2008) by focusing attention on presenting evidence consistent with their belief and neglecting to address evidence inconsistent with their belief. In fact, to overcome this bias and to be able to decouple prior belief from the evaluation and employment of evidence of mixed types, as we seek to do in the present study, is widely regarded as a core cognitive achievement (Stanovich, West, & Toplak, 2013; Baron, 2008).

The ability and disposition to use evidence to support, assess, question, or refute a claim is believed to reflect an individual’s epistemological understanding (Kuhn & Park, 2005; Sandoval & Millwood, 2005). If theories merely constitute readily verifiable facts, as epistemological absolutists believe, or if theories are simply personal opinions, as multiplists believe, there is little point in examining claims in a framework of alternatives, evidence, and argument (Hofer & Pintrich, 1997; Kuhn, Cheney, & Weinstock, 2000). Only at the most advanced, evaluativist level do individuals recognize that the construction of knowledge requires coordination of claim and evidence (both consistent and inconsistent with the claim), in a consciously controlled manner, and are inclined to expend the mental effort required (Greene, Sandoval, & Bråten, 2016; Kuhn, in press).

Achieving skill in claim-evidence coordination requires support with respect to both its strategic and metastrategic components. At the strategic level, students need extended opportunities to practice coordinating claims with evidence both consistent and inconsistent with a claim. At the metastrategic level, students need to develop meta-level understanding of the purpose and goals of evidence use in argument and develop “proactive executive control strategies” (Nussbaum & Asterhan, 2016) to plan and monitor such use. The combination of practice and meta-level reflection has yielded favorable results in various educational domains (Felton, 2004; King, 1991, 1992; Kramarski & Mevarech, 2003; Nussbaum & Edwards, 2011; Pressley & Ghatala, 1990; Song & Ferretti, 2013; Zohar & David, 2008). Here we examine these combined effects in developing students’ skilled use of evidence in argument.

1.2. Dialog as a bridge to argumentive writing Discourse-based approaches have increasingly become favorably regarded as curricular methods both in academic circles and in classroom practice, with a number of recent reviews tracing this development (Clarke, Resnick, & Rose, 2015; Resnick, Asterhan, & Clarke, 2015; Howe & Abedin, 2013; Mehan & Cazden, 2015; Murphy et al., 2018; O'Connor & Snow, 2018). Compared to teacher-centered discourse, student-centered discourse emphasizes open participation, accountability, reasoning and reflective thinking on students’ part (Applebee, 1996; Clarke et al., 2015; Mercer & Littleton, 2007; Wegerif, Mercer, & Dawes, 1999). Most discourse-based student-centered approaches are implemented at the whole-class level (Michaels, O’Connor, & Resnick, 2008) or small-group level (Mercer & Wegerif, 1999; Nussbaum & Edwards, 2011; Reznitskaya et al., 2001). However, challenges have been commonly reported with respect to teachers’ successful orchestration and management of whole-class or small-group discussion (Mehan & Cazden, 2015; Resnick, Asterhan, Clarke, & Schantz, 2018), as well as difficulty in including all students as participants (O'Connor & Snow, 2018). The intervention employed here is based on the dialog-based argument curriculum developed by Kuhn and colleagues (Kuhn & Crowell, 2011; Kuhn et al., 2016a; Kuhn, 2019) that reduces these challenges by means of its emphasis on student-to-student dialog, i.e., between two parties who speak or write directly to one another and are directly responsible for maintaining the exchange. The majority of its activities take place in direct student-to-student discourse, thereby increasing density of students’ discourse experience beyond what would typically occur in a whole-class context. 
In so doing it affords the teacher a less central role – a core respect in which it departs from Chinese classroom norms that mostly emphasize one-sided transmission of knowledge from teacher to students (Miller et al., 1997; Tweed & Lehman, 2002). The dialog-based argument approach is consistent with the sociocultural concept of the interiorization of action from the external social to the internal individual plane (Vygotsky, 1978). Individual cognition is shaped through social interaction, and discourse with another serves as a bridge to individual written argument, providing it an audience and purpose (Graff, 2003). While dialogs can take many forms, each characterized by different goals and different kinds of procedural rules to facilitate attaining these goals (Walton, 1989), the type of dialog we focus on in the present study is persuasive dialog, in which the goal of each participant is to persuade the other participant of the acceptability of a specific proposition, based on premises that the other participant either has already accepted or is prepared to accept (Walton, 1989). In the present implementation of the approach, students engage deeply with a series of controversial and challenging topics. They engage in two forms of peer discourse, verbally with a same-side partner and electronically with an opposing-side pair. Previous studies have documented the success of this curriculum in producing significant gains in both verbal and electronic argumentation (Crowell & Kuhn, 2014; Iordanou, 2013) and extending to individual written argument (Kuhn & Crowell, 2011; Kuhn et al., 2016a), knowledge acquisition (Iordanou et al., 2019), epistemological understanding (Iordanou, 2016), and meta-level understanding of norms of argumentation (Kuhn, Zillmer, Crowell, & Zavala, 2013). Topics selected for students to engage in are often ones entailing dense topic-related knowledge and ones students have limited prior knowledge of (Iordanou et al., 2019). Since students’ dialogic argumentation cannot take place in a vacuum of knowledge, in previous studies (Hemberger et al., 2017; Shi, Matos, & Kuhn, 2019; Iordanou et al., 2019), we have adopted the method of providing students with carefully selected information during the course of dialog sessions, coupled with the method of inviting students to generate questions of their own, “the answers to which might help you in your arguments.” The research team provides brief (1–2-sentence) factual answers to these questions, and the resulting question-and-answer (Q & A) set then becomes available for use by the entire class throughout their work on the topic.
To support the claim that discourse serves as a bridge to argumentive writing, Hemberger et al. (2017) traced students’ use of evidence in argumentive dialogs. We found that the evidence type most likely to be reused after first use was the weaken-other (O−) type, more so than the less challenging support-own (M+) type (see Table 6 for examples), suggesting that dialogs help students overcome the tendency to use evidence exclusively to support their own side. Similarly, Iordanou and Constantinou (2015) found that high-school students who engaged in a similar argument intervention significantly increased use of evidence to weaken opponents’ claims in dialogs, compared to students who did not engage in argumentation. In addition, several studies have compared students’ use of evidence in dialogs and in their subsequent argumentive writing. Students draw from a broader range of prior knowledge in dialogs than in essays, and they tend to use evidence to weaken an opposing claim more often in dialogs than in essays (Kuhn & Moore, 2015; Mayweg-Paus & Macagno, 2016). Moreover, the types of evidence used in dialogs tend to precede their appearance in essays (Hemberger et al., 2017). Therefore, in the present study we hypothesize that encouraging students to use evidence during argumentive dialogs will serve as a fruitful path to developing their use of evidence in writing.
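The coding scheme referred to above crosses the side a piece of evidence bears on (own, M, or other, O) with the direction of its effect (support, +, or weaken, −), giving four functions: M+, M−, O+, and O−. The following is a minimal sketch of how coded idea units could be tallied under that scheme. The names `EvidenceFunction` and `tally` are illustrative assumptions, not part of the study's materials, and the ASCII hyphen stands in for the minus sign used in the text.

```python
from collections import Counter
from enum import Enum


class EvidenceFunction(Enum):
    """Four functions an evidence-based statement can serve (hypothetical encoding)."""

    SUPPORT_OWN = "M+"
    WEAKEN_OWN = "M-"
    SUPPORT_OTHER = "O+"
    WEAKEN_OTHER = "O-"

    @property
    def position_congruent(self):
        # Supporting one's own side or weakening the other side
        # both favor the writer's position.
        return self in (EvidenceFunction.SUPPORT_OWN, EvidenceFunction.WEAKEN_OTHER)


def tally(codes):
    """Count coded idea units by function and by congruence with the writer's position.

    Returns (counts_by_code, n_congruent, n_incongruent).
    """
    functions = [EvidenceFunction(code) for code in codes]
    by_function = Counter(f.value for f in functions)
    congruent = sum(f.position_congruent for f in functions)
    return by_function, congruent, len(functions) - congruent


if __name__ == "__main__":
    # Hypothetical codes for a single essay, for illustration only.
    counts, congruent, incongruent = tally(["M+", "M+", "O-"])
    print(dict(counts), congruent, incongruent)
```

Grouping M+ with O− as position-congruent and M− with O+ as position-incongruent is one way to make the "my-side" pattern visible in a tally; the analysis in the study itself may be organized differently.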

1.3. The present study In our previous intervention studies (Hemberger et al., 2017; Shi, Matos, & Kuhn, 2019; Iordanou et al., 2019), following dense and extended engagement in the curriculum, students made progress in employing evidence in argumentive writing; yet their final performance was still less than could be hoped for. In the study by Hemberger et al. (2017), among essays on the final intervention topic, intervention students included on average only 3 idea units that successfully connected claim with evidence; typically one or two served to support own position (M+) and one weakened the opposing position (O−). Evidence-based statements that supported the opposing position (O+) or weakened own position (M−) were scarce. The low frequency of claims that drew on evidence in general, and of claims that addressed evidence incongruent with students’ own position in particular, should not be attributed simply to students’ lack of access to content knowledge, since a list of topic-related evidence generated during dialog sessions is available to students during writing of the final essay. While this research indicates that progress is attainable, the fact that such progress is slow, following one-year (Hemberger et al., 2017) and sometimes two-year intervention (Kuhn & Moore, 2015), and that the gains only partially transferred to a non-intervention topic (Hemberger et al., 2017), led to the present proposal that dense engagement in the practice of using evidence may by itself not be sufficient. We propose further that underlying these struggles is insufficient development at both reflective, meta-strategic and strategic levels. The present study thus seeks to capitalize on and extend the reflective component of the argument curriculum. The curriculum encourages reflection by making available a transcript that serves as an object of reflection and is subsequently used in additional reflective activities. Previous studies have not, however, engaged students in reflective exercises specific to the relations between a claim and evidence, the focus of the present study. Our most recent study in fact pointed to the potential efficacy of emphasizing the meta-level aspect of evidence use. In Iordanou et al. (2019), we manipulated the prompt we gave to students to encourage them to address position-incongruent evidence. A seemingly slight change in the wording from “Try to use this information in your arguments” to “Not all of the evidence is going to support your side; if it doesn’t, see if you can deal with it,” led to an increase in students’ inclusion of evidence to both weaken and support the opposing side. We believed this was the case because the modified prompt signaled to students the potential conflict between evidence and claim.

Despite this encouraging finding, the number of each type of evidence-based claim remained low, with less than 2.5 units for weaken-other evidence and less than one unit for support-other evidence. However, if a seemingly minor change in instructional framing can make a significant difference in how students construe the task, we wondered to what extent engaging students in more extended and repeated reflective exercises might enable them to address evidence both consistent and inconsistent with their position. In another recent study (Shi et al., 2019), we found that students who engaged in the dialog-based argument curriculum for an entire school year showed enhanced meta-level understanding of the goal of evidence use in argumentation, specifically in their enhanced recognition of the weakening function of evidence, compared to students in a non-participating control group who only recognized the supporting function of evidence. This finding suggests that development on the strategic and meta-strategic planes may take place simultaneously, with one supporting the other in a mutually reinforcing loop. More broadly, studies involving explicit metacognitive training during learning activities have been reported in various learning situations. These studies address different learning goals, ranging from enhancement in argumentive discourse skills (Felton, 2004), argumentive writing (Nussbaum & Edwards, 2011; Nussbaum et al., 2018; Song & Ferretti, 2013), mathematical reasoning (Kramarski & Mevarech, 2003), scientific thinking (Zohar & David, 2008), to improvement in problem-solving skills (King, 1991, 1992) and reading comprehension (Pressley & Ghatala, 1990). All these studies point to the efficacy of intervention that combines practice with metacognitive reflection, over and above practice alone.

The advantage of metacognitive training becomes particularly evident during transfer tasks, where gains in strategy performance through practice alone are often lost once the initial instructional context is withdrawn. The present intervention, involving peers’ joint metacognitive reflection, further draws on the sociocultural framework (Rogoff, 1990; Vygotsky, 1978), with the rationale that repeated metacognitive exercise prompted at the external, social level will be interiorized and become operative as well at the individual level (Brown, 1997; Kuhn, 2000). It is anticipated that when students individually write an argumentive essay at the end of each topic cycle, they will engage in self-reflective and self-regulatory processes, mirroring the reflective and regulatory activities carried out with a partner during their joint activities. Moreover, when the social support is removed, as is the case for a transfer task in which students are asked to write individually on a non-intervention topic, reflective exercises conducted with earlier topics may support students’ maintaining meta-level management of their individual performance, resulting in superior strategic performance in a transfer task.

The design of the study thus allows us to investigate three research questions. First, for each of the three intervention topics, do students in an Evidence Reflection and Argument Practice (ER + AP) condition outperform those in an Argument Practice (AP) condition in addressing both claim-congruent and claim-incongruent evidence in final individual essays on the topic? Second, in transfer essays on a non-intervention topic, are any such differences maintained between these two intervention groups and in comparison to a non-intervention control group? Third, what does analysis of students’ responses to the evidence reflection prompts reveal regarding the underlying mechanism of change?

Table 1 Means (and SDs) on school diagnostic test for two intervention and control conditions.

Condition   N    Chinese Language Arts   Chinese Language Arts Essay Component   Mathematics
ER + AP     27   110.30 (7.55)           47.85 (3.92)                            117.26 (17.05)
AP          27   110.48 (8.40)           46.81 (3.39)                            117.44 (21.01)
Control     50   109.96 (6.13)           46.84 (3.08)                            114.60 (12.40)

Note. Maximum score was 150 for Chinese Language Arts, 150 for Mathematics, and 60 for Chinese Language Arts essay component. AP = Argument Practice condition; ER + AP = Evidence Reflection plus Argument Practice condition.
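The group sizes, means, and standard deviations in Table 1 are sufficient to recompute a one-way ANOVA for each diagnostic test, which offers a way to cross-check the equivalence analysis described in the Design section. The sketch below assumes the tabled SDs are sample standard deviations (n − 1 denominator) and that the reported values are rounded, so small discrepancies are to be expected. The function name `anova_from_summary` is an illustrative assumption, and the closed-form p-value applies only when there are exactly three groups (numerator df = 2).

```python
# Summary statistics from Table 1 as (n, mean, SD) per condition,
# in the order ER + AP, AP, Control.
TABLE_1 = {
    "Chinese Language Arts": [(27, 110.30, 7.55), (27, 110.48, 8.40), (50, 109.96, 6.13)],
    "Essay component": [(27, 47.85, 3.92), (27, 46.81, 3.39), (50, 46.84, 3.08)],
    "Mathematics": [(27, 117.26, 17.05), (27, 117.44, 21.01), (50, 114.60, 12.40)],
}


def anova_from_summary(groups):
    """One-way ANOVA from per-group (n, mean, sd) tuples.

    Returns (F, df_between, df_within, p). The p-value uses the closed form
    of the F survival function, which is valid only for df_between == 2.
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-groups sum of squares: weighted squared deviations of group means.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups sum of squares recovered from the sample SDs (assumed n - 1).
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    if df_between != 2:
        raise ValueError("closed-form p-value assumes exactly three groups")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # For numerator df = 2: P(F > f) = (1 + 2f / df_within) ** (-df_within / 2).
    p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)
    return f_stat, df_between, df_within, p_value


if __name__ == "__main__":
    for test_name, groups in TABLE_1.items():
        f_stat, df1, df2, p = anova_from_summary(groups)
        print(f"{test_name}: F({df1}, {df2}) = {f_stat:.2f}, p = {p:.3f}")
```

Working from summary statistics rather than raw scores keeps the check self-contained, at the cost of rounding error carried over from the two-decimal values in the table.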

2. Method

2.1. Participants Participants were 104 students (65 girls and 39 boys) in a school in a large city in the Western part of Mainland China. Their ages ranged from 11 years 10 months to 12 years 10 months. They were entering the 7th grade (the beginning of middle school in China) at the start of the intervention. The school was a selective middle school, admitting roughly 30% of the students who applied. Participants all came from middle to upper-middle class Chinese families. Participants were all native-born Chinese, with Chinese as their first language. The curriculum and instruction at the school closely follow the standards stipulated by the Ministry of Education of the People’s Republic of China. The emphasis of instruction in most school disciplines is on students’ mastery of subject content knowledge and repeated exercises in test questions to ensure high scores in standardized testing. Classroom discussion, debate, and peer collaboration emphasized in the present intervention are mostly missing in classroom activities at the school.

2.2. Design In a quasi-experimental design, one of the school’s multiple classes at this grade level was randomly chosen to serve in the Intervention condition (n = 54) and another to serve in the Control condition (n = 50). Students in the Control condition participated only in the pre- and post-assessments and otherwise received their regular instruction. Students in the Intervention condition participated in the pre- and post-assessments, and the four-month thrice-weekly argument curriculum to be described. Students in the Intervention group were further randomly assigned to one of two forms of the Intervention condition – an Argument Practice (AP) condition (n = 27) and an Evidence Reflection and Argument Practice (ER + AP) condition (n = 27). To ensure initial equivalence across conditions, results of diagnostic tests administered at the beginning of the school year, which assessed skills in Chinese Language Arts and Mathematics, were analyzed. Because the major outcome measure in the present study is students’ essays, scores for the essay component of the Chinese Language Arts test were analyzed separately. This test asked students to write a 600-word essay about a personal experience. Table 1 presents means and standard deviations for each test by condition. A one-way analysis of variance (ANOVA) showed no significant difference in essay score across the three conditions, F(2, 101) = 0.91, p = .408. Nor was there a significant difference in Chinese Language Arts score, F(2, 101) = 0.05, p = .950, or Mathematics score, F(2, 101) = 0.38, p = .689.

The intervention constituted a stand-alone course that met separately for three 40-min class periods per week, for a total of 30 class periods. When students from the ER + AP condition met for the intervention, students from the AP condition were taught a different school subject in a separate classroom, and vice versa. The author served as the lead teacher of both intervention conditions to ensure equivalence in instruction except for the added component for the ER + AP condition: distributing and collecting Evidence Reflection Sheets for dyads to complete on their own. Two teachers from the school’s staff served as assistants to the lead teacher but did not deliver any instruction. To ensure fidelity of treatment, all intervention classes were videotaped and a colleague not involved in the present work observed the videotaped classes. The author and colleague coded for the presence or absence of each of the components described in the lesson plans, with perfect inter-rater reliability. The categories that were coded to determine the degree of implementation fidelity were the following: engaging in small-group discussion, conducting electronic dialogs, receiving evidence, completing reflection sheets, engaging in whole-class showdown, analyzing showdown maps. In total, for ER + AP and AP conditions respectively, twelve videos were coded (four for each topic), to confirm that all the components in the lesson plans were adhered to.

2.3. Procedure

2.3.1. Pre- and post-assessments One week before the start of the intervention, an individual written pre-assessment essay was written by students in all conditions. Students were given the entire class period (40 min) to complete it in a whole-class setting. The same assessment was re-administered during the week following the intervention. The essay topic was as follows: Should juveniles who have committed serious crimes be tried in an adult court or a juvenile court? There was no minimum or maximum word limit. School staff confirmed that this topic had never been addressed in the school curriculum. A list of 11 short pieces of evidence (Appendix A) in question-and-answer format was distributed to students before they started writing the essay. (Materials in this and subsequent appendices are all English translations of materials presented to students in Chinese.) Of the 11 pieces, some could more readily be used to support the adult court position and some the juvenile court position. They appeared in a mixed order. All students received the same verbal prompt: “Here is some information relevant to the topic. Not all information is going to support the side you favor. If it doesn’t, see if you can deal with it. Feel free to use any information you would like to in your essay, but you are not required to do so.”

2.3.2. Intervention One week before the start of the intervention, a list of 10 topics used in previous interventions was presented to solicit students’ positions on these issues. Three topics were chosen on which students were approximately evenly divided in favoring one or the other of two opposing positions. The topics were engaged in the order they appear in Table 2, with an equal number of sessions devoted to each. The activity sequence described below was followed for each intervention topic. The intervention procedure followed that implemented in previous work (Hemberger et al., 2017), with the only added component that of the evidence reflection activity in the ER + AP condition.

Table 2 Intervention topics.

Topic 1   Should teenagers over 16 focus only on their schoolwork or should they take on a part-time job?
Topic 2   In order to better treat human illnesses, should animals be used to test new medical products and procedures?
Topic 3   Should the sale of kidneys be legalized in China?

2.3.2.1. Session 1. Following a brief introduction to the topic, students assembled into same-side small groups of 4–5 based on their preferred side. Each small group worked on generating reasons to support their side, recording them on index cards and sharing them with the rest of the group. The group then reflected on their reasons, consolidated and eliminated duplicates, and collaborated in ranking the reasons with regard to their strength.

2.3.2.2. Sessions 2–6. Same-side, same-gender pairs were formed who remained together throughout these five sessions. At each session, the pair engaged in an electronic dialog via instant-messaging software with a succession of opposing-side pairs, a different opposing-side pair each session. Pairs were reminded to discuss with their partner and agree on what to write before writing. While awaiting response from the opposing pair, dyads were asked to work on a sheet designed to promote reflection on the dialogs. These were of two forms alternating across sessions (Appendix B). One asked the pair to identify and reflect on the opponents’ main argument. The other asked them to reflect on their own side’s main argument. Students from both ER + AP and AP conditions completed these dialog reflection sheets. During each session, dyads were provided on a small index card a short piece of information in question-and-answer format of the nature illustrated in Appendix A with respect to the juvenile justice assessment topic. Pieces of evidence provided were those used in our previous studies. Students were told that these cards contained information that might be useful to them in their dialogs but that they were not required to use it. The information was of types serving four potential functions, with one piece distributed during each of the five dialog sessions following this sequence:

A colleague not involved in the present study examined the evidence presented to students for each topic and ensured that they were balanced in favoring the two opposing positions.

2.3.3. Session 7 Students assembled in their same-side teams to participate in a whole-class verbal debate, in which individual students from each side verbally debated a student from the opposing side, pausing the exchange to seek help from their teammates as needed.

2.3.4. Session 8 The whole-class verbal debate session was video recorded and subsequently transcribed by the lead teacher, who produced an argument map, which facilitated review in the whole-class debrief during session 8. Students were guided through the argument map, with points awarded for effective argumentive moves, such as counterarguments, rebuttals, and evidence use, and points subtracted for ineffective moves, such as unwarranted assumptions, unsupported claims, and misused evidence. At the end of the debrief session, positive and negative points were summed for each side and a winning side was declared.

2.3.5. Session 9 The final activity for each topic cycle was students’ individual written composition of a “letter to a newspaper editor” on the topic. Students were told that the goal of their writing was to try to persuade readers that their position was the better one. The information they had received on the topic remained available to them (Appendix C) and the same verbal prompt as given at the initial pre-assessment was repeated.

2.3.6. Condition manipulation The procedure for the AP (Argument Practice) condition was as described above. The ER + AP (Evidence Reflection plus Argument Practice) condition was identical except in one respect. In addition to the reflection sheet described earlier, dyads collaboratively completed another activity designed to focus reflection on the nature of evidence and its coordination with claims (see Tables 3 and 4). Part A was completed at the beginning of each dialog session and focused on anticipation of potential evidence use. In Part A, as shown in Table 3, the dyad was first prompted to reflect on whether there existed a meaningful conflict between their claim and the evidence. Students were then asked to envision a counterargument to the evidence and to predict how someone disagreeing with them would use the evidence. These questions were designed to help students gain proficiency in coordinating evidence with different positions. Part B was completed toward the end of each dialog session and involved evaluation of students’ actual evidence use. As shown in Table 4, students were first asked to recap how they coordinated claim with evidence. They were then asked a set of evaluative questions designed to help them construct evaluative criteria to assess evidence use and to improve on such use. In completing these activities, students were instructed to discuss fully with their partner and reach agreement before recording an answer.

1. Information having potential to serve as evidence supporting pair’s favored position (M+)
2. Information having potential to serve as evidence weakening the opposing position (O−)
3. Information having potential to serve as evidence supporting the opposing position (O+)
4. Information having potential to serve as evidence weakening the favored position (M−)
5. Second instance of type 1 (M+)

Evidence was presented in order of the cognitive demand it posed. The support-own (M+) type was presented first as previous research found it to be easiest for students (Kuhn et al., 2016a, 2016b). The next piece of evidence given was the weaken-other (O−) type, which posed a greater challenge given that novice students tend to neglect the opposing position. The most challenging types of evidence are those incongruent with an individual’s own position (support-other O+ and weaken-own M−). They were presented during the next two sessions. In the fifth session, students encountered a second piece of support-own evidence. Students were also encouraged to submit questions regarding the topic that they would like answered. The research team provided answers to these student-generated questions and they were made available to all students at the following session. A Chinese-speaking

2.4. Coding of essays A native Chinese-speaking individual who had recently obtained a Master’s degree in Education in the U.S. was recruited to assist in establishing inter-rater reliability in the coding of essays. Blind to condition, the author and this individual independently coded a randomly selected 30% of the essay dataset.


O− units were collapsed into claim-congruent units and O+ and M− units were collapsed into claim-incongruent units. Fig. 1 provides a diagrammatic representation of the full coding scheme. This coding scheme also included as an additional super-category the occurrence of However units, defined as two adjacent units serving opposing functions and connected to one another, e.g., a support-other unit connected to a weaken-other unit or support-own unit to a support-other unit. For instance, the following statement was coded as a However unit, “Even though a lot of people consider rats as useless creatures, I think every creature that continues to exist over the past thousands or millions of years does so for a good reason.” If a However unit (consisting of two adjacent simple units) made reference to evidence, it was coded as an evidence-based However unit. Of these, those that succeeded in coordinating two pieces of evidence serving contrasting functions were coded as full evidence-based However units. The following example illustrates such a unit: “The other side said we can use human bodies for research. However, a lot of research involving living bodies is so complicated that they could only be carried out with animals.” Note that both parts made reference to evidence. Two coders independently identified and coded the same set of essays used earlier for evidence-related However arguments and achieved an inter-rater agreement of 97%, Cohen’s kappa = 0.88, p < .0005. Differences were resolved through discussion and the author proceeded to code the remaining essays.
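The two reliability figures reported for the coding (percent agreement and Cohen’s kappa) can be illustrated with a minimal sketch. The function names and the six example codes are hypothetical and are not data from the study; the sketch only shows how the two statistics relate.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Share of units to which both coders assigned the same category."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same non-empty set of units")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(codes_a)
    p_observed = percent_agreement(codes_a, codes_b)
    counts_a = Counter(codes_a)
    counts_b = Counter(codes_b)
    # Chance agreement: product of the two coders' marginal proportions,
    # summed over categories (a missing category counts as zero).
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when both coders use one category")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes for six idea units (illustration only).
coder_1 = ["M+", "M+", "O-", "O+", "M+", "O-"]
coder_2 = ["M+", "O-", "O-", "O+", "M+", "O-"]
print(percent_agreement(coder_1, coder_2), cohens_kappa(coder_1, coder_2))
```

Because kappa discounts agreement expected by chance, it is lower than raw percent agreement, as in the values reported above (97% agreement, kappa = 0.88).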

Table 3 Evidence reflection activity Part A for the ER + AP condition. Evidence reflection activity – Part A

Instructions: the following questions will help you think about the evidence you have seen today. Please discuss with your partner before answering each question.
1. Have you heard this evidence from your opponents before? (Please circle one) Yes / No / Not sure
2. Does the evidence help your side or your opponent’s side? (Please circle one) Help my side / Help opponent’s side / Not sure
If it helps your side, answer questions 3 & 4. If it helps your opponent’s side, answer questions 4 & 5. If you are not sure, answer question 4.
3. Since this evidence helps your side, is there anything your opponents might say against it?
4. How might someone who disagree with you on this issue use this evidence?
5. Since this evidence helps your opponent’s side, is there anything you can say against it?

Table 4 Evidence reflection activity Part B for the ER + AP condition. Evidence reflection activity – Part B

1. Have you addressed this evidence today? If yes, what argument did you make with it? If no, why?
2. Are you satisfied with the way you and your partner used the evidence today? Satisfied / So-so / Not satisfied
3. If you are satisfied, explain why you are satisfied. If you are not satisfied, why? Can you think of a way to improve how you used it?

The two coders segmented each essay, reached an agreement of 88% for number of idea units an essay contained, resolving differences through discussion. An idea unit was further coded as evidence-based if it included reference to evidence, or non-evidence-based if it did not. The evidence referred to could be based on the information that had been provided by the research team or on their own general knowledge. Sometimes, a student merely inserted mention of a piece of evidence in the essay without connecting it to a specific claim or to any identifiable argument being made, in which case the unit was coded as non-functional (Hemberger et al., 2017). Table 5 contains definitions of each of the categories along with translated verbatim examples. Given our focus is on the argumentive function these evidence-based units serve, only functional evidence-based units, i.e., those that successfully drew on evidence to support or weaken a claim, were included in the subsequent analysis. Each such unit was further categorized based on its argumentive function: support own claim (M+), weaken opposing claim (O−), support opposing claim (O+) or weaken own claim (M−). Table 6 presents translated verbatim examples of functional evidence-based units in each category. Two coders achieved an inter-rater agreement of 89% in assigning each idea unit to one of six categories (non-evidence-based, non-functional, and the four functional categories), Cohen’s kappa = .82, p < .005. Discrepancies were resolved through discussion and the author proceeded to code remaining essays. In later analyses, M+ and

3. Results Because we regard the dyadic activity that took place among intervention students as part of the treatment, we used as the main outcome variable and unit of analysis a student’s individual essay. However, to assess the extent to which writings of students who worked as a dyad resembled each other, we used the variance components model to assess intraclass correlation (ICC) for each of the outcome measures. Since dyad composition changed from topic to topic, we examined intraclass correlation within each of the three topics. For Topic 1, the ICC values for each of the outcome measures ranged between 0 and 0.16. For Topic 2, the ICC values ranged between 0 and 0.25. For Topic 3, the ICC values ranged between 0 and 0.25. Thus, we conclude that intraclass correlation within dyads was not large enough to be a significant concern with respect to independence. 3.1. Intervention topic essays Our first analyses examine the essays students wrote during the intervention as the culminating activity for each of the three topics. The percentage of functional evidence-based units was high, exceeding 90% of all evidence-based units for all three topics. The following analyses were based on these functional evidence-based units. A Generalized

Table 5 Coding scheme for categorizing evidence-based units (adapted from Hemberger et al., 2017).

Level: Functional evidence-based units
Category definition: Explicitly and successfully used evidence in service of a claim
Verbatim (translated) example*: Selling of kidney should be legalized because we do not have enough kidneys at the moment. In fact, in China in 2012, for every 150 people waiting for a donated kidney, 149 die while waiting.

Level: Nonfunctional evidence-based units
Category definition: Evidence mischaracterized in a way that substantively misrepresents its meaning
Verbatim (translated) example*: We don’t need to legalize the kidney sale market because we don’t have that many patients who need a kidney. In 2012, we only have 150 people who are waiting for a donated kidney.
Category definition: Missing connection of evidence to claim (usually partial or complete copy of evidence)
Verbatim (translated) example*: Statistics show that in China in 2012, for every 150 people waiting for a donated kidney, 149 die while waiting.
Category definition: Attempted but unsuccessful connection of evidence to claim
Verbatim (translated) example*: The information says that 149 out of every 150 people die waiting for a donated kidney. The information suggests that kidney is just a commodity.

* Note. All the examples refer to the following piece of evidence in Topic 3. Q: Do people die because they can't get a new kidney in time? A: Yes, statistics show that in China in 2012, for every 150 people waiting for a donated kidney, 149 die while waiting.


Table 6 Verbatim (translated) examples of functional evidence-based units serving different argumentive goals.

Function: Support own side (M+)
Position: Allow Animal Testing. Data show that drugs that come out of animal testing can be used on sick animals as well. So this is a win–win situation in which both humans and animals benefit from animal testing. [Q: Can medical testing of animals be of any benefit to animals? A: Many of the medications that are given to sick animals (such as pets and zoo animals) were discovered as a result of medical research with humans that involved those animals.]*
Position: Prohibit Animal Testing. If we don’t use animals for testing, we could simply use dead bodies to determine the cause of death and to help us improve our drugs. [Q: Can bodies of humans who recently died be used for research? A: Examining human bodies soon after death can help to better understand causes and effects of diseases and medicines.]

Function: Weaken opposing side (O−)
Position: Allow Animal Testing. If we don’t allow animal testing, then a lot of advanced studies such as those involving gene engineering can’t be carried out. [Q: Are there any types of research that could be performed with animals but not humans? A: Many studies of living bodies are so complicated and uncertain that they could only be carried out with animals. For example, studies in gene engineering test how to modify the organs of animals so they can be transplanted to humans.]
Position: Prohibit Animal Testing. According to the statistics of the Food and Drug Administration, 92 out of every 100 drugs that pass animal tests fail in humans. The low success rate means we waste too many animals on producing drugs that do not actually improve our drugs. [Q: Do most of the drugs that pass animal tests succeed in humans? A: The Food and Drug Administration reports that 92 out of every 100 drugs that pass animal tests fail in humans.]

Function: Support opposing side (O+)
Position: Allow Animal Testing. It is true that there are alternatives to using animals in research. For instance, synthetic human skin can be used to test the effects of sunscreen. [Q: Can synthetic versions of human organs be used in research? A: Studies involving the effect of sunscreen on a material like human skin gave quick results, compared to the length of time required for animal testing.]
Position: Prohibit Animal Testing. It is true that if scientists did not experiment on animals, we would not have the drugs available to treat diabetes, hepatitis, polio, and AIDS. [Q: Has animal testing led to cures for any human diseases? A: Animal testing has led to treatments and cures for many human diseases. For example, research with dogs led to treatments for diabetes, and research with monkeys has led to treatments for hepatitis, polio, and AIDS.]

Function: Weaken own side (M−)
Position: Allow Animal Testing. The opposing side made a valid point that not all drugs that pass animal testing would be successful on humans. [Q: Do most of the drugs that pass animal tests succeed in humans? A: The Food and Drug Administration reports that 92 out of every 100 drugs that pass animal tests fail in humans.]
Position: Prohibit Animal Testing. Admittedly, there are laws in place to limit what scientists could do with animals in their laboratories. [Q: How are animals treated in research laboratories? A: There are laws in place to help ensure that distress and pain in animals is kept to a minimum, but the daily treatment of animals is not known because the testing places cannot be monitored at all times and records are not shared.]

* Note. Evidence students used is shown in brackets.

Fig. 1. Diagrammatic representation of the coding scheme.

Linear Model (GLM) method with negative binomial regression was used to compare across topics by condition. Table 7 reports on percentages of participants showing at least one claim-congruent (M+ or O−) unit or claim-incongruent (O+ or M−) unit by condition and topic. Table 8 reports on mean number of such claims. As reflected in Tables 7 and 8, at Topic 1 there was no significant condition difference with respect to claim-congruent units, p = .651. The two conditions did differ, however, with respect to claim-incongruent units. The ER + AP group was more likely to ever include such a unit (p = .004, Fisher’s Exact Test) and had an expected 4.00 (95% CI, 1.50 to 10.66) times more claim-incongruent units than the AP group, Wald χ2 (1) = 7.69, p = .006. At Topic 2, the two groups showed comparable performance and no differences reached significance. At Topic 3, however, although the proportion of students ever including a claim-congruent evidence-based unit did not differ significantly by condition (p = 1.00), the ER + AP


Table 7 Percentages of students who ever included functional evidence-based units of two types by condition and topic.

                           Topic 1                 Topic 2                 Topic 3
                           ER + AP     AP          ER + AP     AP          ER + AP     AP
                           (N = 26)    (N = 27)    (N = 27)    (N = 27)    (N = 26)    (N = 27)
Claim-congruent units      89%         92%         100%        93%         96%         96%
Claim-incongruent units    62%         19%         78%         56%         81%         33%

Note. AP = Argument Practice condition; ER + AP = Evidence Reflection plus Argument Practice condition.

group had an expected 1.52 (95% CI, 1.15–2.01) times more claim-congruent evidence-based units than the AP group, Wald χ2 (1) = 8.53, p = .003. Moreover, for claim-incongruent units, the ER + AP group had an expected 4.47 (95% CI, 2.24–8.89) times more claim-incongruent units than the AP condition, Wald χ2 (1) = 18.17, p < .0005, and the proportion of students who ever included such a unit differed by condition, p = 0.001. Because the above analyses did not control for essay length, an additional set of analyses was performed that took into account the total number of idea units an essay contained. Doing so, we believe, also controlled to a certain extent for topic effects, as such effects would be manifested in the number of idea units students wrote for that topic, with students likely to write more for easier topics and less for more challenging topics. Means varied across topics, but differences did not reach significance across conditions within a topic. At Topic 1, the mean number of idea units was 15.85 (SD = 4.21) for the ER + AP condition and 17.19 (SD = 4.27) for the AP condition. At Topic 2, the mean number of idea units was 14.67 (SD = 3.61) for the ER + AP condition and 15.81 (SD = 5.86) for the AP condition. At Topic 3, the mean number of idea units was 12.42 (SD = 3.51) for the ER + AP condition and 13.44 (SD = 4.49) for the AP condition. Negative binomial regression indicated that none of the condition differences within topics reached statistical significance. In the analyses that control for essay length, we calculated each essay’s percentage of claim-congruent units and percentage of claim-incongruent units, out of total idea units. These percentages were subjected to a two-way mixed analysis of variance (ANOVA), with condition as a between-subjects factor, topic as a within-subjects factor, and percentage as outcome variable.
For the first outcome variable, claim-congruent evidence-based units, there was a significant two-way interaction between condition and topic, F (2, 98) = 3.21, p = .045, Partial η2 = 0.10, a medium effect size (Cohen, 1988). As shown in Fig. 2, significant simple effects for condition were detected at Topic 2 and Topic 3. At Topic 2, the mean percentage was 36% (SD = 0.11) for the ER + AP condition and 28% (SD = 0.14) for the AP condition, F (1, 52) = 4.85, p = .032, Partial η2 = 0.09, a medium effect size. At Topic 3, the mean percentage was 38% (SD = 0.14) for the ER + AP condition and 25% (SD = 0.14) for the AP condition, F (1, 51) = 11.97, p = .001, Partial η2 = 0.19, a large effect size. Examination of simple effects for topic was also carried out to detect change over time. For the ER + AP condition, there was a statistically significant effect of topic, F (2, 48) = 13.57, p < 0.0005, Partial η2 = 0.40, a large effect size. For the AP condition, the effect of topic approached significance, F (2, 50) = 3.13, p = .052, Partial η2 = 0.11.
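The per-essay percentages used as outcome variables in these analyses can be sketched as follows. This is a minimal illustration, assuming each idea unit carries one code; the labels "NE" (non-evidence-based) and "NF" (non-functional), the function name, and the example essay are hypothetical, not taken from the study's data files.

```python
# Functional categories collapsed as described in the coding section:
# M+ and O- are claim-congruent, O+ and M- are claim-incongruent.
CONGRUENT = {"M+", "O-"}
INCONGRUENT = {"O+", "M-"}


def essay_percentages(unit_codes):
    """Return (share congruent, share incongruent) out of ALL idea units."""
    total = len(unit_codes)
    if total == 0:
        return 0.0, 0.0
    congruent = sum(code in CONGRUENT for code in unit_codes)
    incongruent = sum(code in INCONGRUENT for code in unit_codes)
    return congruent / total, incongruent / total


# Hypothetical essay with 10 idea units: six without evidence ("NE"),
# three claim-congruent units and one claim-incongruent unit.
essay = ["NE"] * 6 + ["M+", "M+", "O-", "O+"]
share_congruent, share_incongruent = essay_percentages(essay)
print(share_congruent, share_incongruent)
```

Dividing by the total number of idea units, rather than by evidence-based units only, is what lets the measure adjust for essay length.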

Fig. 2. Mean percentage of idea units coded as claim-congruent and claim-incongruent by condition and topic. Note. AP = Argument Practice condition; ER + AP = Evidence Reflection plus Argument Practice condition. Con = Claim-congruent evidence-based units, Incon = Claim-incongruent evidence-based units. *p < .05, **p < .01, ***p < .001.

For claim-incongruent units, there was also a two-way interaction between condition and topic, F (2, 98) = 7.60, p = .001, Partial η2 = 0.17, a large effect size. As shown in Fig. 2, significant simple effects for condition were detected at all three topics. At Topic 1, the mean percentage was 5% (SD = 0.05) for the ER + AP condition and 1% (SD = 0.03) for the AP condition, F (1, 50) = 11.63, p = 0.001, Partial η2 = 0.19, a large effect size. At Topic 2, the mean percentage was 9% (SD = 0.07) for the ER + AP condition and 6% (SD = 0.06) for the AP condition, F (1, 52) = 4.37, p = .042, Partial η2 = 0.08, a medium effect size. At Topic 3, the mean percentage was 13% (SD = 0.09) for the ER + AP condition and 2% (SD = 0.04) for the AP condition, F (1, 51) = 30.96, p < 0.0005, Partial η2 = 0.38, a large effect size. Regarding change over time, for the ER + AP condition there was a significant effect of topic, F (2, 48) = 9.78, p < .0005, Partial η2 = 0.29, a large effect size. For the AP condition, the effect of topic, although of a decreasing form, was also significant, F (2, 50) = 8.48, p = .001, Partial η2 = 0.25, a large effect size. Overall, analyses that took into account total idea units detected more significant differences than the previously presented analyses

Table 8 Means (and SDs) of functional evidence-based units of two types by condition and topic.

                           Topic 1                     Topic 2                     Topic 3
                           ER + AP       AP            ER + AP       AP            ER + AP       AP
Claim-congruent units      3.27 (1.89)   3.50 (2.61)   5.11 (1.76)   4.41 (2.24)   4.62 (1.94)   3.04 (1.45)
Claim-incongruent units    0.77 (0.76)   0.19 (0.40)   1.41 (1.19)   1.00 (1.18)   1.65 (1.29)   0.37 (0.56)

Note. AP = Argument Practice condition; ER + AP = Evidence Reflection plus Argument Practice condition.


Table 9 Percentages of students whose essay ever contained a however unit by condition and topic.

                                    Topic 1              Topic 2              Topic 3
                                    ER + AP    AP        ER + AP    AP        ER + AP    AP
Evidence-based However unit         73%        39%       83%        78%       85%        52%
Full evidence-based However unit    8%         4%        63%        33%       65%        22%

Note. AP = Argument Practice condition; ER + AP = Evidence Reflection plus Argument Practice condition.

based on the frequencies of evidence-based units, with effect sizes ranging from medium to large. In particular, while frequency analysis did not detect a significant difference between the two conditions for Topic 2, percentage analyses showed significant differences by condition for both claim-congruent and claim-incongruent units for Topic 2. The two sets of analyses converged in showing that at Topic 3, the ER + AP condition showed a decisive advantage over the AP condition. Lastly, only the ER + AP condition showed significant progress over time for both claim-congruent and claim-incongruent units. In addition, we examined occurrence of evidence-based However units in essays. As seen in Table 9, at Topic 1, more ER + AP students than AP students ever included an evidence-based However unit, p = .012. Frequencies are shown in Table 10. At Topic 1, the ER + AP condition had an expected 2.59 (95% CI, 1.32–5.06) times more evidence-based However units than the AP condition, a significant result, Wald χ2 (1) = 11.09, p = 0.005. The mean number of full evidence-based However units was negligible for both conditions at Topic 1, and the difference did not reach statistical significance. At Topic 2, the two conditions did not show a significant difference in the appearance or mean number of evidence-based However units. At Topic 3, however, as reflected in Table 9, more ER + AP students than AP students ever included an evidence-based However unit, p = .011. Regarding frequencies (Table 10), the ER + AP condition had an expected 4.21 (95% CI, 2.51–7.05) times more evidence-based However units than the AP condition at Topic 3, a significant result, Wald χ2 (1) = 29.85, p < .0005. Also, in contrast to Topics 1 and 2, the ER + AP condition at Topic 3 had an expected 6.38 (95% CI, 2.86–14.22) times more full evidence-based However units than the AP condition, a significant result, Wald χ2 (1) = 20.67, p < .0005.
The proportion of students who ever showed a full evidence-based However unit was also significantly greater in the ER + AP condition at Topic 3, p = .002. In conclusion, by the last intervention topic (Topic 3), the ER + AP condition showed a significant advantage in constructing evidence-based However units and full evidence-based However units over the AP condition.
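The comparisons of proportions in this section (such as the Fisher’s Exact Test reported for claim-incongruent units at Topic 1) can be illustrated with a small standard-library sketch. The counts below are an assumption: they are back-calculated from the rounded percentages in Table 7 (62% of 26 taken as 16 students, 19% of 27 taken as 5 students), so the resulting p-value need not match the published value exactly. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )


# Assumed counts (see lead-in): rows are ER + AP and AP,
# columns are "ever included a claim-incongruent unit" yes / no.
p_value = fisher_exact_two_sided(16, 10, 5, 22)
print(p_value)
```

An exact test is a natural choice here because the groups are small (26 or 27 students each), where large-sample approximations can be unreliable.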

evidence-based However units. None of these analyses revealed significant condition differences. Condition as a predictor variable was also used to compare performance across conditions at posttest. We also controlled for pretest performance by treating it as a covariate. The mean number of idea units at posttest was 8.58 (SD = 2.12) units for the ER + AP condition, 9.30 (SD = 3.11) units for the AP condition, and 9.34 (SD = 3.37) units for the Control condition. The mean number of idea units at posttest unsurprisingly was substantially lower than those for the three intervention topics because students did not have the benefit of extended engagement with the non-intervention topic that they did for intervention topics. After adjusting for pretest performance, there was no significant difference by condition in the mean number of idea units at posttest, Wald χ2 (2) = 0.58, p = .749. For claim-congruent evidence-based units, the mean number was 5.78 (SD = 2.59) units for the ER + AP condition, 3.52 (SD = 1.55) units for the AP condition, and 2.38 (SD = 1.41) units for the Control condition, a significant difference across the three conditions, Wald χ2 (2) = 55.05, p < .0005. The ER + AP condition showed an expected 1.64 (95% CI, 1.27–2.11) times more claim-congruent evidence-based units than the AP condition, a significant result, Wald χ2 (1) = 14.13, p < .0005. The AP condition showed an expected 1.51 (95% CI, 1.15–1.98) times more claim-congruent evidence-based units than the Control condition, a significant result, Wald χ2 (1) = 8.92, p = .003. For claim-incongruent evidence-based units, the mean number was 1.04 (SD = 0.85) units for the ER + AP condition, 0.48 (SD = 0.64) units for the AP condition, and 0.36 (SD = 0.60) units for the Control condition, a significant difference across the three conditions, Wald χ2 (2) = 13.29, p = .001. 
The ER + AP condition showed an expected 2.18 (95% CI, 1.13–4.23) times more claim-incongruent functional evidence-based units than the AP condition, a significant result, Wald χ2 (1) = 5.33, p = .02. The AP condition and Control condition did not differ significantly, Wald χ2 (1) = 0.56, p = .455. For evidence-based However units, the mean number was 1.56 (SD = 1.19) units for the ER + AP condition, 0.56 (SD = 0.70) units for the AP condition, and 0.44 (SD = 0.70) units for the Control condition, a significant difference across conditions, Wald χ2 (2) = 27.93, p < .0005. The ER + AP condition had an expected 2.89 (95% CI, 1.60 to 5.23) times more evidence-based However units than the AP condition, a significant result, Wald χ2 (1) = 12.36, p < .0005. The AP condition and Control condition did not differ significantly, Wald χ2 (1) = 0.41, p = .521. For full evidence-based However units, the mean number was 0.56

3.2. Pre- and posttest essay comparisons A negative binomial regression with condition as the predictor variable was carried out for pretest essays to establish initial equivalence across the three conditions. Examined were number of idea units, claim-congruent functional evidence-based units, claim-incongruent functional evidence-based units, evidence-based However units, and full

Table 10 Means (and SDs) of however units by condition and topic.

                                    Topic 1                     Topic 2                     Topic 3
                                    ER + AP       AP            ER + AP       AP            ER + AP       AP
Evidence-based However unit         1.69 (1.52)   0.65 (1.13)   2.19 (2.04)   2.67 (2.87)   2.81 (2.19)   0.67 (0.88)
Full evidence-based However unit    0.08 (0.27)   0.04 (0.02)   0.89 (0.93)   0.70 (1.32)   1.65 (1.57)   0.26 (0.53)

Note. AP = Argument Practice condition; ER + AP = Evidence Reflection plus Argument Practice condition. Full evidence-based However units are those that succeeded in coordinating two pieces of evidence serving contrasting functions.


Table 11 Percentages of students who ever made certain types of claim by condition at pretest and posttest.

                                    Pretest                             Posttest
                                    ER + AP    AP         Control      ER + AP    AP         Control
                                    (N = 26)   (N = 27)   (N = 50)     (N = 27)   (N = 27)   (N = 50)
Claim-congruent units               100%       93%        96%          100%       93%        88%
Claim-incongruent units             23%        37%        20%          70%        41%        30%
Evidence-based However unit         27%        33%        26%          74%        44%        32%
Full evidence-based However unit    8%         15%        6%           41%        15%        18%

(SD = 0.80) units for the ER + AP condition, 0.15 (SD = 0.36) units for the AP condition, and 0.20 (SD = 0.45) units for the Control condition, a significant difference across conditions, Wald χ2 (2) = 9.83, p = .007. The ER + AP condition had an expected 3.98 (95% CI, 1.31 to 12.05) times more full evidence-based However units than the AP condition, a significant result, Wald χ2 (1) = 5.97, p = .015. The AP and Control conditions did not differ significantly, Wald χ2 (1) = 0.30, p = .583. Results for proportion of students ever showing these types appear in Table 11. For the ER + AP condition, pretest-posttest differences reached significance for claim-incongruent units (McNemar’s test, p = .002), evidence-based However units (p = .002), and full evidence-based However units (p = .012). No pretest-posttest differences reached significance in the AP and Control conditions for any variable.
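The "expected times more" figures, their confidence intervals, and the Wald χ2 values reported for the negative binomial models are linked arithmetically, which a short sketch can make concrete. It assumes the interval is a Wald interval that is symmetric on the log scale (an assumption about how the statistical software produced it, not something stated in the text); the function name is hypothetical. The inputs are the published values for full evidence-based However units at posttest (ratio 3.98, 95% CI 1.31 to 12.05).

```python
from math import log

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_ratio_ci(ratio, ci_low, ci_high):
    """Recover the log-scale standard error and Wald chi-square (1 df).

    Assumes ci_low and ci_high are exp(log(ratio) -/+ Z_95 * se).
    """
    se = (log(ci_high) - log(ci_low)) / (2 * Z_95)
    z = log(ratio) / se
    return se, z * z


se, chi2 = wald_from_ratio_ci(3.98, 1.31, 12.05)
print(round(se, 3), round(chi2, 2))
```

Under that assumption the recovered χ2 is close to the reported Wald χ2 (1) = 5.97; small discrepancies are expected because the published ratio and interval are rounded to two decimals.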

had addressed this evidence during dialogs and, if not, why. For all topics, for claim-congruent evidence, the majority of dyads (93% at Topic 1, 100% at Topics 2 & 3) said they had done so. For claim-incongruent evidence, however, 89% of responses at Topic 1 indicated that they had not addressed it because this evidence would not help them. This percentage decreased to 64% at Topic 2 and further decreased to 50% at Topic 3. Participants were also asked to indicate and explain their level of satisfaction (or dissatisfaction) with this use. Students were initially highly satisfied with their use of claim-congruent evidence but became less satisfied as time passed. In contrast, pairs were dissatisfied with their use of claim-incongruent evidence but became more satisfied over time. Table 12 presents the percentage of pairs who said they used each type of evidence and those who selected the option of “satisfied.” For claim-congruent evidence, a typical reason for satisfaction was that “the evidence provided concrete support for my reason so it makes it difficult for the opposing side to counter it.” Over time, as students became less satisfied, a typical reason for dissatisfaction was “our evidence was easily countered by our opponents,” indicating that students had begun to anticipate counters to their evidence-based claims. For claim-incongruent evidence, a typical reason for dissatisfaction at the beginning was “the evidence does not help us, so it actually makes us weak.” As satisfaction increased over time, a typical reason for satisfaction was “we must be really strong if we are able to counter a piece of evidence that does not help us.” Students thus had begun to recognize a need to address all evidence, not just supporting evidence.

3.3. Evidence reflection The only difference between the intervention experience in the AP and the ER + AP conditions was the additional Evidence reflection activity engaged in by the ER + AP group. An analysis of ER + AP students’ work during this activity, in the form of their answers to the Evidence Reflection Activity Part A and Part B (Tables 3 and 4), was undertaken for the purpose of gaining insight regarding how this component of the intervention may have contributed to the greater gains observed in this condition. In these analyses, we combined data for dialog 1 (M+ evidence) and 2 (O− evidence), involving claimcongruent evidence, and data for dialog 3 (O+ evidence) and 4 (M− evidence), involving claim-incongruent evidence. A change over time appeared in dyads’ responses to the question “How might someone who disagree with you on the issue use this evidence?” The change was most noticeable for claim-congruent evidence (which constituted claim-incongruent evidence for the opposing side). During Topic 1, 79% of responses were to the effect that the opposing side would not use this evidence because it would not help them. This percentage reduced to 50% at Topic 2 and to 43% at Topic 3. We further examined how students thought the opposing side would address this evidence. Of 12 such responses, seven proposed a specific counterargument to this piece of evidence, three said the opposing side could use another unspecified piece of evidence to counter it and two said that the other side could search for more information that would contradict it. Responses were also solicited regarding whether they themselves

4. Discussion

Argumentation increasingly has been identified as a core component of classroom activities across the curriculum, yet it is argumentive writing that remains the greatest concern for educators of students of all ages (Common Core State Standards Initiative, 2010). This is also the case for the Chinese student population studied in the present work, as writing remains a staple in Chinese academic assessments (Curriculum Standards for Compulsory Education in China, 2011). Overall, the intervention examined in the present work proved successful. The added component of engaging student dyads in collaborative meta-level reflection had significant effects on students’ argumentive writing, in particular with respect to addressing evidence both congruent and incongruent with a claim. Comparison of final essays of students in the two intervention conditions demonstrated that the specific reflection on relations

Table 12. ER + AP group’s responses in reflecting on evidence use.

Topic   | Evidence type     | Percentage of dyads who said they used the evidence | Percentage of dyads who used the evidence and said they were satisfied with use
Topic 1 | Claim-congruent   | 89%  | 100%
Topic 1 | Claim-incongruent | 11%  | 0%
Topic 2 | Claim-congruent   | 93%  | 80%
Topic 2 | Claim-incongruent | 36%  | 40%
Topic 3 | Claim-congruent   | 100% | 57%
Topic 3 | Claim-incongruent | 50%  | 50%


between claims and evidence in the ER + AP condition supported progress in this critical aspect of argument, in particular with respect to incongruent evidence. While participants in the ER + AP group increased consistently in addressing evidence of both congruent and incongruent forms from Topic 1 to Topic 3, the performance of participants in the AP group progressed only from Topic 1 to Topic 2 and then diminished at Topic 3. By Topic 3, ER + AP students showed a significant advantage over AP students in addressing both types of evidence. Therefore, the added reflective component reduced students’ my-side bias of exclusively attending to position-consistent evidence. We believe this is the case because the reflective activities specifically prompted students to coordinate evidence with various, contrasting positions.

Anecdotal evidence suggested that Topic 2, animal research, was a topic for which students showed a particularly high level of interest and engagement. Students were reported to have spontaneously consulted their biology teacher on this issue multiple times, both in and out of their biology class. There was no indication of anything similar having occurred with respect to the other two topics. In this respect we observed a possible interaction between topic interest and treatment effect. When interest and engagement with a topic were high, as in Topic 2, students in the two conditions showed similar performance. In the case of a more challenging topic involving less familiar content, such as Topic 3 (kidney sales), the additional reflection benefitted students in the ER + AP condition but was less important when familiarity was greater. An alternative interpretation, however, is an accruing effect of the reflective exercise over time that occurred only in the ER + AP condition.
Independent measures of topic interest could help to distinguish these two interpretations, and the interaction between topic interest and intervention effect would be a fruitful avenue for future exploration.

Regarding our second research question, transfer of skills to a novel topic, AP students outperformed Control students in addressing the less challenging claim-congruent evidence, demonstrating at least partial effectiveness of the overall intervention, in line with previous studies of this curriculum (see Kuhn et al., 2016b, for review). However, students in the ER + AP condition outperformed those in the other two conditions in addressing both types of evidence, thereby achieving superior skill in the coordination of claims and evidence. This positive transfer is consistent with the literature on metacognitive reflection as well as with Nussbaum and Asterhan’s (2016) “proactive executive control strategies (PECS)” framework. Applying this framework to the present findings, reflective activities in the present study played multiple potential roles. Students worked with a partner, with whom they collaborated over a number of sessions, externalizing the need to engage in reflective planning and review. The fact that participants worked with the same partner across multiple sessions allowed partners to develop a collaborative relationship that has been shown to lead to greater gains, relative to collaboration with a series of different partners (Zillmer & Kuhn, 2018). It is proposed here that reflective exercises helped students develop strategic skills regarding how to successfully coordinate evidence with various positions. They also helped students develop meta-level awareness of the need to address, rather than ignore, claim-incongruent evidence. Therefore, the added joint reflective activity potentially facilitated development at both strategic and meta-strategic levels. The transfer results are also consistent with a sociocultural account (Vygotsky, 1978).
Students participating in collaborative reflection can potentially extend this reflection to the individual plane when writing individually, asking themselves questions such as “Does the evidence support my side or the other side?”, “How might I counter this piece of evidence that appears unfavorable to my side?” and “Am I satisfied with my use of evidence or can I improve it?” These exercises thus helped students develop a reflective “habit of mind,” which carried over from a dyadic to an individual context.

However, not all skills successfully extended to a non-intervention topic. Comparing the performance of the ER + AP condition between the final intervention topic (Topic 3) and the transfer topic, we see a decrease in the mean frequency of evidence-based However statements (from 2.81 units to 1.56 units) and full evidence-based However statements (from 1.65 units to 0.56 units). The percentage of students ever making a full evidence-based However statement decreased from 65% at Topic 3 to 41% at the transfer topic. Thus, even though meta-level experience supported extension of skills to a less familiar topic, students’ lack of extended dialogic engagement with the topic diminished the strong presence of the dialogic structure, reflected in the reduced frequency of evidence-based However statements in essays, relative to essays for the final intervention topic.

Regarding our third research question, addressed to mechanism, an interesting developmental trend emerged in students’ answers to the evidence reflection prompts. Dyads were highly satisfied with their use of claim-congruent evidence at Topic 1 (100% of responses) but grew increasingly dissatisfied over time (57% of responses at Topic 3). At the same time, they were highly dissatisfied with their use of claim-incongruent evidence at Topic 1 (0% of responses), but grew increasingly satisfied over time (50% of responses at Topic 3). The trend suggested that dyads became increasingly critical of their use of claim-congruent evidence, possibly because their opponents countered their evidence-based claim, which prompted them to recognize its weaknesses. At the same time, they became more confident of their attention to claim-incongruent evidence, and were thus more inclined to attempt to address it.

In what ways did students actually coordinate a claim with evidence incongruent with it? A number of forms appeared. Often, evidence was introduced in an attempt to weaken a claim supportive of the opposing side, as the following example shows:

“The other side says that animal research is too cruel. However, what I want to say is that research shows that a lot of drugs that help to treat diseased animals come from research on animals. This means that animal research isn’t all bad.”

Or, the writer could acknowledge a piece of evidence supportive of the opposing side and then challenge its implications:

“The other side says that not all drugs that pass the animal test are successful on humans. While this is true, I think even though the success rate isn’t that high, one success still means a lot.”

Some students were able to coordinate multiple pieces of conflicting evidence in one However structure:

“Even though the opposing side has cited that in 2016, over 100,000 people registered to be organ donors, a threefold increase over 2015. The reality is that in China, there is about a million people on a waiting list to receive kidney transplants. However, only about 10,000 kidney transplant surgeries are carried out each year.”

Cross-cultural psychologists (Peng & Nisbett, 1999; Spencer-Rodgers, Williams, & Peng, 2010; Zhang & Chen, 1991) have pointed out that Chinese people tend to deal with seeming contradictions in a dialectical or compromise approach, by seeking a “middle way.” The However structure coded in the present study can be viewed as a reflection of dialectical thinking. To successfully construct a However argument, one has to first recognize a contradiction and then seek to reconcile it. In the present study, students in the ER + AP group attained their best performance in using evidence to construct However arguments in their essays on the last intervention topic (Topic 3), with 85% of students doing so at least once.
Comparable performance remained low for the AP and Control conditions, indicating that despite the implicit naïve dialecticism present in their culture, Chinese students still stand to benefit from educational experiences designed to promote more explicit dialectical modes in their interpersonal communication as well as in their individual thinking and writing.

Among the limitations of the present study, one is that it was carried out in a selective middle school with relatively high student academic achievement. Future studies need to investigate whether similar benefits would be observed among students from a wider range of socioeconomic and academic backgrounds in China. Another limitation is that we did not independently measure students’ topic interest, as noted earlier. Nor do we know much about how students self-regulated in meta-level coordination of claims and evidence in their individual essay writing. We only hypothesized that joint reflection would be transferred to the individual plane. In future studies, we could invite students to “think aloud” while they read through and use evidence when writing an essay.

Turning finally to the cultural issues the present work raises, it is notable that our discourse-based educational approach has been met with considerable success in a non-Western culture. Previous work has been largely confined to culturally Western contexts (Henrich, Heine, & Norenzayan, 2010), cautiously defined by Tweed and Lehman (2002) as Western English-speaking countries (mainly the United States, Britain, Canada, and Australia). Yet, about 70% of humans today live in non-Western cultures (Triandis, 1995). Therefore, an important question worthy of investigation is whether an approach of this sort is similarly suitable and would prove productive in a society traditionally viewed as limited in valuing and encouraging dissent. In recent years, breakthroughs have been made as discourse-based instruction has been implemented in Mainland China and Taiwan with great success (e.g., Cheng et al., 2015; Sun, Anderson, Perry, & Lin, 2017; Lin et al., 2019). However, the discourse type emphasized in these studies was collaborative discussion, whereas the one emphasized here was adversarial argumentation, which we believe posed a greater challenge for the Chinese student population studied. More studies are thus called for to further investigate the affordances and limitations in introducing similar discourse-based argumentive curricula in East Asian contexts.

The finding that Chinese students were able to adapt to and profit from the discourse-based argumentive curriculum implemented in this study is notable, despite any inclination to avoid confronting and engaging dissenting views that may be a dimension of their cultural history (Kuhn, Wang, & Li, 2011). Results such as the present ones thus challenge an increasingly outmoded stereotype of Chinese students as capable of gaining mastery only of teacher-provided knowledge. The Chinese philosopher Confucius is quoted as having said that “Study without reflection is a waste of time, and reflection without study is dangerous” (Analects, 1979, Book II). Teacher-led reflection on learning activities is in fact commonly seen and emphasized in Chinese classrooms (Jin & Cortazzi, 1998; Stigler & Perry, 1988) and was observed by the author in the present school. Engaging students in reflection in the present study is thus consistent with Confucius’ ideal of education, and exists as a part of modern Chinese school life.

We end with the thought that perhaps the rather pessimistic tone expressed by Becker (1986) and other researchers (e.g., Nakamura, 1964) that Chinese students prefer self-debasing apology to logical explanations, and that Chinese debaters lack internal standards to evaluate arguments, is unfounded, or at least needs to be updated. The present study demonstrates that Chinese students are fully equipped, at least in the context of the present study, to participate in argumentive discourse and, in fact, were observed to derive much satisfaction from such activities. If given the opportunity, they, too, can successfully engage in a spirited give-and-take of opinions, and extend their newly acquired skills to an individual level.

Appendix A

Evidence list for pre- and post-assessment essay

1. How does a juvenile court system differ from a regular one? The judges and staff in a juvenile system are specially trained to deal with young people who have committed crimes.
2. Are punishments for the same crime different in juvenile and adult courts? Yes, punishments tend to be less severe and sentences shorter in juvenile court. In 2006 in China, 70% of juveniles who committed serious crimes were subjected to sentences of less than 3 years in juvenile court.
3. At what age is the brain fully developed? The prefrontal cortex, which is responsible for abstract thinking and the ability to exercise good judgment, is not fully developed until one’s early- to mid-20s.
4. Would the government save money if they didn’t have to pay for a separate juvenile system? Yes, juvenile courts and prisons require more people to run and thus cost more. Adult courts cost less to operate.
5. Do juveniles ever commit violent crimes such as murder? Research shows an increase in violent crimes committed by juveniles in the past ten years in China. In particular, in 2012 in China, 5.1% of juvenile-committed crimes were murders.
6. Do teens sentenced to jail time in juvenile court get jail records? They do not if sentences are served in a juvenile detention center; their records are sealed on their release.
7. Do people have better self-control as they get older? Not necessarily. But there is evidence that adults think more carefully before they act, compared to teens.
8. Are teens or adults more likely to repeat their crimes? For teens convicted of a serious crime (a felony), the rate of recidivism (repeat crime) is 90% over 10 years. For crimes overall, it is about 50%.
9. What are public opinions on the juvenile court issue? A “get tough” policy has become more popular in recent years, with a law proposing that adolescents as young as 16 be tried in regular adult court.
10. Are teens at risk of being assaulted in adult prisons? Yes. Teens in adult jails are 50% more likely to be attacked by another inmate and twice as likely by prison staff, compared to adult prisoners.
11. Do adult jails provide job training? Many do. In 2016, the majority of adult prisoners in China engaged in labor reform, which might train their job-related skills.

Appendix B

Argument reflection sheets


Appendix C

Evidence lists for intervention topic writing

Topic 1

1. Does high school require a lot of time devoted to schoolwork? Most teachers who have taught high school in China say that the high school curriculum can be very demanding and students often have a lot of homework after school.
2. Do students need work experience before they can decide what career they want to prepare for? Studies have shown that students who have work experience develop better ideas of what they want to do in the future.


3. Can teenagers learn bad habits in work settings? Yes, some parents of teenagers who have worked part-time noticed that their kids started to imitate bad habits of older people they work with.
4. Do all high school students want to pursue an academic track leading to college? No, a recent opinion poll shows that some Chinese high school students want to prepare for jobs rather than more schooling.
5. How long are the summer and winter breaks for high school students in China? While different schools have different break schedules, most high schools schedule extra school time during breaks, so the length of their summer and winter breaks might get cut in half.
6. What do parents from Western countries (such as the US) think about part-time jobs for their children? While different Western parents hold different opinions, statistics have shown that the percentage of Western high school students who hold a part-time job is higher than that of Chinese high school students.
7. What kind of part-time jobs can high school students take in China? There is a range of jobs available to high school students. Some work in the service industry, for example as a cashier. Some provide tutoring to younger students. In recent years, companies such as Google have offered paid internship opportunities for high school students.
8. How much can high school students earn when doing a part-time job? High school students and adults who work in the same sector in the service industry earn about the same. In 2017, the average monthly wage of a full-time employee at Walmart was around 3000 RMB and the hourly wage was about 12 RMB.
9. Do high school students develop new useful skills when doing a part-time job? Yes, studies have shown that students who work part-time might be more likely than other students to develop important skills, such as being more accountable and being better able to deal with an emergency situation.
10. Do high school students who spend more time on schoolwork have a better chance to get into a good college? Yes, given the competitive nature of the college entrance examination in China, historical data show that students who study more are more likely to score higher and have a better chance to get into a good college.
11. Can high school students who do a part-time job receive mistreatment? Yes, that is possible. According to a recent research project in Japan, 70% of Japanese high school part-time workers have received mistreatment at least once, such as unpaid overtime work.
12. What is the yearly tuition and living expense for a high school student in China? With the increase in price over the past few years, the cost of attending high school has dramatically increased. For students enrolled in a private high school, their parents need to pay as much as 50,000 RMB per year for tuition and living expenses combined.

Topic 2

1. How many animals are involved in medical research each year in the USA? The U.S. Department of Agriculture reports that 1.2 million animals were used in research in 2005. This does not include rats and mice, which make up about 90% of animals used in research.
2. Can researchers use as many animals as they want in their research? Regulations exist that require that scientists use as few animals as possible to conduct their research.
3. Can bodies of humans who recently died be used for research? Examining human bodies soon after death can help to better understand causes and effects of diseases and medicines.
4. Why have animals been used in medical research? Animal organs often resemble human organs, so medicines may work in similar ways.


5. Do most of the drugs that pass animal tests succeed in humans? The Food and Drug Administration in America reports that 92 out of every 100 drugs that pass animal tests fail in humans.
6. Are there any types of research that could be performed with animals but not humans? Many studies of living bodies are so complicated and uncertain that they could only be carried out with animals. For example, studies in gene engineering test how to modify the organs of animals so they can be transplanted to humans.
7. Can synthetic versions of human organs be used in research? Studies involving the effect of sunscreen on a material like human skin gave quick results, compared to the length of time required for animal testing.
8. Has animal testing led to cures for any human diseases? Animal testing has led to treatments and cures for many human diseases. For example, research with dogs led to treatments for diabetes, and research with monkeys has led to treatments for hepatitis, polio, and AIDS.
9. How are animals treated in research laboratories? There are laws in place to help ensure that distress and pain in animals are kept to a minimum, but the daily treatment of animals is not known because the testing places cannot be monitored at all times and records are not shared.
10. How similar are humans and animals in terms of diseases they get? Many of the diseases that humans get (such as cancer, malaria, asthma, arthritis, and heart failure) are also found in animals.
11. Can statistics be used to analyze how people react to different life events? Statisticians have helped link cigarette smoke to lung cancer and diet to heart disease by studying large numbers of people over periods of time.
12. Can medical testing of animals be of any benefit to animals? Many of the medications that are given to sick animals (such as pets and zoo animals) were discovered as a result of medical research with humans that involved those animals.

Topic 3

1. Do people die because they can’t get a new kidney in time? Yes, statistics show that in China in 2012, for every 150 people waiting for a donated kidney, 149 die while waiting.
2. Have many people agreed to donate a kidney? Currently, a certain percentage of Chinese choose to be organ donors. However, France has increased its donors to 99% by assuming that everyone wants to donate their organs unless they notify in writing that they don’t want to (this is called “opting out”).
3. Do enough people volunteer to donate their kidneys for there to be enough kidneys to go around to those who need them? No. Currently in China there are about a million people on a waiting list to receive kidney transplants. However, only about 10,000 kidney transplant surgeries are carried out each year.
4. How much do kidneys sell for? Economists estimate that kidneys can cost anywhere from 30,000 to 40,000 RMB if selling is allowed. This is more money than the average annual income of a family in China.
5. Will taking out one kidney negatively affect one’s health? No, the negative effect of having one kidney removed is very limited. Some people are even born with just one kidney and lead a normal life. As long as the other kidney is healthy, you are expected to have a normal life expectancy.
6. How many people volunteer to be organ donors in China? In 2016, over 100,000 people registered to be organ donors, a threefold increase over 2015. Since the Department of Health in China opened up online self-registration in 2010, the number of registered donors has dramatically increased each year.


7. How well does the legal kidney market work in Iran? Since Iran’s legalization of the kidney market in the mid-90s, the need for kidney transplants has been met. The sellers and recipients match their blood type and run other tests in large hospitals before the transplant surgery is carried out.
8. Do people who sell their kidneys need the money they receive? Yes. Almost always they are very poor and have few ways to earn money. For instance, some kidney sellers in Iran need the money to pay off debts.
9. What is the percentage of kidneys coming from the black market? According to statistics in America, in 2010, 1/5 of the kidneys came from the black market.
10. Is it easy to make known your wish to donate your organs when you die? Very easy. In America, many states encourage donations by allowing the consent to be noted on a person’s driver’s license. In China, you simply need to go onto a website and register as an organ donor.
11. For patients with kidney failure, are there other treatment plans besides kidney transplant? Patients with kidney failure could stay on dialysis. However, dialysis is costly. A patient needs 2–3 procedures each week, with each procedure costing around 800 RMB without insurance.
12. Can a kidney get transplanted from the body of someone who has died? Yes, if it is done quickly after death and the donor’s family agrees.

Korea. Reading Research Quarterly, 43, 400–424. Driver, R., Newton, P., & Osborne, J. (2000). Establishing the norms of scientific argumentation in classrooms. Science Education, 84(3), 287–312. Duncan, R. G., Chinn, C. A., & Barzilai, S. (2018). Grasp of evidence: Problematizing and expanding the next generation science standards’ conceptualization of evidence. Journal of Research in Science Teaching, 55(7), 907–937. Duschl, R. A., & Osborne, J. (2002). Supporting and promoting argumentation discourse in science education. Studies in Science Education, 38, 39–72. Edwards, K., & Smith, E. (1996). A disconfirmation bias in the evaluation of arguments. Attitude and social cognition, 71, 5–24. Erduran, S., Simon, S., & Osborne, J. (2004). TAPping into argumentation: Developments in the application of Toulmin’s argument pattern for studying science discourse. Science Education, 88(6), 915–933. Felton, M. (2004). The development of discourse strategies in adolescent argumentation. Cognitive Development, 19, 35–52. Felton, M., Crowell, A., & Liu, T. (2015). Arguing to agree: Mitigating my-side bias through consensus-seeking dialogue. Written Communication, 32(3), 317–331. Ford, M. (2012). A dialogic account of sense-making in scientific argumentation and reasoning. Cognition and Instruction, 30(3), 207–245. Graff, G. (2003). Clueless in academe: How schooling obscures the life of the mind. New Haven, CT: Yale University Press. Greene, J., Sandoval, W., & Bråten, I. (Eds.). (2016). Handbook of epistemic cognition. New York, NY: Routledge. Hemberger, L., Kuhn, D., Matos, F., & Shi, Y. (2017). A dialogic path to evidence-based argumentive writing. Journal of the Learning Sciences, 26(4), 575–607. Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world. Behavioral and Brain Sciences, 33(2–3), 61–83. Hofer, B. K., & Pintrich, P. R. (1997). 
The development of epistemological theories: Beliefs about knowledge and knowing and their relation to learning. Review of Educational Research, 67(1), 88–140. Hofstede, G. (1986). Cultural differences in teaching and learning. International Journal of Intercultural Relations, 10(3), 301–320. Howe, C., & Abedin, M. (2013). Classroom dialogue: A systematic review across four decades of research. Cambridge Journal of Education, 43(3), 325–356. Iordanou, K. (2013). Developing face-to-face argumentation skills: Does arguing on the computer help? Journal of Cognition and Development, 14(2), 292–320. Iordanou, K. (2016). Developing epistemological understanding in scientific and social domains through argumentation. Zeitschrift für Pädagogische Psychologie, 30, 109–119. Iordanou, K., & Constantinou, C. (2015). Supporting use of evidence in argumentation through practice in argumentation and reflection in the context of SOCRATES learning environment. Science Education, 99(2), 282–311. Iordanou, K., Kuhn, D., Matos, F., Shi, Y., & Hemberger, L. (2019). Learning by arguing. Learning and Instruction. https://doi.org/10.1016/j.learninstruc.2019.05.004. Jiménez-Aleixandre, M. P., Rodríguez, A., & Duschl, R. A. (2000). “Doing the lesson” or “Doing science”: Argument in high school genetics. Science Education, 84(6), 757–792. Jin, L. X., & Cortazzi, M. (1998). The culture the learner brings: A bridge or a barrier? In M. Byram, & M. Fleming (Eds.). Language learning in intercultural perspective: Approaches through drama and ethnography (pp. 98–118). Cambridge: Cambridge

References Analects (1979). (D.C. Lau, Trans.). Harmondsworth, England: Penguin Books. Aoki, K. (2008). Confucius vs. Socrates: The impact of educational traditions of East and West in a global age. The International Journal of Learning, 14(11), 35–40. Applebee, A. (1996). Curriculum as conversation. Chicago, IL: University of Chicago Press. Baker, M. J. (2003). Computer-mediated interactions for the co-elaboration of scientific notions. In J. Andriessen, M. Baker, & D. Suthers (Eds.). Arguing to learn: Confronting cognitions in computer-supported collaborative learning environments. Utrecht, The Netherlands: Kluwer Academic. Baker, M. J. (2009). Intersubjective and intrasubjective rationalities in pedagogical debates. In B. B. Schwarz, T. Dreyfus, & R. Hershkowitz (Eds.). Transformation of knowledge through classroom interaction. New Perspectives on Learning and Instruction (pp. 145–158). New York, NY: Routledge. Baron, J. (2008). Thinking and deciding (4th ed.). Cambridge, MA: Cambridge University Press. Becker, C. B. (1986). Reasons for the lack of argumentation and debate in the far east. International Journal of Intercultural Relations, 10(1), 75–92. Bell, P., & Linn, M. C. (2000). Scientific arguments as learning artifacts: Designing for learning from the web with KIE. International Journal of Science Education, 22, 797–817. Berland, L. K., & Reiser, B. (2011). Classroom communities’ adaptations of the practice of scientific argumentation. Science Education, 95(2), 191–216. Brem, S. K., & Rips, L. J. (2000). Explanation and evidence in informal argument. Cognitive Science, 24, 573–604. Brown, A. (1997). Transforming schools into communities of thinking and learning about serious matters. American Psychologist, 52, 399–413. Cheng, Y., Zhang, J., Li, H., Anderson, R., Ding, F., Nguyen-Jahiel, K., ... Wu, X. (2015). Moving from recitation to open-format literature discussion in Chinese classrooms. Instructional Science, 43(6), 643–664. Chinn, C. A., & Brewer, W. 
(1993). The role of anomalous data in knowledge acquisition: A theoretical framework and implications for science instruction. Review of Educational Research, 63(1), 1–49. Clark, D. B., & Sampson, V. (2007). Personally-seeded discussions to scaffold online argumentation. International Journal of Science Education, 29(3), 253–277. Clarke, S., Resnick, L., & Rose, C. (2015). Dialogic instruction: A new frontier. In L. Corno, & E. Anderman (Eds.). Handbook of educational psychology. New York: Routledge. Cohen, J. (1988). Statistical power analysis for the behavioral sciences. New York, NY: Routledge Academic. Common Core State Standards Initiative (2010). Common core state standards for English language arts & literacy in history/social studies, science, and technical subjects. Retrieved from http://www.corestandards.org/assets/CCSSI_ELA%20Standards.pdf. Crowell, A., & Kuhn, D. (2014). Developing dialogic argumentation skills: A three-year intervention study. Journal of Cognition and Development, 15(2), 363–381. Curriculum Standards for Compulsory Education in China (2011). Curriculum standards for Chinese Language Arts. Retrieved from http://mat1.gtimg.com/edu/pdf/edu/xkb2011/20120130155433177.pdf. Dong, T., Anderson, R. C., Kim, I., & Li, Y. (2008). Collaborative reasoning in China and


University Press. Kahneman, D. (2011). Thinking, fast and slow. New York: Farrar, Straus and Giroux. Kelly, G. J., Druker, S., & Chen, C. (1998). Students’ reasoning about electricity: Combining performance assessments with argumentation analysis. International Journal of Science Education, 20(7), 849–871. Kelly, G. J., & Takao, A. (2002). Epistemic levels in argument: An analysis of university oceanography students’ use of evidence in writing. Science Education, 86(3), 314–342. Kim-Goh, M. (1995). Serving Asian American children in school: An ecological perspective. In S. W. Rothstein (Ed.). Class, culture, and race in American schools (pp. 145–160). Westport, CT: Greenwood. King, A. (1991). Effects of training in strategic questioning on children's problem-solving performance. Journal of Educational Psychology, 83(3), 307–317. King, A. (1992). Comparison of self-questioning, summarizing, and notetaking-review as strategies for learning from lectures. American Educational Research Journal, 29(2), 303–323. Klaczynski, P., & Gordon, D. (1996). Self-serving influences in adolescents’ evaluations of belief-relevant evidence. Journal of Experimental Child Psychology, 62, 317–339. Kramarski, B., & Mevarech, Z. R. (2003). Enhancing mathematical reasoning in the classroom: The effects of cooperative learning and metacognitive training. American Educational Research Journal, 40(1), 281–310. Kuhn, D. (2000). Metacognitive development. Current Directions in Psychological Science, 9, 178–181. Kuhn, D. (2019). Why is reconciling divergent views a challenge? Current Directions in Psychological Science (in press). Kuhn, D. (2019). Critical thinking as discourse. Human Development, 62, 146–164. Kuhn, D., Cheney, R., & Weinstock, M. (2000). The development of epistemological understanding. Cognitive Development, 15(3), 309–328. Kuhn, D., & Crowell, A. (2011). Dialogic argumentation as a vehicle for developing young adolescents’ thinking.
Psychological Science, 22, 545–552. Kuhn, D., Hemberger, L., & Khait, V. (2016b). Tracing the development of argumentive writing in a discourse-rich context. Written Communication, 33, 92–121. Kuhn, D., Hemberger, L., & Khait, V. (2016a). Argue with me: Argument as a path to developing students’ thinking and writing (2nd ed.). New York, NY: Routledge. Kuhn, D., & Moore, W. (2015). Argument as core curriculum. Learning: Research and Practice, 1, 66–78. Kuhn, D., & Park, S. (2005). Epistemological understanding and the development of intellectual values. International Journal of Educational Research, 43(3), 111–124. Kuhn, D., Wang, Y., & Li, H. (2011). Why argue? Developing understanding of the purposes and values of argumentive discourse. Discourse Processes, 48(1), 26–49. Kuhn, D., Zillmer, N., Crowell, A., & Zavala, J. (2013). Developing norms of argumentation: Metacognitive, epistemological, and social dimensions of developing argumentive competence. Cognition and Instruction, 31, 456–496. Kuhn, D. (1991). The skills of argument. Cambridge: Cambridge University Press. Lin, T. J., Ha, S. Y., Li, W. T., Chiu, Y. J., Hong, Y. R., & Tsai, C. C. (2019). Effects of collaborative small-group discussions on early adolescents’ social reasoning. Reading & Writing. https://doi.org/10.1007/s11145-019-09946-7. Lord, C., Ross, L., & Lepper, M. (1979). Biased assimilation and attitude polarization: The effects of prior theories on subsequently considered evidence. Journal of Personality and Social Psychology, 37(11), 2098–2109. Macagno, F. (2016). Argument relevance and structure: Assessing and developing students’ uses of evidence. International Journal of Educational Research, 79, 180–194. Manz, E., & Renga, I. (2017). Understanding how teachers guide evidence construction conversations. Science Education, 101(4), 584–615. Mayweg-Paus, E., & Macagno, F. (2016). How dialogic settings influence evidence use in adolescent students. 
Zeitschrift für Pädagogische Psychologie, 30, 121–132. McNeill, K., & Berland, L. (2016). What is (or should be) scientific evidence use in K-12 classrooms? Journal of Research in Science Teaching, 54(5), 672–689. Mehan, H., & Cazden, C. (2015). The study of classroom discourse: Early history and current developments. In L. Resnick, C. Asterhan, & S. Clarke (Eds.). Socializing intelligence through academic talk and dialogue. Washington DC: American Educational Research Association. Mercer, N., & Littleton, K. (2007). Dialogue and the development of children’s thinking: A sociocultural approach. New York, NY: Routledge. Mercer, N., & Wegerif, R. (1999). Children’s talk and the development of reasoning in the classroom. British Educational Research Journal, 25(1), 95–111. Michaels, S., O’Connor, C., & Resnick, L. (2008). Deliberative discourse idealized and realized: Accountable talk in the classroom and in civic life. Studies in Philosophy and Education, 27(4), 283–297. Miller, P., Wiley, A., Fung, H., & Liang, C. (1997). Personal storytelling as a medium of socialization in Chinese and American families. Child Development, 68(3), 557–568. Murphy, P. K., Greene, J. A., Allen, E. M., Baszczewski, S., Swearingen, A. K., Wei, L., & Butler, A. M. (2018). Fostering high school students’ scientific argumentation and conceptual understanding performance through Quality Talk discussions. Science Education, 102(6), 1239–1264. Nakamura, H. (1964). Ways of thinking of eastern peoples. Honolulu: East-West Center Press. NGSS Lead States (2013). Next Generation Science Standards: For states, by states. Washington, DC: National Academies Press. Nussbaum, E. M. (2008). Using argumentation vee diagrams (AVDs) for promoting argument-counterargument integration in reflective writing. Journal of Educational Psychology, 100(3), 549–565. Nussbaum, E. M., & Asterhan, C. (2016). The psychology of far transfer from classroom argumentation. In F. Paglieri (Ed.).
The psychology of argumentation. College Publications. Nussbaum, E. M., Dove, I. J., Slife, N., Kardash, M., Turgut, R., & Vallett, D. (2018). Using critical questions to evaluate written and oral arguments in an undergraduate general education seminar: A quasi-experimental study. Reading and Writing, 1–22. Nussbaum, E. M., & Edwards, O. V. (2011). Argumentation, critical questions, and integrative stratagems: Enhancing young adolescents’ reasoning about current events. Journal of the Learning Sciences, 20, 433–488. O'Connor, C., & Snow, C. (2018). Classroom discourse: What do we need to know for research and for practice? In M. Schober, D. Rapp, & M. A. Britt (Eds.). Routledge handbook of discourse processes. New York: Routledge. Peng, K., & Nisbett, R. E. (1999). Culture, dialectics, and reasoning about contradiction. American Psychologist, 54(9), 741–754. Pressley, M., & Ghatala, E. S. (1990). Self-regulated learning: Monitoring learning from text. Educational Psychologist, 25, 19–33. Resnick, L. B., Asterhan, C. S. C., & Clarke, S. (Eds.). (2015). Socializing intelligence through academic talk and dialogue. Washington, DC: AERA. Resnick, L., Asterhan, C., Clarke, S., & Schantz, F. (2018). Next generation research in dialogic learning. In G. Hall, L. Quinn, & D. Gollnick (Eds.). Wiley handbook of teaching and learning. New York: Wiley. Reznitskaya, A., Anderson, R. C., McNurlen, B., Nguyen-Jahiel, K., Archodidou, A., & Kim, S. (2001). Influence of oral discussion on written argument. Discourse Processes, 32(2/3), 155–175. Rinehart, R. W., Duncan, R. G., & Chinn, C. A. (2014). A scaffolding suite to support evidence-based modeling and argumentation. Science Scope, 38(4), 70–77. Rogoff, B. (1990). Apprenticeship in thinking: Cognitive development in social context. New York, NY: Oxford University Press. Ryu, S., & Sandoval, W. A. (2012). Improvements to elementary children’s epistemic understanding from sustained argumentation. Science Education, 96(3), 488–526. Sandoval, W. A. (2014).
Science education’s need for a theory of epistemological development. Science Education, 98(3), 383–387. Sandoval, W. A., & Millwood, K. A. (2005). The quality of students’ use of evidence in written scientific explanations. Cognition and Instruction, 23(1), 23–55. Shi, Y., Matos, F., & Kuhn, D. (2019). Dialog as a bridge to argumentive writing. Journal of Writing Research, 11(1), 107–129. Song, Y., & Ferretti, R. P. (2013). Teaching critical questions about argumentation through the revising process: Effects of strategy instruction on college students’ argumentative essays. Reading and Writing, 26, 67–90. Spencer-Rodgers, J., Williams, M. J., & Peng, K. (2010). Cultural differences in expectations of change and tolerance for contradiction: A decade of empirical research. Personality and Social Psychology Review, 14(3), 296–312. Stanovich, K. E., & West, R. F. (1997). Reasoning independently of prior beliefs and individual differences in actively open-minded thinking. Journal of Educational Psychology, 89(2), 342–357. Stanovich, K. E., West, R. F., & Toplak, M. E. (2013). Myside bias, rational thinking, and intelligence. Current Directions in Psychological Science, 22(4), 259–264. Stigler, J., & Perry, M. (1988). Mathematics learning in Japanese, Chinese, and American classrooms. New Directions for Child Development, 41, 27–54. Sun, J., Anderson, R. C., Perry, M., & Lin, T. J. (2017). Emergent leadership in children's cooperative problem solving groups. Cognition and Instruction, 35(3), 212–235. Toulmin, S. (1958). The uses of argument. Cambridge, England: Cambridge University Press. Triandis, H. C. (1995). Individualism and collectivism. Boulder, CO: Westview Press. Tu, W. (1985). Selfhood and others in Confucian thought. In A. J. Marsella, G. DeVos, & F. L. K. Hsu (Eds.). Culture and self: Asian and western perspectives. New York: Tavistock Publications. Tweed, R. G., & Lehman, D. (2002).
Learning considered within a cultural context: Confucian and Socratic approaches. American Psychologist, 57(2), 89–99. Villarroel, C., Felton, M., & Garcia-Mila, M. (2016). Arguing against confirmation bias: The effect of argumentative discourse goals on the use of disconfirming evidence in written argument. International Journal of Educational Research, 79, 167–179. Vygotsky, L. (1978). Mind in society: The development of higher psychological processes. Cambridge, MA: Harvard University Press. Walton, D. (1989). Dialogue theory for critical thinking. Argumentation, 3, 169–184. Walton, D., & Zhang, N. (2013). The epistemology of scientific evidence. Artificial Intelligence & Law, 21(2), 173–219. Wegerif, R., Mercer, N., & Dawes, L. (1999). From social interaction to individual reasoning: An empirical investigation of a possible sociocultural model of cognitive development. Learning & Instruction, 9(6), 493–516. Wiley, J., & Voss, J. F. (1999). Constructing arguments from multiple sources: Tasks that promote understanding and not just memory for text. Journal of Educational Psychology, 91(2), 301–311. Wissinger, D. R., & Paz, S. D. (2016). Effects of critical discussion on middle school students’ written historical arguments. Journal of Educational Psychology, 108(1), 43–59. Wolfe, C. R. (2011). Argumentation across the curriculum. Written Communication, 28(2), 193–219. Wolfe, C. R., & Britt, M. A. (2008). The locus of the myside bias in written argumentation. Thinking & Reasoning, 14(1), 1–27. Zhang, D. L., & Chen, Z. Y. (1991). The orientation of Chinese thinking. Beijing: Social Sciences Press of China [in Chinese]. Zillmer, N., & Kuhn, D. (2018). Do similar-ability peers regulate one another in a collaborative discourse activity? Cognitive Development, 45, 68–76. Zohar, A., & David, A. B. (2008). Explicit teaching of meta-strategic knowledge in authentic classroom situations. Metacognition and Learning, 3(1), 59–82.
