Available online at www.sciencedirect.com
Assessing Writing 14 (2009) 38–61
Helping preservice teachers learn to assess writing: Practice and feedback in a Web-based environment
Michael S. Dempsey∗, Lisa M. PytlikZillig, Roger H. Bruning
University of Nebraska-Lincoln, Lincoln, NE 68588-0234, USA
Available online 13 February 2009
Abstract
Writing is a highly valued skill that is often neglected in the classroom; one reason is that teachers often do not receive adequate training in writing assessment and instruction. Teachers, particularly preservice teachers, need practice in making detailed assessments of student writing and opportunities to build their confidence for assessing it, but practical issues of time and resources often constrain the frequency and quality of the training they receive. This mixed method study focused on the design and evaluation of an online tool for building preservice teachers’ writing assessment skills and self-efficacy for writing assessment. In the study, teacher education students interacted with actual 4th-graders’ writing samples via a Web-based critical thinking tool. They received scaffolded practice in assessing multiple student papers and justified their assessments using analytic criteria. After each paper, they received feedback that included access to expert assessments and those of their peers, along with both teacher and peer rationales for their ratings. Participants significantly improved in their ability to accurately assess student writing using an analytic approach and in their knowledge of the writing traits. They also showed significantly greater self-efficacy for assessing student writing and high levels of satisfaction with the Web-based tool and their overall learning experience.
© 2009 Published by Elsevier Ltd.
Keywords: Writing; Assessment; Teacher candidates; Primary school; Rubric; Feedback
∗ Corresponding author at: 120 Mabel Lee Hall, Lincoln, NE 68588-0234, USA. Tel.: +1 402 472 4946; fax: +1 402 472 0522. E-mail address: [email protected] (M.S. Dempsey).
1075-2935/$ – see front matter © 2009 Published by Elsevier Ltd. doi:10.1016/j.asw.2008.12.003

1. Introduction

Helping students learn to write well is one of education’s most important achievements. Beyond writing’s role in work success (e.g., National Commission on Writing, 2003), writing is a key factor in the development of problem-solving and critical-thinking skills. Writing enlists a multitude
of cognitive and motivational processes: goal setting; content transformation; recall of topical, audience, and writing genre knowledge; self-regulation; and making multiple judgments of goal attainment (e.g., Bereiter & Scardamalia, 1987; Flower, 1998; Flower & Hayes, 1984; Graham, 2006; Graham & Harris, 1993, 1996; Graham, Harris, & Mason, 2005; Harris & Graham, 1996; Hayes, 2000; Langer & Flihan, 2000; McCrindle & Christensen, 1995; National Writing Project, 2003; Zimmerman & Kitsantas, 1999). Writing also requires developing metacognitive processes that include self-regulation and learning strategies (Bangert-Drowns, Hurley, & Wilkinson, 2004; Ochsner & Fowler, 2004), among these rehearsal, elaboration, organization, and self-monitoring (McCrindle & Christensen, 1995). Requiring students to write, then, has the potential to support students’ deeper learning in a wide range of content areas. No matter what the content and level of student writing, teachers need to be prepared to make judgments about its quality. In this paper we describe the design and evaluation of a multifaceted Web site designed to help preservice teachers develop proficiency in writing assessment. As they interact with and assess the authentic student papers on the site, users employ criteria from an analytic rubric. The Web site includes features designed to scaffold learning about writing assessment and provides multiple opportunities for feedback from peers and experts. In our evaluation of the Web site, we focused on three research questions: (1) Do participants show improvement in knowledge of the writing rubric and in assessment ability after working in the site? (2) Do participants improve their self-efficacy for assessing? (3) Is there evidence that gains on the measures we employed are the result of working with the site and its features? To answer these questions, we used a concurrent nested design (Creswell, 2003) utilizing quantitative and qualitative data collected through the Web site. The study’s design allowed a multidimensional approach to evaluating the effectiveness of the Web site in teaching novice teachers to assess student writing and in building their confidence for writing assessment. Our approach included pre–post comparisons to determine if participants using the Web site increased in knowledge of the rubric, assessment skills, and self-efficacy for assessing writing. We also analyzed how participants used the site’s features by evaluating participants’ patterns of movement through the Web site, examining how these variables might relate to learning outcomes and participant individual differences. Participants were asked to judge the usefulness of various site features for learning to assess writing. Finally, we gathered qualitative data on participants’ experiences with the site by asking them to describe what they thought they had learned, how they learned it, and their feelings about what they had learned. 1.1. Need for teacher training in writing assessment In spite of widespread recognition of writing’s value, writing continues to be the neglected “R” in many schools. One reason for this is that writing well requires coordinating a complex and multifaceted set of skills and learning these skills requires careful instruction and guidance from teachers who are competent and confident in their ability to scaffold students’ progress toward literacy. 
Simply incorporating more writing assignments into the curriculum is unlikely, by itself, to produce gains in writing ability (Bangert-Drowns et al., 2004; Ochsner & Fowler, 2004). Novice and struggling writers, especially, cannot be expected to acquire and master such complicated processes on their own; systematic teacher instruction and feedback are essential (Applebee & Langer, 1987; National Writing Project, 2003). Teachers’ capabilities for teaching writing and also, we argue, their motivation to provide such instruction and feedback are closely tied to their ability to recognize varying levels of writing
quality and use these judgments in providing their students with individualized, constructive feedback. Dappen, Isernhagen, and Anderson (2008) offer support for this argument. They found that teachers involved in formulating, analyzing, and implementing a systematic, analytic writing assessment strategy described themselves as more empowered and capable of helping students become better writers. Mastering writing assessment and feedback skills is certainly within any teacher’s reach, but their acquisition requires time and extended practice that teachers—whether preservice or inservice—often do not receive. The result is that many teachers perceive the processes of writing assessment and feedback as vague and beyond their control, affecting their confidence for writing assessment and their motivation to incorporate writing in their classrooms. Typically, in the United States, there is little in the way of systematic experience either in preservice teacher education or teacher professional development programs to prepare teachers in writing instruction (National Commission on Writing, 2003; National Writing Project, 2003). Of course, if this were easy to do, teacher education and professional development programs would certainly include these experiences. Instead, as is often the case, practical issues constrain teachers’ professional development. In most teacher education programs, for example, course instructors struggle to balance time and content limitations with the demands of providing practice and feedback for large numbers of students. In the case of writing assessment, offering meaningful practice and feedback requires that instructors provide their students with authentic student writing samples and individual feedback on their assessments, resources not typically available to most instructors. The common result is that preservice teachers interact with student writing largely through textbooks that offer excerpts disembodied from their original text (Tchudi, 1997; Yancey, 1999), making it unlikely that they will be able to provide quality writing instruction and feedback in the future. It is little surprise, then, that their own students’ classroom experiences with writing might be uneven and even negative, resulting in low motivation and efficacy for writing (Bruning & Horn, 2000). The need therefore exists to train prospective teachers to make the kind of informed, detailed judgments about student writing that are central to effective writing instruction. Most preservice teachers, through their own writing and their reading of others’ writing, are capable of making gross judgments about writing, placing writing samples into broad categories from good to bad. Analyzing student writing at a molecular level is a new and complex task for them, however. Going beyond global evaluations—breaking down the writing process into comprehensible, manageable components—is needed for preservice teachers to analyze writing and provide feedback that can motivate and encourage individual students (Cooper & Odell, 1977; Diederich, 1974; Spandel, 2005). 1.2. An analytic approach to writing assessment Any reader who has sat with a stack of student papers knows the difficulty of identifying writing quality in concrete terms and the attendant struggle of providing meaningful and consistent feedback.
Using a holistic approach, teachers can identify the overall quality of writing, but without a framework for identifying what actually makes the writing good or poor, it is difficult to communicate to students how they might improve or to show them their progress across revisions and different papers. In the absence of clear criteria and detailed feedback on their writing, students also are likely to interpret assessments of their writing as subjective and arbitrary. They would not be far off the mark. In research dating back nearly a half-century, scoring consistency has been shown to be very low when raters do not use analytic assessment practices.
In one early study, for example, Diederich (1974), a developer of analytic writing assessment approaches, examined how academic and business professionals evaluated a set of student writing samples. When this group rated the papers without any training or rubric to guide them, more than a third of the 300 papers evaluated received every rating possible on a 9-point scale, with none receiving fewer than five different ratings. Diederich and his colleagues found, however, that the evaluators’ written comments yielded certain recurring themes (e.g., ideas, organization, mechanics) that could be used to create a multidimensional, analytic framework for writing assessment. We chose to use an analytic approach in the training we devised because we judged that our novice preservice teachers would benefit from writing assessment training that not only equipped them with explicit dimensions along which to judge writing quality, but also provided them with criteria for identifying specific levels of performance. Overall, our goal was to help these preservice teachers move past declarative knowledge to situated skills (Schraw, 2006) by providing them with the combination of repeated practice and rich explanatory feedback from experts shown to be highly beneficial to learners (Moreno, 2004). Moreover, we expected practice with analytic scoring to increase participants’ self-efficacy for writing assessment by building situated knowledge through the rubric and building success in using the rubric through practice and feedback. The specific analytic writing rubric we employed was based on the Six Trait Model (Spandel, 2005). The Six Trait Model’s dimensions or “traits,” which include ideas, organization, voice, word choice, sentence fluency, and conventions (see Fig. 1), are not new and echo earlier analytic writing assessment approaches used over several decades (e.g., Cooper & Odell, 1977; Diederich, 1974; Murray, 1982). We chose this model because it is widely used for assessing writing and has been adopted successfully by school districts across the U.S. It also is the basis for our own state’s writing assessment program, which has been associated with significant gains in writing quality over the course of the assessment’s use (e.g., see Dappen et al., 2008).
Fig. 1. Definitions of the six traits (Spandel, 2005).
1.3. Web site features and theoretical perspective The Web site that is the focus of this article was designed to take advantage of technology’s affordances to provide practice and feedback in a timely and efficient manner. There can be little question that, in general, learners develop expertise through structured practice (Ericsson, 1988; Eysenck & Keane, 2005). Self-efficacy—the domain-specific confidence that functions as a gatekeeper to performance—also is positively affected by practice and increasingly successful performance (Bandura, 1997). Each of the site features was designed to promote and enhance frequent interaction and practice with authentic student papers, which were collected from the Nebraska statewide writing assessment and represented a wide range of writing abilities of students in grades 4, 8, and 11. 1.4. Overview of learner activities in the Web site and site features Preservice teachers’ interactions with student writing samples in the Web site were guided by software developed at the University of Nebraska’s Center for Instructional Innovation. This software, called ThinkAboutIt, structures repeated cycles of learner interactions with content, in this case actual student papers that had been scanned and made available on the Web site (see Fig. 2). In each cycle, the software requires a decision from learners along with a justification for that decision. They then receive feedback from the system that allows them to compare their decisions and their justifications to those of other learners and to experts. Over repeated cycles, learners receive multiple practice opportunities aimed at proceduralizing and situating their knowledge (Anderson, 1993; Schraw, 2006). In the Web site we evaluated, each ThinkAboutIt cycle consists of two phases: Phase I, choice + explanation (see Fig. 3), is supported by theory and research relating self-explanations to the development of integrated mental representations (e.g., Chi, De Leeuw, Chiu, & LaVancher, 1994; Chi, 2000). For the current Web site, the key activities in Phase I were (a) viewing an actual student writing sample, (b) rating the writing using a rubric based on the Six Trait Model, and (c)
Fig. 2. Preservice teachers have access to an image of the student’s original paper.
Fig. 3. The ThinkAboutIt cycle in the Writing Assessment Web site.
providing a written justification or explanation for the rating using rubric-related terminology (see Fig. 4). During this phase, learners had access to the original student paper, the rubric for rating the paper on one of the six traits, and a resources folder containing additional information about writing and assessment of writing. They also could access a non-interactive Coach (see Fig. 5), a person who scaffolded learners’ interactions with the papers during Phase I by directing their attention to key sections of the analytic rubric and to relevant sections of the 4th graders’ writing. Scaffolding the participants’ justifications with the rubric, as the Coach feature does, is especially important because many learners, when faced with a problem-solving scenario, make choices but show deficient spontaneous self-explanation (Renkl, 1997). Prompts that elicit principle-based
Fig. 4. This screenshot shows a transcript of the paper, rubric, and tools preservice teachers refer to as they learn to assess student writing.
Fig. 5. The Coach connects the rubric to the student’s paper, prompting the preservice teachers to think about the paper in terms of the rubric.
self-explanations, however, have been shown to be an effective instructional strategy for improving self-explanations (Atkinson, Renkl, & Merrill, 2003; Renkl, Stark, Gruber, & Mandl, 1998). Our goal here in asking for justifications and providing a Coach to scaffold use of the rubric in those justifications was to foster deeper learning about analytic scoring of student writing by linking participant judgments systematically to the reasons for the judgments (Aleven & Koedinger, 2002; Atkinson et al., 2003). Phase II of the ThinkAboutIt cycle consists of feedback + interaction (see Fig. 3) and contains two related but distinct types of feedback: (a) peer feedback (in graphical and interactive forms) and (b) expert feedback. In general, feedback is important because it motivates and reinforces learners by providing information about the accuracy of their rating and self-explanation and the opportunity to correct inaccuracies or misconceptions (Anderson, Kulhavy, & Andre, 1971). Feedback also produces superior competence and self-efficacy, resulting in greater persistence (e.g., McCarthy, Webb, & Hancock, 1995; Schunk & Rice, 1993). Peer feedback included a bar graph (see Fig. 6) that depicted the rating levels assigned to the current paper by all other participants. Participants could also view and comment upon other participants’ rating justifications, thus creating an interactive social-constructivist learning environment. Our use of peer feedback and interaction was based on cognitive conflict and sociocultural theory, with research suggesting that group discussions can enhance motivation and engagement, conceptual understanding, critical thinking, metacognition, and social construction of complex knowledge structures (Gunawardena, Lowe, & Anderson, 1997; Newman, Johnson, Webb, & Cochrane, 1997). A non-interactive Expert was also available in Phase II. The Expert reported a judgment (score) of the writing sample, which was based on practicing 4th grade teachers’ actual assessments of the writing. The Expert also provided a “think-aloud” interpretation of the 4th grade writing as it related to the rubric, thus creating a form of cognitive apprenticeship (e.g., see Collins, Brown, & Newman, 1989) to help participants develop a richer conceptual model for rating student writing (e.g., Pedersen & Liu, 2002). In summary, the Web site included several features designed to support participants’ cognitive engagement. Four of these (availability of an original student paper, rubric, resources folder, and coaching) were accessible during Phase I to help participants decide on a rating for the papers they read and justify their ratings. The second set of four features (graphical peer feedback, verbal peer
Fig. 6. Preservice teachers receive graphical feedback in Phase II about how their peers rated the same paper on the trait of Ideas and Content. The teachers can now access peer justifications and expert analysis of the paper.
feedback, peer interaction, and expert feedback) could be accessed in Phase II once participants had submitted their rating choice and justification for the paper and trait being analyzed. These features provided participants with the opportunity to reflect on and evaluate their answers in light of peer and expert feedback. 2. Methods 2.1. Participants Participants in the study were preservice teachers at a large Midwestern U.S. university who were recruited from two multi-section undergraduate teacher education courses: an intermediate-level educational psychology course and an upper-level literacy methods course. The learning context thus was different for the two groups. In the literacy group, participants had previously received specific instruction in writing assessment strategies and processes, with some exposure to the Six Trait Model. Work in the site was understood to support this instruction. For the educational psychology participants, who had received only general instruction in assessment strategies and principles, the learning task was directed more at application of these strategies and principles to a soon-to-be-encountered form of educational assessment, assessment of student writing. While both courses attended to assessment, there was another reason for us to include both the educational psychology course and the literacy course in this study. The educational psychology
course typically enrolls students who are early in their program, and thus have not had a great deal of exposure to assessment strategies related to writing. The literacy students, in contrast, were most often seniors who had formed a background and context for writing assessment. This meant that the literacy students were more advanced in their understanding of writing assessment. By including groups at varying levels of expertise, we hoped to determine if our site benefited novices and more expert students alike. The current evaluation focused on 109 participants who assessed the 4th-grade writing samples and had complete assessment data. The majority of these (58%) were enrolled in the educational psychology course, with 42% enrolled in the literacy methods course. Almost all (97%) were white and most (82%) were females who were either juniors (39.4%) or seniors (43.1%). All of the 46 literacy methods participants identified themselves as future elementary school teachers; most intended to teach in grades K-4 (73%) and saw themselves responsible for teaching a variety of content areas. Most of the educational psychology students (89%) likewise intended to teach in elementary schools and saw their future roles extending across content areas. None of the participants identified themselves as future middle school teachers; seven of the educational psychology students who evaluated the 4th-grade samples planned to be high school teachers. 2.1.1. Participant activities in the site No prior training was provided to participants in using the Web site, though an online tutorial was available if participants wished to use it. The Web site was made available to participants outside of class for a 2-week period during the semester. The site could be accessed at any time; participants could enter, leave, or reenter the site and move between traits and grade levels as they chose. The majority of participants (80%), however, completed their activities in the site in one session. Activity in the site was structured in four stages. First, participants chose the grade level of student writing they wished to study. Second, they completed an associated set of online pretest measures, which included a demographic survey, efficacy scales for writing and for assessing writing, and a two-part pre-test assessing (1) knowledge of general features of the Six Trait Model and rubric and (2) ability to use the rubric to assess student writing samples at the grade level they had chosen. Participants next entered the major activity area of the Web site where they were prompted to choose one of the six traits to begin their rating activity. Participants selected one of the eight papers at the grade level chosen and then read, and rated that paper using the tools described earlier. Participants had two options for moving through the papers and traits: they could rate and receive feedback on a single paper, moving through the traits one-by-one, or they could rate different papers on the same trait before moving on to the next trait. Finally, when they had completed sufficient work in the site (i.e., when learners were satisfied that they were ready for the post-assessment), or met instructor-specific requirements (in one literacy section (N = 24), the instructor required students to rate a minimum of two papers on all six traits), participants completed an evaluation of the site’s features, a post-treatment assessment efficacy scale, and took the post-test of rubric knowledge and rating ability. 2.2. 
Quantitative procedures and measures 2.2.1. Study design Primary and secondary quantitative measures were collected. The primary measures included knowledge, rating ability, and self-efficacy for writing assessment. These measures provided data most relevant to our research questions. Our secondary measures included tracking variables
generated by the site’s software, such as time in the site and the number of times participants used its various tools and features. These measures were used to document how participants interacted with the site, offering data on the extent of use of the site’s tools during the learning process. Our quantitative research design included pre–post and randomized between-group components. Performance (knowledge and rating abilities) and efficacy measures were administered pre- and post-training. The knowledge and efficacy post-tests were identical to the pretest; however, the post-test of rating ability included both repeated (identical at pretest and posttest) and new (administered on the posttest only) items. Specifically, we constructed two versions of the rating pre-test (hereafter referred to as item set A or set B). As they entered the site, participants were randomly assigned either set A or B of the rating questions, and at post-test all participants received all 12 (set A and B) of the rating abilities questions. This design had several advantages. By having a subset of identical rating items administered before and after work in the Web site, we were able to directly assess change in the ratings of those items and rule out differences potentially due to use of different item sets. Because pre–post items can be highly susceptible to item sensitization or other repeated measures effects, especially in a short-term intervention such as that used in this study, including similar but new post-items resulted in a post-measure that was less influenced by item sensitization effects. 2.2.2. Measures of knowledge and rating ability (rating error) The pre–post knowledge and rating ability measures provided data on the question of whether participants had learned about writing assessment components and processes over the time that they interacted with the site. Four pre–post knowledge items sampled knowledge of the Six Trait Writing Model and its function in rating writing samples. These were constructed as multiple-choice questions asking participants to identify the trait being portrayed in a detailed description (e.g., “This trait is concerned with the rhythm and cadence of what is written. This trait looks at the flow a writer creates with words.”). Each knowledge question was scored as correct or incorrect; overall scores on this section thus ranged from 0 to 4. Twelve rating abilities items (divided into two 6-item sets, A and B, as mentioned previously) were designed to gauge learners’ abilities to match experts’ ratings. The rating abilities items presented participants with an excerpt of actual children’s writing three to four sentences in length, and asked them to rate one of the six traits, using the same four-point scale and rubric-based criteria that had been employed during training in the site. One item within each of the 6-item sets focused on each of the six traits, so that ratings of all six traits were represented within each item set. The excerpts were selected to represent different levels of writing ability, were taken from papers not available during training, and were matched to the grade level at which participants had chosen to work. To minimize the intrusiveness of the assessment in the naturalistic context of the classes, and in contrast to training, participants were not required to justify their ratings during the pre- or post-test.
The rating questions were scored as absolute distances from the correct rating, reflecting the extent of difference from the expert rating. These distance scores could range from 0 to 3. If, for example, two participants rated a writing sample as 2 and 4, respectively, and the Expert’s score were 3, each would receive an absolute distance score of 1. The correct rating for each sample was based on the ratings (from a low of 1 to a high of 4) given to these papers during the Nebraska statewide assessment initiative by scorers who had been selected, formally trained, and certified to make these assessments.
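To make the scoring rule concrete, the brief Python sketch below computes absolute distance scores for a set of rating items; the ratings shown are hypothetical and are not drawn from the study’s data.

def rating_distance(participant_rating, expert_rating):
    # Absolute distance (0-3) between a participant's rating and the expert's
    # rating on the 1-4 rubric scale.
    return abs(participant_rating - expert_rating)

# Hypothetical expert and participant ratings for twelve rating-ability items.
expert_ratings = [3, 2, 4, 1, 3, 2, 4, 3, 2, 1, 3, 4]
participant_ratings = [2, 2, 4, 2, 4, 1, 3, 3, 2, 2, 3, 4]

distances = [rating_distance(p, e)
             for p, e in zip(participant_ratings, expert_ratings)]
mean_distance = sum(distances) / len(distances)  # lower values = closer to the expert
print(distances, round(mean_distance, 2))

Averaging these distances across items yields mean rating distance scores of the kind reported later in Table 3, where lower means indicate closer agreement with the expert ratings.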
2.2.3. Measures of self-efficacy The self-efficacy measures afforded us the opportunity to examine the efficacy-related effects of practice in our site and also to triangulate our performance measures (Bruning, 1996). Two dimensions of self-efficacy, self-efficacy for writing and self-efficacy for assessing writing, were measured prior to using the Web site. The measure of self-efficacy for writing was based on a 0–100 (no confidence at all to complete confidence) scale (Bandura, 1997) reflecting participants’ judgments about their overall efficacy for performing a variety of writing skills and writing tasks (e.g., “I can correctly punctuate a one-page passage”) (Pajares, 2003; Shell, Colvin, & Bruning, 1995). The six items were combined into a single writing efficacy scale by summing across items (Cronbach’s α = .82). The writing efficacy scale was not readministered at post-test since we considered writing efficacy to be a relatively stable quality unlikely to change during the time spent in the site. Participants also judged their self-efficacy for assessing writing on six items (Dempsey, PytlikZillig, & Bruning, 2005); these statements tapped their perceived confidence in rating student writing at a level of accuracy that would match that of a highly trained teacher-rater (e.g., “I can match an expert’s rating of the Ideas trait on a student’s writing sample”). Due to a programming error, participant responses to one of the six items, “I can match an expert’s rating of the Conventions trait on a student’s writing sample,” were not captured by the software’s database. This item was dropped from the analysis, and the remaining five items were averaged to form a single scale, which still was highly reliable (Cronbach’s α = .98). Because we expected writing assessment efficacy to change during interaction with the site, this scale was re-administered at post-test. 2.2.4. Tracking variables Our software database generated data in two categories: (a) feature use data, and (b) site activity data. The feature use variables we chose to examine in the present research primarily included counts of the number of times participants clicked on learning features the site provided. These features included the Coach, Expert, peer interaction option, rubric, resources, image of the original student paper, and a bar graph showing percentages of participants rating each sample at a given level on the rubric. These features were available within every ThinkAboutIt cycle; thus each time a participant wrote and submitted a judgment on a given paper and particular trait, any of these features could be accessed. Of these, however, only the graph appeared automatically; it was displayed along with the learner’s own rating and rationale each time the learner submitted a rating. Site activity variables were computed from the tracked login and logout dates and times and from time spent on individual pages. These time-based variables provided mechanisms for measuring overall activity in the site (e.g., total time spent in the site, amount of time spent using certain features). The database also automatically tracked when participants rated a paper, submitted a justification for the rating, and received feedback. Collectively, a set of rating + feedback events (i.e., completion of Phases I and II) comprised a complete cycle of activity within the Web site. Thus, we also were able to calculate each participant’s total number of rating cycles as another measure of activity in the site.
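As an illustration of how multi-item efficacy scales of the kind described in Section 2.2.3 can be formed and their internal consistency estimated, the sketch below computes a scale score and Cronbach’s α from a small matrix of made-up 0–100 confidence ratings. It is not the study’s analysis code, and the data are invented.

import numpy as np

def cronbach_alpha(items):
    # items: respondents x items matrix of scores.
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical responses: 6 participants x 5 efficacy items on the 0-100 scale.
responses = np.array([
    [80, 75, 70, 85, 80],
    [60, 55, 65, 60, 50],
    [90, 95, 85, 90, 92],
    [70, 72, 68, 75, 70],
    [50, 45, 55, 40, 48],
    [85, 88, 80, 90, 86],
], dtype=float)

scale_scores = responses.mean(axis=1)  # item average, as for the assessment-efficacy scale
print(round(cronbach_alpha(responses), 2), scale_scores)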
2.2.5. Site-related evaluation measures Prior to taking the post-test, participants were prompted to complete several evaluation measures tapping their reactions to various dimensions of the site, such as the clarity and helpfulness of the Coach, Expert, and other features. These dimensions were rated on a 5-point Likert-type scale
ranging from 1 (Strongly Disagree) to 5 (Strongly Agree). Two sets of four items queried the utility of the Coach and the Expert. Each set of responses was averaged into a separate scale (α = .82 and .83, respectively). Single items were used to rate other features of the site, including the rubric, the images of the original student paper, peer rating graph, and peer interaction. In a final set of four Likert-type items, participants rated the effect the site had on their writing instruction-related learning and motivation (e.g., the site “deepened my understanding of writing,” “increased my motivation to emphasize writing in my own teaching,” α = .84). 2.3. Qualitative procedures and measures Our evaluation utilized a concurrent nested mixed method design (Creswell, 2003; Creswell & Plano Clark, 2006; Morgan, 1998; Tashakkori & Teddlie, 1998), in which quantitative and qualitative data were gathered in the same data collection phase. In our concurrent design, we emphasized the quantitative data, but we also used the nested qualitative data to provide a perspective on our Web site evaluation that quantitative data by itself would not have revealed. The qualitative data allowed us to capture participants’ reactions to the site as soon as they had completed their work in it. As participants were leaving the site, after completing all learning and assessment measures and just before completing the posttest, we asked them to respond to the following, open-ended query “. . . as honestly and with as much detail as you can.” Imagine that evaluators of this unit have asked you to write a letter describing your experiences in it. What would you say about what you learned, how you learned it, and how you feel about what you learned? Our goal for this query was to broaden our understanding of participants’ utilization of the Web site and our judgments about any learning and motivational outcomes of using the site. All but one of the 109 participants responded to this question: the remaining 108 responses ranged in length from 1 to 11 sentences (M = 3.69, mode = 3). For our analysis of the written reactions, the unit of analysis was the entire response, which was coded as containing or not containing content related to one of six themes. These themes, which had been previously identified in an earlier study of site use (see Dempsey et al., 2005) included: (a) positive learning outcomes experienced by the participants (i.e., participants specifically identified something they learned from the site, such as the Six Trait Model), (b) positive comments about the use of the tools (i.e., participants identified using a specific tool and that the tool helped them learn in the site), (c) personal benefits experienced by the participants (i.e., participants specifically mentioned a way in which the site helped them personally, such as building their motivation to use writing in their future teaching), (d) overt expressions of positive affect toward the site, (e) constructive criticism such as comments about site limitations and potential improvements, and (f) overt expressions of negative affect toward the site. The qualitative analysis comprised three phases. In the first phase, two of the authors independently analyzed a subset of 20 participant comments and then met to discuss the results. This provided an early check of interrater reliability and afforded the opportunity to evaluate the validity of the themes developed in the earlier study. 
In the second phase, the two authors worked independently to code the remaining participant comments, after which they met again to compare results. In all, there were 654 opportunities for disagreement (109 participant comments × 6 themes). Our results revealed only 37 disagreements between the authors (94% agreement), which were resolved through discussion and consensus.
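The agreement figure reported above is a simple percent-agreement calculation over all coder decisions (109 comments × 6 themes). The sketch below shows the same calculation on a small, made-up pair of coding matrices; the data are purely illustrative.

# Two coders assign each comment a 1 (theme present) or 0 (theme absent) for six themes.
coder_a = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 0],
]
coder_b = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0],  # one disagreement (third theme of the second comment)
    [0, 0, 0, 1, 1, 0],
]

decisions = sum(len(row) for row in coder_a)  # comments x themes
disagreements = sum(a != b
                    for row_a, row_b in zip(coder_a, coder_b)
                    for a, b in zip(row_a, row_b))
percent_agreement = 100 * (decisions - disagreements) / decisions
print(decisions, disagreements, round(percent_agreement, 1))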
Finally, the data were transformed by quantifying the qualitative data so that we could compare the quantitative and qualitative data sets (Creswell, 2003; Creswell & Plano Clark, 2006). Frequencies (i.e., the number of students whose comments fit a theme) were calculated on all the themes developed during the qualitative analysis, and the number of sentences per comment was recorded.

3. Results

Results from the study are reported in four major sections, the first three focusing on quantitative data and the fourth on qualitative data. Prior to conducting our analyses, all variables were screened for potential outliers and invalid records, which occasionally arise in computer-generated data (e.g., unusually long time stamps between activities, such as occur when someone steps away from the computer for a period of time).

3.1. Tracking data

Examination of the tracking data provides a description of the type of treatment participants received and, thus, information concerning the activities that may have contributed to any pre–post changes. Table 1 describes how each group of participants interacted with the site.

Table 1
Mean total time (min) spent in the site, total cycles completed, and utilization of site features by educational psychology and literacy participants.

Variable            Educational psychology participants    Literacy participants
                    M        SD                            M        SD
Total time (min)    41.67    46.54                         49.65    21.13
Cycles completed    8.78*    9.68                          12.50*   7.54
Coach               4.00     7.88                          2.48     3.17
Expert              2.81     5.26                          1.96     4.16
Peers               .46      1.22                          .20      .54

Note. Total time and Cycles completed were tracked automatically by the system database. Utilization of the Coach, Expert, and peers features was counted once each time its link was clicked by a participant. Usage of the rubric and graph was not tracked because these features were available automatically. N’s = 63 and 46 for the educational psychology and literacy class participants, respectively, except for the total time variable, which had a small amount of missing data resulting in N = 43 literacy students.
* p < .05 difference between courses.
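The total-time values in Table 1 derive from logged page-view timestamps. The sketch below illustrates one way such a value might be computed while excluding the implausibly long gaps screened out above; the 10-minute cutoff, the timestamps, and the data layout are assumptions for illustration only and do not represent the study’s actual logging code.

from datetime import datetime, timedelta

# One participant's logged page-view timestamps (made-up data).
page_views = [
    "2008-03-01 14:00:05",
    "2008-03-01 14:03:40",
    "2008-03-01 14:55:10",  # long gap: the participant likely stepped away
    "2008-03-01 14:58:02",
]
times = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in page_views]

idle_cutoff = timedelta(minutes=10)  # assumed threshold for treating a gap as inactive
active_time = timedelta()
for earlier, later in zip(times, times[1:]):
    gap = later - earlier
    if gap <= idle_cutoff:  # count only plausibly active intervals
        active_time += gap

print(round(active_time.total_seconds() / 60, 1))  # active minutes in the site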
Both groups spent more than 40 min in the site on average, with educational psychology students spending about 8 min less in the site than literacy students. While the educational psychology students showed more frequent use of the features on average, neither group utilized the peers feature to any extent. The only variable upon which the two groups of students differed significantly was the number of cycles completed (t(107) = 2.17, p < .05, partial η2 = .04). The literacy students completed an average of 12.5 cycles (recall that the instructor in one section of the literacy course required students (N = 24) to complete a minimum of 12 cycles), while the educational psychology students completed an average of 8.8 cycles.

3.2. Correlations

Table 2 presents the correlations among the various measures for the overall group.
Table 2
Correlations for combined educational psychology and literacy class participants among measures of knowledge, rating ability, self-efficacy, site feature use, cycles completed, and participant judgments of site learning benefits.

Variable                            2      3      4      5      6       7       8       9      10     11      12      13
1. Knowledge (pre)                 .21*   −.13    .05    .18    .32**   .33**   −.12    .01    .05    .01     .09     .01
2. Knowledge (post)                        .06   −.09   −.01   −.12    −.07     .10     .02    .05    .12     .28**   .03
3. Rating distance (pre)                          .00    .03   −.09    −.10    −.08    −.03   −.04   −.22*   −.25*   −.09
4. Rating distance (post)                               −.01    .04     .04     .23*   −.04    .05    .03    −.03     .05
5. Writing self-efficacy                                        .47**   .37**   −.11    .02   −.04   −.10    −.05    −.16
6. Rating self-efficacy (pre)                                           .61**   −.26**  .01    .02    .04    −.04     .01
7. Rating self-efficacy (post)                                                  −.19*   .03    .07    .07    −.01     .21*
8. Frequency coach use                                                                  .38**  .32**  .41**   .54**   .11
9. Frequency expert use                                                                        .31**  .52**   .38**   .03
10. Frequency peers use                                                                               .14     .07     .08
11. Total time in site                                                                                        .70**   .06
12. Total practice cycles                                                                                             .08
13. Learning benefits

Note. N = 109 except for total time in site (N = 106).
* p < .05. ** p < .01.

Examination of the correlations reveals that pretest knowledge of the Six Trait Model was significantly but
only moderately related to posttest knowledge, as might be expected given that the terms were new to most students, but not terribly difficult to conceptually understand and learn. Pre- and post-rating distances were unrelated, which was somewhat surprising. This finding might be interpreted as being consistent with the site design, which allows all motivated participants to reach equally desirable levels of proficiency regardless of prior rating skill, or as reflecting issues of internal reliability (Cronbach α estimates for the pre-test and post-test ratings were all ≤.51). Pre-test knowledge was positively related to both pre-test and post-test efficacy for assessing student writing. Meanwhile, post-test knowledge was related only to the number of practice cycles completed. Interestingly, pre-test rating distance (note that higher numbers indicate less accurate ratings) showed a significant negative correlation with amount of time spent in the site and the number of rating cycles completed, indicating that on average those with higher initial skill levels (lower distances) tended to spend more time in the site and complete more cycles. Post-test rating distance, however, was positively related to frequency of coach use, indicating that those higher in post-test rating ability had used the coach less. Knowledge of the Six Trait Model was not significantly related to rating skill either prior or subsequent to participants’ experience in the site. The participants’ efficacies for writing and for assessing student writing were positively related as expected, while assessment efficacies both before and after using the site were negatively related to use of the site’s Coach feature. In general, participants who felt less efficacious more often sought assistance from the coach. Post-assessment efficacy also was positively related to ratings of the site’s learning benefits. Finally, the overall site utilization variables (e.g., number of cycles completed and time in site) and rated use of various site features were positively related (see especially correlations among utilization of the Coach, Expert, and peer features), perhaps indicating that a more general “engagement” factor underlies these variables. 3.3. Pre-test and post-test comparisons Table 3 contains data showing changes from pre-test to post-test on the primary variables, separated by class (educational psychology vs. literacy classes). Preliminary analyses indicated that students in the two courses primarily differed in pre- and post-knowledge of the Six Trait Model and in pre- and post-practice efficacy for assessing the traits (with the literacy students scoring higher than the educational psychology students, as might be expected). As shown in Table 3, all of the pre–post comparisons were statistically significant for both sets of participants, providing evidence of beneficial changes in knowledge, rating abilities, and rating efficacy, and thus site effectiveness across two different samples of students. Similar effects were found for both repeated and non-repeated rating abilities items. Recall that in our research design one-half the sample was randomly assigned set A of six pretest items, and the other half were assigned a different set B of six items; but all participants completed both sets A and B after the site training. 3.4. Site ratings data Table 4 presents the means and standard deviations of site feature ratings by participants in each course. 
The ratings were generally positive, with most average ratings above 3 on the 1–5 scale. The educational psychology students rated the peers feature lower (2.46); both groups rated the rubric above 4. The highest average ratings were given to the rubric and the educational benefits of the site, and the lowest average ratings were given to the peers feature, which was also least used.
Table 3
Mean participant knowledge, rating distance, and efficacy for assessing student writing at pretest and posttest for participants from educational psychology and literacy classes.

Measures                                Pretest            Posttest           t        eta2
                                        Mean     (SD)      Mean     (SD)
Educational psychology
  Knowledge of six traits               2.95a    (1.08)    3.52a    (.80)     3.55**   .17
  Rating distance (repeated stimuli)    .81      (.30)     .61      (.22)     4.69**   .26
  Rating distance (item set A)b         .84      (.32)     .58      (.23)     3.70**   .18
  Rating distance (item set B)b         .78      (.29)     .51      (.25)     3.95**   .20
  Efficacy for assessing writing        62.33a   (22.97)   78.44c   (22.00)   6.37**   .40
Literacy
  Knowledge of six traits               3.67a    (.63)     3.87a    (.40)     2.03*    .08
  Rating distance (repeated stimuli)    .76      (.25)     .56      (.22)     4.45**   .31
  Rating distance (item set A)b         .77      (.27)     .55      (.26)     2.86**   .15
  Rating distance (item set B)b         .75      (.23)     .55      (.21)     3.16**   .18
  Efficacy for assessing writing        78.25a   (14.27)   86.91c   (8.52)    4.87**   .35

Note. N = 63 and 46 for the educational psychology and literacy students, respectively. Knowledge scores are reported as the number of correct responses out of four questions. Rating distance is the mean absolute difference between participant ratings and the expert rating of the writing sample. Self-efficacy for assessing student writing was rated on a 100-point scale ranging from 0 (no confidence at all) to 100 (complete confidence).
a p < .01 between group (literacy vs. educational psychology course participants) differences.
b These pre- and post-site mean rating distances used “first-exposure” data for each item set (A and B). As described in Section 3, comparisons were conducted between groups rather than within.
c p < .05 between group (literacy vs. educational psychology course participants) differences.
* p < .05 pre–post (within or between group) differences.
** p < .01 pre–post (within or between group) differences.
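For the repeated items in Table 3, the pre–post comparisons are paired (within-group) tests, and an effect size of the form eta2 = t2/(t2 + df) is consistent with the values shown. The sketch below illustrates this kind of analysis with made-up scores; it does not reproduce the study’s data or its exact analysis.

import numpy as np
from scipy import stats

# Hypothetical pre- and post-test knowledge scores (0-4) for ten participants.
pre = np.array([2, 3, 2, 1, 3, 2, 4, 2, 3, 2], dtype=float)
post = np.array([3, 4, 3, 2, 3, 3, 4, 3, 4, 3], dtype=float)

t, p = stats.ttest_rel(post, pre)   # paired (repeated-measures) t-test
df = len(pre) - 1
eta_squared = t**2 / (t**2 + df)    # effect size of the form used in Table 3

print(f"t({df}) = {t:.2f}, p = {p:.3f}, eta2 = {eta_squared:.2f}")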
Ratings of the features between the two groups tended to be similar, with the exception that the literacy students rated the peers feature (t(107) = 2.08, p < .05, partial η2 = .04) and the educational benefits of the site (t(107) = 2.08, p < .05, partial η2 = .07) significantly higher than did the educational psychology students.

Table 4
Ratings of site features and site by educational psychology and literacy participants.

Variable                        Educational psychology participants    Literacy participants
                                M        SD                            M        SD
Coach                           3.09     .56                           3.21     .63
Expert                          3.21     .64                           3.26     .71
Peers                           2.46*    1.79                          3.17*    1.74
Graphical feedback              3.21     1.54                          3.63     1.57
Rubric                          4.35     .95                           4.67     .75
Educational benefits of site    3.51**   .64                           3.83**   .48

Note. Participants were asked to rate statements about site features (e.g., “The Graphical feedback helped me learn in the site”) on a Likert-type scale ranging from 1 (Strongly Disagree) to 5 (Strongly Agree). N’s = 63 and 46 for the educational psychology and literacy class participants, respectively.
* p < .05 differences between course groups.
** p < .01 differences between course groups.
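The between-course differences noted above are independent-samples comparisons, for which partial eta2 can be computed as t2/(t2 + df). The sketch below illustrates that computation on made-up 1–5 ratings; it is not the study’s data or code.

import numpy as np
from scipy import stats

# Hypothetical 1-5 ratings of a site feature from the two course groups.
ed_psych = np.array([2, 3, 1, 4, 2, 3, 2, 1, 5, 2], dtype=float)
literacy = np.array([4, 3, 5, 2, 4, 3, 4, 5, 3, 4], dtype=float)

t, p = stats.ttest_ind(ed_psych, literacy)        # independent-samples t-test
df = len(ed_psych) + len(literacy) - 2
partial_eta_squared = t**2 / (t**2 + df)

print(f"t({df}) = {t:.2f}, p = {p:.3f}, partial eta2 = {partial_eta_squared:.2f}")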
3.5. Qualitative results

Table 5 reports the frequencies of participants’ comments coded into one of the six themes. A total of 77 (71%) of the 109 participants identified specific learning outcomes they experienced as a result of working through the site (e.g., “I learned to understand the meaning of the six traits”). Many (N = 71) participants (65%) also made specific reference to a feature of the site as being useful in their learning to assess student writing (e.g., “I feel the expert’s advice was very helpful when trying to decide which category the students writing belonged”). A total of 49 participants (45%) stated that they had gained personally from using the site (e.g., “I feel more confident about my ability to use the six traits”), and 44 participants (40%) were overtly positive toward the site (e.g., “I believe it is a good way to begin learning how to look at a student’s paper in a positive and constructive manner”). Finally, 26 (24%) of the participants offered constructive or critical comments, while only four (4%) were negative toward parts or all of the Web site (e.g., “I would honestly say that I did not learn much at all from this Web site”).

Table 5
Themes for participant comments shown by frequency and percentage.

Theme                     Examples                                                                                          Frequency    Percent
Learning outcomes         “I learned about the 6 Trait Model,” “I learned about writing assessment.”                       77           71
Tool use                  “I used the rubric. . .,” “the expert helped me. . .,” “I liked rating real student papers.”     71           65
Personal benefits         “This will help me as a future teacher,” “it gave me confidence. . .”                            49           45
Positive affect           “It was a good experience,” “This practice was beneficial.”                                      44           40
Constructive criticisms   “I felt I could have learned more if the writing traits had been better explained,”              26           24
                          “The coaches were overly helpful.”
Negative affect           “This was a huge waste of time and I absolutely hated completing it.”                            4            4

N = 109. Note: Participants responded after working in the site and before taking the post-test to a question asking them to describe their experiences in the site. Two of the authors analyzed participant responses and identified 348 meaning units, which were then clustered into 6 themes. Frequencies represent the number of participants who had one or more meaning units in a given category.
Under the theme of learning outcomes, the most frequently mentioned theme, two outcomes in particular were mentioned most often: learning about the Six Trait Model and learning to rate different quality levels of student papers. These two categories corresponded with the goal of the Web site to improve knowledge and skill. For tool use, the second most frequently mentioned theme, participants most often referred to the rubric, the Expert, and to the practice afforded by the site. Participant use of the term “expert” was sometimes ambiguous, however. In several instances the context of the comment suggested that the writer was referring both to the Coach and the Expert features (e.g., “learning from all the experts helped me”). Within the theme of personal benefits, participants most often described how the Web site would help them in their future teaching and the confidence the site had given them in assessing student writing. Meanwhile, the positive affect comments were usually more general statements (e.g., mentioning the site as being “useful” or “great”). Constructive suggestions for improving the site included offering a more detailed and clarified rubric, increasing the number of rating categories on the scale for rating papers, provision
of additional training in how to teach students about the Six Trait Model, and provision of more than one expert viewpoint. Finally, negative comments were quite rare; the few that were received stated that the participant did not learn from the site or that the site did not apply to them (e.g., a few of the participants were special educators who did not view themselves as typical classroom teachers). 4. Discussion We created our Web site to provide scaffolded practice with authentic writing assessment tasks, together with expert and peer feedback on their assessment decisions. We sought information through our research design on three questions: (1) Did participants show improvement in knowledge of the rubric and in rating ability after working in the site? (2) Did participants improve their self-efficacy for assessing writing? and (3) Is there evidence that gains on our measures were the result of working with the site and its features? 4.1. Did participants show improvement in knowledge of the rubric and in rating ability after working in the site? Results of comparisons for participants in both courses showed significant improvement from pretest to posttest in knowledge of the rubric. These knowledge items tapped specific factual knowledge of the rubric criteria for the Six Trait Model, and as such were representative of participants’ declarative knowledge of a particular analytic writing assessment system. As pointed out earlier, preservice teachers often lack understanding and awareness of analytic scoring (National Commission on Writing, 2003; National Writing Project, 2003); our results indicate that participants made significant progress in building conscious recognition of the declarative rules underlying an analytic writing assessment system. Perhaps the most important learning outcome, however, was participants’ significant improvement in ability to match experts’ ratings of writing quality, a task directly related to their future classroom writing assessment activities. Two dimensions of our findings speak to the general validity of this improvement. First, the literacy course participants started and finished with higher scores than those in the educational psychology course. Second, both groups showed improvement. These results suggest that both groups of participants learned to apply the knowledge learned from the rubric when they rated student papers, a critical step toward proceduralizing their declarative knowledge (Anderson, 1996) and gaining facility in analytic scoring. 4.2. Did participants improve their self-efficacy for assessing writing? Both educational psychology and literacy participants showed significant increases in selfefficacy for assessing student papers. Self-efficacy for teaching tasks arguably is very important to future teachers. If they are to embrace analytic scoring methods over the more familiar holistic approaches, they must first believe that they understand analytic scoring and apply it successfully (Bandura, 1997). Self-efficacy for writing assessment may also more generally affect teachers’ decisions to use writing in their classrooms (Poole, Okeafor, & Sloan, 1989; Smylie, 1988). We did see evidence throughout this study that participants judged themselves more likely to incorporate writing into their future teaching as a result of their experience in the Web site. A number of the participants’ comments, for example, explicitly stated this connection. 
We also asked participants to agree or disagree (on a 5-point Likert-type scale) with two related statements at posttest: the
site “increased my confidence for using writing in my classroom,” and “increased my motivation to emphasize writing in my classroom.” Educational psychology participants responded moderately positively to these two questions (M = 3.57, SD = .84 and M = 3.52, SD = .82 respectively), with their confidence for using writing in the future positively related to assessment efficacy at posttest (r = .32, p = .05); motivation to emphasize writing was positively related at posttest, but not significantly so. Literacy participants were more positive overall (M = 3.91, SD = .63 and M = 3.76, SD = .71, respectively), and both responses were positively related to self-efficacy at posttest (r = .30, p < .05 for both variables). We see at least two factors as sources of pre- to posttest increases in self-efficacy for assessing writing. The most likely, in our judgment, relates to Bandura’s (1997) concept of enactive mastery as a source of self-efficacy. Our qualitative data clearly indicate that many participants felt that they had acquired writing assessment skills by practicing with student papers in the site. Practice in this instance may not lead to perfection, but likely does build confidence. Experience in using the rubric to rate student writing no doubt helped dispel some of the mystery of writing assessment, with positive effects on self-efficacy. Also, by often scoring at or close to the Expert’s ratings, our novice learners would have experienced frequent feedback on their success, which is closely tied to increases in self-efficacy. 4.3. Is there evidence that gains on our measures resulted from working with the site and its features? Our data provide several pieces of evidence suggesting that the observed gains resulted from work in the site. First, examination of the tracking data showed that most participants completed their work in one sitting and during that time between pre and post-testing the participants were engaged in the site and actively using its features. The fact that they were active in the site rather than occupied in other activities suggests that their site activities were responsible for the observed changes. However, the pattern of correlations reported suggests that future research is needed to understand how and why individuals varying in initial knowledge and skill may have used the site differently (e.g., some using certain features more or less) to maximize site benefits. Second, participants themselves, from both courses, offered evidence that changes were due to their interaction with the site when they reported that the site was helpful overall in learning about writing assessment, describing the utility of the site and its features along multiple dimensions, both quantitative and qualitative. They rated the educational benefits of the site as high (see Table 4), with 71% attributing learning outcomes to having worked in the site (see Table 5). Some of these outcomes were global in nature (e.g., “I have learned a great deal about reading and evaluating 4th grade papers”); others, however, detailed more precisely what they felt they had carried away from the site (e.g., “I would say that I have learned a lot about the six traits writing process. I learned not only about each of the six traits, but also about proper assessment of writing using the traits”). In fact, the Six Trait Model was mentioned most often as a key learning outcome (39% of the respondents), followed by learning to rate different levels of student writing (29% of the respondents). 
These data generally indicate that participants found the site valuable in building both declarative and procedural knowledge. Participants also rated the site’s major features positively: sixty-five percent made explicit, favorable reference to specific features of the Web site, with the rubric mentioned most often (by 28% of respondents). We also see evidence of the rubric’s usefulness in several other dimensions of the data, ranging from knowledge and rating accuracy to participants’ comments. One writer commented, “I feel that the rubrics for each section were very helpful
in giving me, as the test taker, a guideline as to what I should look for and expect to consider ‘good’ writing.” The Expert (mentioned by 23% of respondents) and the graphical feedback, two feedback features designed to support learning of procedural knowledge, were rated next highest (see Table 4) by participants in both courses. Commenting on the Expert, one participant stated: “I feel that the expert’s advice was very helpful when trying to decide which category the students’ writing belonged.” Participants also commented on another key feature of the site, practice with student papers (22% of respondents). One participant stated, “I think that I learned so much more about grading writing by actually getting a hands-on experience.”

Finally, there is converging evidence that participants gained in self-efficacy as a result of working in the site. In addition to consistent gains in measured self-efficacy, participants rated the site as increasing their confidence in assessing student writing. Participants’ comments provided further evidence of the site’s effects on self-efficacy. In identifying the personal benefits they realized as a result of working in the site (see Table 5), nearly one-fifth of participants (18%) specifically mentioned an increase in their confidence (e.g., “I feel more comfortable now in my ability to grade my students’ work objectively and provide constructive criticism that will help them in future writing endeavors”). A final, indirect indicator of efficacy is the number of participants (23%) who stated that they planned to use their learning in their future teaching. For example, one participant wrote, “[The site] has taught me that there are several different areas that I can teach which would improve writing and help my students later on in life.” Another said, “[What I learned] does make me more excited to help students learn how to effectively write.”

5. Summary, limitations, and implications

This mixed method study evaluated a Web site designed to offer teacher candidates knowledge of and practice with writing assessment. Several lines of evidence support the contention that both the skills and the confidence of novice teachers were enhanced by interaction with the site. These data include generally positive changes from before to after site use, differences between more and less expert groups, and themes in the qualitative data in which participants clearly tie their experiences in the site to perceived changes in their skills, confidence, and goals for writing instruction.

Some limitations to the present study should be noted. First, our measures were limited in that they were administered immediately after work in the site and provided no indication of participants’ later assessment abilities or use of writing in their own classrooms. Also, although participants showed improvement on our current measures of knowledge and rating ability, these measures can be further developed. For example, the internal consistency of the measures likely can be increased by including more items. Possible differences could also be investigated between the rating of brief excerpts (2–3 sentences) and full writing samples (recall that our current measures used excerpts from student papers, while practice in the site was based on whole papers).
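As an illustration of why adding items would help, the Spearman-Brown prediction formula from classical test theory (a standard psychometric result, used here only as a sketch; the reliability value in the example is hypothetical rather than taken from the study) estimates the reliability of a lengthened measure as

\rho_{k} = \frac{k\,\rho}{1 + (k - 1)\,\rho},

where \rho is the reliability of the current measure and k is the factor by which the number of comparable items is increased. A subscale with \rho = .60, for example, would be expected to reach 2(.60)/(1 + .60) = .75 if its length were doubled.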
Second, although we believe that developing skill and confidence in writing assessment is very important for improved writing instruction, it is by no means the only factor involved. Preservice teachers also need to understand the mechanics of the language they are evaluating, the best available literacy instructional practices, and the functions and uses of writing in society. This suggests that developing expertise and efficacy in writing assessment should occur within a broader context of literacy instruction. While it is noteworthy that the educational psychology participants improved in assessment ability and self-efficacy, the literacy participants started and finished with higher rating ability and self-efficacy on our measures. It is probable
that the performance differences between these two groups result from precisely the literacy instruction that informed the literacy participants’ interaction with our Web site.

Third, it should be noted that our Web site was used primarily as a stand-alone application, with interaction in the site tied to, but not fully integrated into, the courses in which participants were enrolled. Our sense is that incorporating experiences like those provided in our site into classroom instruction and discussion would benefit participants even more. For example, the ThinkAboutIt technology used here includes features permitting participant discussion, features that have been used extensively in other applications of the technology; in the current study, however, participants seldom used them. Included as a part of ongoing preservice instruction and guided by teacher educators, such discussions could encourage preservice teachers to closely analyze the expert scores and justifications found in the site and could act as a springboard for them to understand and objectify their own standards for writing assessment.

Fourth, while we were able to compare randomly assigned subgroups on the rating skills measure, the current study did not include a separate control group that would have permitted direct comparison of the beliefs and skills of those working in the Web site with those of a group that did not experience site-related activities. Although including a control group would have been desirable from a design perspective, in developing this evaluation study we did not judge this option to be available to us, given our instructional responsibilities and the site’s previous use in these classes (see Dempsey et al., 2005). Instead, the mixed method approach we chose was designed to generate a variety of sources of evaluative data that could be used to triangulate our judgments about the effectiveness of the intervention (Creswell & Plano Clark, 2006). As a result, the current study focused on providing an account of whether this relatively complex application was working as intended and whether, if generalized, it would have relevance to writing instruction.

In spite of these limitations, we believe several important implications can be drawn from our findings. First, our results indicate that structured practice (Ericsson, 1988; Eysenck & Keane, 2005) with authentic student papers can produce gains in knowledge, assessment skill, and self-efficacy (Bandura, 1997) for preservice teachers. Participants showed significant gains while typically interacting with relatively few student papers, most likely as a consequence of the detailed analysis these interactions required and the wide freedom most participants had to choose how long they worked in the site.

Second, this study offers indirect support for the use of analytic writing assessment (Cooper & Odell, 1977; Diederich, 1974; Murray, 1982), in this case the Six Trait Model (Spandel, 2005), as a mechanism for developing skill and self-efficacy in evaluating student writing and for encouraging future teachers to incorporate writing into their teaching. The analytic rubric was highly valued by participants, both in their ratings of the site tools and in their qualitative comments.
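To make concrete what an analytic, trait-based rating involves, the brief Python sketch below represents one rater’s scores on the six traits for a single paper and compares them with an expert’s scores using exact and adjacent agreement. The 1–6 scale, the example scores, and the agreement metric are illustrative assumptions only; they are not drawn from the study’s instruments or scoring procedures.

# Illustrative sketch only: six-trait analytic scores for one paper compared
# with an expert's scores. The 1-6 scale and the exact/adjacent agreement
# metric are assumptions for demonstration, not this study's scoring scheme.
TRAITS = ["ideas", "organization", "voice",
          "word_choice", "sentence_fluency", "conventions"]

def agreement(rater, expert):
    """Count exact and adjacent (within one point) matches across the traits."""
    exact = sum(rater[t] == expert[t] for t in TRAITS)
    adjacent = sum(abs(rater[t] - expert[t]) <= 1 for t in TRAITS)
    return {"exact": exact, "adjacent": adjacent, "total": len(TRAITS)}

# Hypothetical ratings of one 4th-grade paper.
novice_rating = {"ideas": 4, "organization": 3, "voice": 5,
                 "word_choice": 4, "sentence_fluency": 3, "conventions": 2}
expert_rating = {"ideas": 4, "organization": 4, "voice": 5,
                 "word_choice": 3, "sentence_fluency": 3, "conventions": 3}

print(agreement(novice_rating, expert_rating))
# -> {'exact': 3, 'adjacent': 6, 'total': 6}

Trait-by-trait comparisons of this kind are one way to operationalize the experience, described above, of novices scoring at or close to the Expert’s ratings.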
While previous research has examined the relationship of analytic scoring to student writing improvement (see Hillocks, 1987), to our knowledge no studies have examined the effect of analytic writing assessment on teachers’ self-efficacy for evaluating writing and on their motivation to incorporate writing into their curriculum. We believe there is a strong need for further empirical research in this area.

Third, the results show that it is practical to use a Web-based resource to develop writing assessment skills. Web-based activities, properly designed to provide scaffolding and to afford opportunities for practice, decision-making, and feedback (Aleven & Koedinger, 2002; Atkinson et al., 2003; Renkl et al., 1998), can offer an efficient way to distribute authentic resources, which are otherwise difficult to collect and share, to a large audience of prospective teachers, benefiting both these teachers and their trainers. Furthermore, because practice and feedback can occur outside normal class time, teacher educators can devote the class time that would otherwise be taken up by this practice to other important curricular objectives.
With a Web-based learning tool, users can choose the time and place they interact with the learning materials. While this study focused on typical writing topics found in 4th grade classrooms, the Web site can also be adapted for other uses. For example, second language rubrics could be incorporated into the design of our Web site to test the viability of producing gains for second language teachers similar to those we found with first language preservice teachers.

Fourth, Web-based assessment practice activities for preservice teachers such as those explored in this study have implications for improving student writing (Applebee & Langer, 1987; National Commission on Writing, 2003; National Writing Project, 2003). Although writing is inarguably important for success in and out of the classroom, it typically receives little emphasis in most teacher education curricula. This study provides evidence that structured Web-based practice can increase teacher self-efficacy for assessing writing and can motivate preservice teachers to apply what they have learned during practice to their future teaching. By extension, and we believe there is a solid basis for this both in theory and in the comments of these preservice teachers, growth in new teachers’ skill and confidence is likely to increase the probability that these teachers, especially non-language arts teachers who have less writing-related training and experience, will emphasize writing more in their own classrooms.

Acknowledgements

The authors wish to express their appreciation for the generous support of the Andrew Mellon Foundation for portions of the work reported in this manuscript, provided through a grant to the Center for Instructional Innovation at the University of Nebraska-Lincoln. Additional work was completed with support from Grant No. P116Z010071 from the U.S. Department of Education to the National Center for Information Technology in Education. This article does not necessarily reflect the positions or policies of the Andrew Mellon Foundation, the U.S. Department of Education, or any other agency of the U.S. federal government. We also wish to thank Christy Horn, Co-Director of UNL’s Center for Instructional Innovation, for her strong support throughout this project, and to gratefully acknowledge the computer technology design and support services provided for this work by George Krueger of Metalogic and Jeremy Sydik of the Center for Instructional Innovation.

References

Aleven, V., & Koedinger, K. (2002). An effective metacognitive strategy: Learning by doing and explaining with a computer-based Cognitive Tutor. Cognitive Science, 26, 147–179.
Anderson, J. R. (1993). Problem solving and learning. American Psychologist, 48, 35–44.
Anderson, J. R. (1996). ACT: A simple theory of complex cognition. American Psychologist, 51, 355–365.
Anderson, R. C., Kulhavy, R. W., & Andre, T. (1971). Feedback procedures in programmed instruction. Journal of Educational Psychology, 62, 148–156.
Applebee, A., & Langer, J. (1987). How writing shapes thinking: A study of teaching and learning. Urbana, IL: National Council of Teachers of English.
Atkinson, R. K., Renkl, A., & Merrill, M. M. (2003). Transitioning from studying examples to solving problems: Effects of self-explanation prompts and fading worked-out steps. Journal of Educational Psychology, 95 (4), 774–783.
Bandura, A. (1997). Self-efficacy: The exercise of control. New York: W.H. Freeman and Company.
Bangert-Drowns, R., Hurley, M., & Wilkinson, B. (2004). The effects of school-based writing-to-learn interventions on academic achievement: A meta-analysis. Review of Educational Research, 74 (1), 29–58.
Bereiter, C., & Scardamalia, M. (1987). The psychology of written composition. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Bruning, R. (1996). Examining the utility of efficacy measures in distance education research. In: C. C. Gibson (Ed.), Distance Education Symposium 3: Learners and learning (pp. 17–29). University Park, PA: American Center for the Study of Distance Education (Research Monograph).
Bruning, R. H., & Horn, C. A. (2000). Developing motivation to write. Educational Psychologist, 35 (1), 25–37.
Chi, M. (2000). Self-explaining expository texts: The dual processes of generating inferences and repairing mental models. In: R. Glaser (Ed.), Advances in instructional psychology: Educational design and cognitive science (pp. 161–237). Mahwah, NJ: Erlbaum.
Chi, M. T. H., De Leeuw, N., Chiu, M., & LaVancher, C. (1994). Eliciting self-explanations improves understanding. Cognitive Science, 18, 439–477.
Collins, A., Brown, J. S., & Newman, S. E. (1989). Cognitive apprenticeship: Teaching the craft of reading, writing, and mathematics. In: L. B. Resnick (Ed.), Knowing, learning, and instruction: Essays in honor of Robert Glaser (pp. 453–494). Hillsdale, NJ: Erlbaum.
Cooper, C., & Odell, L. (1977). Evaluating writing. Buffalo, NY: SUNY.
Creswell, J. (2003). Research design: Qualitative, quantitative, and mixed methods approaches. Thousand Oaks, CA: Sage.
Creswell, J., & Plano Clark, V. (2006). Designing and conducting mixed methods research. Thousand Oaks, CA: Sage.
Dappen, L., Isernhagen, J., & Anderson, S. (2008). A statewide writing assessment model: Student proficiency and future implications. Assessing Writing, 13 (1), 45–60.
Dempsey, M., PytlikZillig, L., & Bruning, R. (2005). Building writing assessment skills using web-based cognitive support features. In: L. PytlikZillig, M. Bodvarsson, & R. Bruning (Eds.), Technology-based education (pp. 83–106). Greenwich, CT: Information Age Publishing.
Diederich, P. (1974). Measuring growth in English. Urbana, IL: National Council of Teachers of English.
Ericsson, K. (1988). Analysis of memory performance in terms of memory skill. In: R. J. Sternberg (Ed.), Advances in the psychology of intelligence. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Eysenck, M., & Keane, M. (2005). Cognitive psychology: A student’s handbook. New York: Psychology Press.
Flower, L. (1998). Problem-solving strategies for writing in college and community. Ft. Worth, TX: Harcourt Brace College Publishers.
Flower, L. S., & Hayes, J. R. (1984). Images, plans, and prose: The representation of meaning in writing. Written Communication, 1, 120–160.
Graham, S. (2006). Strategy instruction and the teaching of writing: A meta-analysis. In: C. A. MacArthur, S. Graham, & J. Fitzgerald (Eds.), Handbook of writing research (pp. 187–207). New York: Guilford Press.
Graham, S., & Harris, K. R. (1993). Self-regulated strategy development: Helping students with learning problems develop as writers. Elementary School Journal, 94 (2), 169–179.
Graham, S., & Harris, K. R. (1996). Teaching writing strategies within the context of a whole language class. In: E. McIntyre & M. Pressley (Eds.), Balanced instruction: Strategies and skills in whole language (pp. 155–175). New York, NY: Christopher-Gordon.
Graham, S., Harris, K. R., & Mason, L. (2005). Improving the writing performance, knowledge, and motivation of struggling young writers: The effects of Self-Regulated Strategy Development. Contemporary Educational Psychology, 30, 207–241.
Gunawardena, C. N., Lowe, C. A., & Anderson, T. (1997). Analysis of a global on-line debate and the development of an interaction analysis model for examining social construction of knowledge in computer conferencing. Journal of Educational Computing Research, 17 (4), 397–431.
Harris, K. R., & Graham, S. (1996). Making the writing process work: Strategies for composition and self-regulation. Cambridge, MA: Brookline.
Hayes, J. (2000). A new framework for understanding cognition and affect in writing. In: R. Indrisano & J. Squire (Eds.), Perspectives in writing: Research, theory, and practice (pp. 112–139). Newark, DE: International Reading Association.
Hillocks, G. (1987). Synthesis of research on teaching writing. Educational Leadership, 44, 71–82.
Langer, J. A., & Flihan, S. (2000). Writing and reading relationships: Constructive tasks. In: R. Indrisano & J. Squire (Eds.), Perspectives in writing: Research, theory, and practice (pp. 112–139). Newark, DE: International Reading Association.
McCarthy, M., Webb, J., & Hancock, T. (1995). Form of feedback effects on learning and near-transfer tasks by sixth graders. Contemporary Educational Psychology, 20, 140–150.
McCrindle, A., & Christensen, C. (1995). The impact of learning journals on metacognitive and cognitive processes and learning performance. Learning and Instruction, 5, 167–185.
Moreno, R. (2004). Decreasing cognitive load for novice students: Effects of explanatory versus corrective feedback to discovery-based multimedia. Instructional Science, 32, 99–113.
Morgan, D. (1998). Practical strategies for combining qualitative and quantitative methods: Applications to health research. Qualitative Health Research, 8 (3), 362–376.
Murray, D. (1982). Learning by teaching. Portsmouth, NH: Boynton/Cook.
National Commission on Writing in America’s Schools and Colleges (2003). The neglected “R”: The need for a writing revolution. Retrieved July 29, 2005, from http://www.writingcommission.org/.
National Writing Project (2003). Because writing matters: Improving student writing in our schools. San Francisco: Jossey-Bass.
Newman, D. R., Johnson, C., Webb, B., & Cochrane, C. (1997). Evaluating the quality of learning in computer supported co-operative learning. Journal of the American Society for Information Science, 48 (6), 484–495.
Ochsner, R., & Fowler, J. (2004). Playing devil’s advocate: Evaluating the literature of the WAC/WID movement. Review of Educational Research, 74 (2), 117–140.
Pajares, F. (2003). Self-efficacy beliefs, motivation, and achievement in writing: A review of the literature. Reading and Writing Quarterly, 19 (2), 139–158.
Pedersen, S., & Liu, M. (2002). The effects of modeling expert cognitive strategies during problem-based learning. Journal of Educational Computing Research, 26 (4), 353–380.
Poole, M., Okeafor, K., & Sloan, E. (1989). Teachers’ interactions, personal efficacy, and change implementation. Paper presented at the Annual Meeting of the American Educational Research Association.
Renkl, A. (1997). Learning from worked-out examples: A study on individual differences. Cognitive Science, 21 (1), 1–29.
Renkl, A., Stark, R., Gruber, H., & Mandl, H. (1998). Learning from worked-out examples: The effects of example variability and elicited self-explanations. Contemporary Educational Psychology, 23, 90–108.
Schraw, G. (2006). Knowledge: Structures and processes. In: P. Alexander & P. Winne (Eds.), Handbook of educational psychology (pp. 245–264). Mahwah, NJ: Lawrence Erlbaum Associates.
Schunk, D. H., & Rice, J. M. (1993). Strategy fading and progress feedback: Effects on self-efficacy and comprehension among students. Journal of Special Education, 27, 257–276.
Shell, D. F., Colvin, C., & Bruning, R. H. (1995). Self-efficacy, attributions, and outcome expectancy mechanisms in reading and writing achievement: Grade-level and achievement-level differences. Journal of Educational Psychology, 87, 386–398.
Smylie, M. (1988). The enhancement function of staff development: Organizational and psychological antecedents to individual teacher change. American Educational Research Journal, 25, 1–30.
Spandel, V. (2005). Creating writers through 6-trait writing assessment and instruction. Boston, MA: Pearson Education, Inc.
Tashakkori, A., & Teddlie, C. (1998). Mixed methodology: Combining qualitative and quantitative approaches. Thousand Oaks, CA: Sage.
Tchudi, S. (1997). Introduction: Degrees of freedom in assessment, evaluation, and grading. In: S. Tchudi (Ed.), Alternatives to grading student writing (pp. ix–xvii). Urbana, IL: National Council of Teachers of English.
Yancey, K. B. (1999). Looking back as we look forward: Historicizing writing assessment. College Composition and Communication, 50 (3), 483–503.
Zimmerman, B. J., & Kitsantas, A. (1999). Acquiring writing revision skill: Shifting from process to outcome self-regulatory goals. Journal of Educational Psychology, 91, 241–250.

Michael Dempsey is a PhD candidate in Cognition, Learning, and Development at the University of Nebraska-Lincoln.
His research interests include writing, cognition, and linguistics.

Lisa PytlikZillig is a research professor at the Center for Instructional Innovation at the University of Nebraska-Lincoln. Her research interests include human motivation and affect in a variety of applied contexts.

Roger Bruning is Velma Warren Hodder Professor of Educational Psychology at the University of Nebraska-Lincoln, where he co-directs the Center for Instructional Innovation. His research interests focus on cognitive and motivational processes in reading and writing and on technology-based support for teacher education.