Computers and Composition 42 (2016) 1–16
A critical interpretative synthesis: The integration of Automated Writing Evaluation into classroom writing instruction
Marie Stevenson
University of Sydney, Australia
Abstract
Automated Writing Evaluation (AWE) is computer-generated scoring and feedback that is used for both assessment and instructional purposes. Much controversy has surrounded AWE, especially in high-stakes tests such as TOEFL, and much of the discussion has centered around the scoring and feedback capabilities of AWE and the effects of AWE on text quality. Relatively little attention has been directed towards the ways that AWE is used or could be used as an instructional tool in the writing classroom. Through a critical interpretative synthesis of existing research, this study provides an overview of what is currently known about the integration of AWE into classroom writing instruction. The synthesis found that there are numerous purposes for using AWE stated in existing research, some of which do not accord with objectives commonly associated with AWE; that teachers had varied and creative ways of integrating AWE in their classrooms; and that, although students generally seemed to enjoy using AWE, at the times when the sample studies were conducted, there appeared to be many limitations in the feedback provided by AWE systems. The study discusses these findings in terms of criticisms that have been leveled against AWE and links this discussion to broader considerations of the relationship between literacy, technology and pedagogy.
© 2016 Elsevier Inc. All rights reserved.
Keywords: automated writing evaluation; AWE; feedback; writing instruction; literacy
1. Introduction
Automated Writing Evaluation (henceforth AWE) involves computer-generated scoring and feedback for writing. Numerous commercial and non-commercial AWE systems are available, the central component of which is a scoring engine that generates automated scores based on techniques such as artificial intelligence, natural language processing and latent semantic analysis. Many AWE systems also incorporate written feedback on various aspects of writing. However, most AWE systems model only a relatively small part of the writing construct, being largely concerned with structure (e.g., topic sentences and paragraph transitions); phrasing (e.g., vocabulary and sentence length); and transcribing (e.g., spelling and mechanics) (Deane, 2013). AWE was originally developed to generate summative scores for assessment purposes, and is currently being used, in combination with human evaluation, in high-stakes tests such as the Test of English as a Foreign Language (TOEFL) and the Graduate Management Admission Test (GMAT). However, the use of AWE feedback as an instructional tool in writing classrooms is increasing, especially in school and college classrooms in the United States.
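To make the general idea of a scoring engine concrete, the sketch below shows a deliberately simplified, purely illustrative feature-based scorer: it extracts a few surface features from essays and fits a regression to a handful of human-assigned scores. It is not the approach of any particular commercial system; the feature set, training essays and scores are invented for illustration.

```python
# Illustrative sketch only: a toy feature-based essay scorer trained on
# human-scored essays. Real engines use far richer NLP features; the
# feature set and training data here are hypothetical.
import re
import numpy as np
from sklearn.linear_model import LinearRegression

def surface_features(essay: str) -> list[float]:
    """Extract a few crude surface features from an essay."""
    words = re.findall(r"[A-Za-z']+", essay.lower())
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    n_words = len(words) or 1
    return [
        n_words,                                  # essay length
        sum(len(w) for w in words) / n_words,     # mean word length
        len(set(words)) / n_words,                # type-token ratio
        n_words / max(len(sentences), 1),         # mean sentence length
    ]

# Hypothetical training data: essays already scored by human raters (1-6 scale).
train_essays = ["Short essay text ...", "A longer, more developed essay text ..."]
human_scores = [2, 5]

X = np.array([surface_features(e) for e in train_essays])
model = LinearRegression().fit(X, np.array(human_scores))

# Score a new essay; classroom AWE systems would also attach written feedback.
new_essay = "Another student essay awaiting a holistic score ..."
predicted = model.predict(np.array([surface_features(new_essay)]))[0]
print(f"Predicted holistic score: {predicted:.1f}")
```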
Considerable controversy has surrounded AWE, particularly its use in high-stakes testing situations. This controversy has centered around doubts concerning the accuracy of scoring and feedback capabilities, and fears concerning the effects of writing for a non-human audience. There have also been criticisms concerning the restricted and objectivistic views of writing and assessment that are said by some to underpin AWE scoring and feedback. Vojak, Kline, Cope, McCarthey, and Kalantzis (2011) examined scoring and feedback features in terms of whether AWE systems harness the potential of new technologies to promote new literacies. The authors found that AWE systems generally failed to reflect social, contextual and multi-modal aspects of writing. AWE validation research, much of which has been carried out by researchers affiliated in some way with companies that develop and market AWE systems, has focused on the psychometric properties of AWE scoring, with the objective of demonstrating that AWE systems score as reliably and validly as human raters. AWE pedagogical research has predominantly focused on establishing whether AWE scoring and feedback have positive effects on the quality of texts produced by student writers. A critical review of research on the effects of AWE on text quality (Stevenson & Phakiti, 2014) found that there was relatively little attention in the literature to how AWE was used in the classroom, or to how it could be effectively integrated into classroom instruction. Nonetheless, a small number of studies do examine the use of AWE in the classroom, and some of the research that focuses on the effects of AWE on text production does include a modest amount of discussion of classroom use. Through a critical interpretative synthesis of existing research, the current study provides an overview of what is currently known about the integration of AWE into classroom writing instruction. It examines three commercially available web-based AWE systems: Criterion, MY Access! and Summary Street. The study does not aim to describe or compare specific AWE programs. Rather, by examining classroom integration, it seeks to go beyond providing descriptive "gee-whiz explanations of new technologies" (Luke & Luke, 2001, p. 93). The study discusses the findings in terms of criticisms that have been leveled against AWE. This discussion is linked to broader considerations of the relationship between literacy, technology and pedagogy.
2. The controversy surrounding AWE
The controversy surrounding AWE has been widespread. An online petition, "Professionals Against Machine Scoring of Student Essays in High-Stakes Assessment," received thousands of signatures, including Noam Chomsky's, and was cited in a number of newspapers, including The New York Times. The machine-scoring of writing for assessment purposes has also been opposed by the Conference on College Composition and Communication (CCCC) in Position statement on teaching, learning, and assessing writing in digital environments (2004) and Writing assessment: A position statement (2009). The 2004 statement states that "Writing-to-a-machine violates the essentially social nature of writing: we write to others for social purposes." In 2006, Patricia Freitag Ericsson and Richard Haswell edited a book entitled Machine Scoring of Student Essays: Truth and Consequences, which consisted of a series of papers strongly questioning the purported accuracy of computerized essay scoring, warning of the dehumanization of writing caused by writing for a machine, and decrying the use of AWE to replace human raters in testing situations. It is tempting to dismiss some of these criticisms as a kind of neo-Luddism, and indeed this charge has been made by those who point out that the teaching practices these criticisms defend are the same ones that were vehemently attacked with their introduction in the industrial revolution (e.g., Elliot & Klobucar, 2013). Certainly, there is a technophobic tone in some of the criticisms, for example, the introduction of the Freitag Ericsson and Haswell book: "...new technology can sneak in the back door and establish itself while those at the front gates, nominally in charge, are not much noticing. All of a sudden cell phones are disturbing legislative sessions and church services and allowing students to cheat on examinations in new ways. All of a sudden students can pass entrance examination essays in ways never allowed before, with their essays scored by machines running commercial software programs" (p. 1). However, criticisms of AWE have also been voiced by those who champion technology as a dominant shaping force in what are referred to as 'new literacies'. (See Buckingham (1993) and Lankshear and Knobel (2006) for discussion of new literacies.) Much has been made of properties of new literacies such as multi-modality, synchronicity (i.e., real-time on-line communication) and non-linearity (e.g., hypertexts), and the need to incorporate new literacies into
literacy instruction via meaningful contexts. Cope and Kalantzis (2007) warn against the use of new technologies to simply reinforce old literacy practices, "New media do not necessarily mean new learning. Old institutions have an enormous capacity to assimilate new forms without fully exploiting their affordances. From the scope of possibility in the new media, education all-too-often selectively does things with them that are not much more than conventional" (p. 75). As mentioned already, Vojak et al. (2011) carried out a study that specifically examined whether AWE systems harness the potential of new technologies to promote new literacies. This study examined the scoring and feedback features of various programs according to how well AWE systems fit with understandings of: "1) writing as a socially situated activity; 2) writing as functionally and formally diverse; and 3) writing as a meaning-making activity that can be conveyed in multiple modalities" (p. 98). The study drew the conclusion that AWE primarily reinforces "old practices" (p. 99), that is, views of formal correctness in writing that reflect an objectivistic, exam-oriented perspective. The authors found that the AWE systems they examined generally failed to reflect social, contextual and multi-modal aspects of writing. Specifically, the systems offered limited opportunity for pre-writing processes or collaboration; valued formulaic conventions and accuracy of surface level language features over true invention; relied on prompts from a limited range of genres; and displayed little or no incorporation of multi-modal meaning making. The authors concluded that the major concern was not the technology itself, but, rather, the restricted view of writing and assessment that underpins these systems. Seen from this perspective, the controversy surrounding AWE could be typified as what Labbo (2006) describes as "tensions that exist between the push forward of new digital literacies and the pull backward of traditional literacy" (p. 200), in this case the use of technology to promote the development of traditional print literacy.
3. AWE Research
AWE scholarship has focused more on scoring than on written feedback, and, as mentioned, specifically on establishing the credentials of AWE scoring engines by outlining their developing capabilities and demonstrating their reliability and validity by comparing them with human raters. Studies have frequently found high correlations between AWE scores and human scores, and these results have been taken as providing evidence that AWE scores provide a psychometrically valid measure of students' writing. See the two volumes edited by Shermis and Burstein (2003, 2013) for detailed results and in-depth discussion of the reliability and validity of specific AWE systems. This psychometric focus is still very much evident in the Handbook of Automated Essay Evaluation: Current applications and new directions (Shermis & Burstein, 2013), which provides an overview of the developments in AWE during the past decade and which, though it makes some effort to include educational and writing research perspectives, predominantly focuses on the development of new AWE systems, the extension of scoring and feedback capabilities of existing systems and the psychometric properties of AWE systems.
When AWE research has focused on the classroom, it has, as explained, been predominantly concerned with whether feedback leads to improvements in the quality of students' texts (e.g., Shermis, Burstein, & Bliss, 2004; Elliot & Mikulas, 2004a; Elliot & Mikulas, 2004b; Wang & Wang, 2012). A critical review of research on the effects of AWE on text quality (Stevenson & Phakiti, 2014) concluded that there was modest evidence that AWE feedback has a positive effect on the quality of the texts that students produce using AWE, but that, largely due to lack of research, there was as yet little or no evidence that it leads to more generalized, longer term improvements in writing. As mentioned, it was also found that there was relatively little attention in the literature to how AWE was used in the classroom, or to how it could be effectively integrated into classroom instruction. Warschauer and Ware (2006) criticized existing AWE research for being too outcome-oriented and ignoring the importance of learning and teaching processes involved in the use of AWE in the classroom. They suggested that research on AWE should strive to provide a more contextualized understanding of the actual use of AWE. Cotos (2014) also made the point that both supporters and opponents of AWE have largely ignored the "ecology of implementation in target contexts" (p. 59), meaning that the way in which AWE is used by teachers in the classroom is likely to affect how AWE influences students and their writing. The current study will examine the integration of AWE into classroom teaching and learning by carrying out a small-scale critical interpretative synthesis of the existing pedagogical research on Criterion, MY Access! and Summary Street. These programs have been selected because they are the focus of the bulk of the classroom-oriented research. Brief
descriptions of each of these programs are provided in Appendix 1. The study identifies three key constructs in the relevant literature relating to the integration of AWE into the writing classroom (purpose, action and use) and uses these to examine the ways in which AWE has been integrated into classroom instruction.
4. Method
The study employed a critical interpretative synthesis (CIS) methodology, based on Fleming (2009). Fleming (2009) describes CIS as "a new method of reviewing, developed from meta-ethnography, which integrates systematic review methodology with a qualitative tradition of enquiry" (p. 201). CIS originated in the health sciences as a means of going beyond the traditional synthesis of quantitative results from experimental studies that were intended to establish the effectiveness of a particular intervention. CIS enables findings from both quantitative and qualitative studies to be synthesized together, and allows the synthesis of many different types of evidence, not just evidence regarding effectiveness. CIS takes an inductive approach, allowing key constructs to emerge from the data, rather than beginning with a specific research question. It also enables account to be taken of the contexts in which individual studies are implemented. It does not rely on tabulating the outcomes of studies in terms of their effectiveness. Instead, over-arching synthetic constructs and arguments are developed. In the current study, the sample comprises both quantitative and qualitative research, so an approach that allows the synthesis of both is needed. Moreover, as the current study is not concerned with establishing the effectiveness of AWE in the classroom, but rather with examining the ways in which AWE can be integrated into the classroom and how this is done in specific contexts of use, CIS is a suitable approach. The stages in CIS are:
• Identification of studies
• Data abstraction
• Development of synthetic constructs
• Development of synthetic arguments
The description of the method below will follow these stages.
4.1. Identification of studies
A comprehensive and systematic literature search was conducted to identify relevant primary sources. Both published research (i.e., journal articles, book chapters and reports) and unpublished research (i.e., theses and conference papers) were identified. Search engines, databases, websites, reference lists of existing studies, bibliographies and systematic searches through issues of relevant journals from 1990 to 2013 were used to identify relevant literature. Key journals in the areas of language learning technology, writing, and education were selected. See Appendix 2 for more detailed information about the literature search. This search process yielded a total of 29 studies involving primary research on AWE feedback generated by one of the three programs for the formative evaluation of texts in the writing classroom. Some of the studies specifically addressed the manner in which AWE was used in the classroom, and some focused on other aspects of AWE feedback, such as its effects on written production. Any study that reported, interpreted or discussed findings relevant to the classroom integration of AWE in either the findings or the discussion section was eligible for inclusion, even if classroom integration was not the main focus. Initial reading of the studies led to the exclusion of 8 studies, as these did not contain any information about the classroom integration of AWE. Thus, 21 studies remained for inclusion in the synthesis.
4.2. Data abstraction
Key information was extracted from each of the studies to provide a context within which data about classroom use in these studies could be interpreted. Table 1 shows the authors, program, type of publication, educational context, research focus, language background of participants and author affiliations. The table shows that the authors come from diverse disciplinary backgrounds: computer science, education, language, psychology and social work.
Table 1
Key information from sample studies.

No.  Study                                              Program
1    Steinhart (2001)                                   Summary Street
2    Wade-Stein and Kintsch (2004)                      Summary Street
3    Franzke, Kintsch, Caccamise, & Johnson (2005)      Summary Street
4    Elliot and Mikulas (2004a, 2004b)                  MY Access!
5    Dikli (2007)                                       MY Access!
6    Lai (2010)                                         MY Access!
7    Chen and Cheng (2008)                              MY Access!
8    Grimes and Warschauer (2010)                       MY Access!
9    Grimes (2008)                                      MY Access!
10   Warschauer and Grimes (2008)                       Criterion & MY Access!
11   Rock (2007)                                        Criterion
12   Kellogg et al. (2010)                              Criterion
13   Hoon (2006)                                        MY Access!
14   Frost (2008)                                       Criterion
15   Schroeder et al. (2008)                            Criterion
16   Matsumoto and Akahori (2008)                       Criterion
17   Choi (2010)                                        Criterion
18   El Ebyary and Windeatt (2010)                      Criterion
19   Fang (2010)                                        MY Access!
20   Tsou (2008)                                        MY Access!
21   Li et al. (2015)                                   Criterion

Note: The remaining columns of the original table (type of publication, educational context, research focus, language background of participants, and author affiliation) could not be recovered from the source layout.
Computer science is the most common disciplinary background, and some of the authors from these backgrounds have been involved in the development of the AWE programs they are researching or work for organizations that have developed these programs. For instance, the LSA group developed Summary Street. However, it should be stressed that many of the studies in the sample do not appear to have been carried out by those who develop or market the programs. The table also shows that seven of the studies have some explicit focus on school or classroom use in their data collection. Many of the studies focus primarily on the effects of AWE on writing outcomes such as scores or error rates, and others focus on attitudes to AWE. The table shows that most of the studies have been carried out in college or high school contexts. Only two studies have been carried out in elementary schools, and only one has been carried out at a private language centre. The table shows that some of the studies (n=12) were carried out in mainstream classrooms, with the language backgrounds of the students sometimes not specified, whereas other studies (n=9) were carried out in EFL or ESL classrooms. A number of the studies do not specify the language background of the learners.
4.3. Development of synthetic constructs
The studies were read several times to identify concepts, themes and ideas used by the authors that related to the integration of AWE in the classroom. In CIS, these concepts, themes and ideas are referred to as 'translations'. Translations were identified in the findings, discussion and conclusions sections of the studies. Translations could be either direct (i.e., perceptions, processes, and observations), that is, directly based on research, or indirect (i.e., interpretations, evaluations and recommendations). Perceptions are findings that come from questionnaires or interviews conducted with teachers or students. Processes are findings on students' writing processes, such as the revisions they have carried out or the amount of time they spent writing. Observations are based on informal observation of teachers and/or students. Interpretations are interpretations made about AWE by the author based on their own findings; evaluations are evaluative comments about the use of AWE in the classroom made by the author(s); and recommendations are recommendations made by the author(s) on the use of AWE in the classroom. These translations were identified using electronic sticky notes in the pdf files. In total, 203 translations were identified. These translations were transcribed and cut and pasted into an SPSS file, in which a record was kept of which study each translation came from. The translations were then synthesized into what are known as 'synthetic constructs'. According to Fleming (2009), synthetic constructs "represent an interpretation of the whole body of evidence and allow contrasting aspects of a phenomenon to be unified and explained" (p. 205). The concepts, themes and ideas identified in each study were compared to determine whether there were commonalities. Through the process of identifying commonalities, three overarching constructs emerged, and each translation was coded in SPSS in terms of the overarching construct under which it fell. The three overarching constructs were: Purpose, Action and Use. Purpose refers to objectives or reasons for integrating AWE in the classroom. Action refers to the specific strategies used by teachers and learners to integrate AWE in the classroom.
Use refers to issues surrounding the use of AWE in the classroom by teachers and learners.
4.4. Developing synthetic arguments
Within each of the three synthetic constructs, through ongoing examination and a system of color coding, sub-constructs emerged from the data. These are described and discussed in the results section. These sub-constructs were used to identify the main patterns in the data and to form synthesizing arguments. Synthesizing arguments are higher order interpretations of the data that draw out the most salient point for each of the constructs. They can be seen as representing the main findings of the study. One synthesizing argument per construct was identified.
5. Results
Table 2 shows the number of contributing translations and the number of contributing publications for each of the three synthetic constructs: Purpose, Action and Use. As can be seen from the table, issues surrounding the use of AWE attracted the most translations, followed by specific actions, followed by reasons for using AWE in the classroom.
Table 2
Synthetic constructs.

Synthetic construct    Number of contributing translations    Number of contributing publications
Purpose                46                                     14
Action                 64                                     14
Use                    93                                     14
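As an illustration of how counts such as those in Table 2 can be derived from the coded translations, the sketch below assumes, purely hypothetically, that the SPSS records were exported to a file named translations.csv with one row per translation and the columns study and construct; it then tallies contributing translations and contributing publications per construct.

```python
# Illustrative sketch: tallying coded translations by synthetic construct.
# Assumes a hypothetical export of the SPSS file to "translations.csv" with
# one row per translation and the columns: study, construct.
import pandas as pd

translations = pd.read_csv("translations.csv")  # columns: study, construct

summary = translations.groupby("construct").agg(
    contributing_translations=("study", "size"),      # number of translations
    contributing_publications=("study", "nunique"),   # number of distinct studies
)
print(summary)  # one row each for Purpose, Action and Use, as in Table 2
```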
5.1. Purpose
The synthesizing argument for Purpose is that there are numerous purposes for using AWE stated in existing research, some of which do not accord with objectives that are commonly associated with AWE. Table 3 shows that the most frequent purposes were saving teacher time, promoting autonomy and developing writing processes, all of which do relate to commonly mentioned objectives of AWE. However, other less frequently mentioned purposes such as promoting social interaction and developing content knowledge are not commonly associated with AWE. The goals of saving teachers' time and promoting autonomy are interrelated in the sense that the more students work on their own, the more time is freed up for the teacher. Saving teacher time was mentioned repeatedly in connection with AWE assisting students in carrying out lower level revisions (e.g., grammar, spelling and punctuation), thus freeing teachers to concentrate on higher level aspects of writing, such as meaning and content, organization, and critical thinking. Chen and Cheng (2008) examined the implementation of MY Access! in three Taiwanese English business writing classrooms. They found that AWE can be used effectively by students in the initial stages of writing to work on their grammatical accuracy, with teacher feedback being provided in the later stages of writing. In a study of the use of Criterion in two ESL writing courses in a US university, Li, Link, & Hegelheimer (2015) found that all four teachers who participated in the study reported that using AWE freed them up to focus on aspects of writing other than grammar. Students were less positive in their attitudes, but also commented that they could use AWE for grammar errors, although they needed the teacher's help for content errors. In a study of the use of Criterion in a criminal justice writing course by Schroeder, Grohe, & Pogue (2008), teachers felt that AWE was effective at helping students identify sentence level errors, freeing up teachers to focus on writing skills that promote critical thinking. All in all, although the studies did tend to characterize AWE as primarily being a sophisticated kind of text editor, they also suggested that AWE, albeit indirectly, was able to contribute to the development of writing skills, by enabling the teacher to concentrate on other higher level aspects of writing. AWE was found to promote autonomy because the in-built writing tools allowed students to work on their own. For example, in a study of Summary Street in elementary school classes, Steinhart (2001) found that the immediate feedback provided by AWE gave students a great deal of independent practice in summary writing, without teachers needing to provide so much input. In a study of MY Access! in a middle school in the US, Grimes and Warschauer (2010) found that AWE promoted autonomy by allowing teachers to fulfill a supportive coaching role, rather than a more prescriptive and adversarial "teacher-as-judge" role. They also commented that the potential for AWE to facilitate a shift to a "teacher-as-coach" role appears to call into question criticisms of AWE as being non-social or dehumanizing.

Table 3
Purposes.

Sub-constructs                     Number of contributing translations    Publication numbers (See Table 1)
saving teacher time                9                                      7, 8, 11, 15, 21
promoting autonomy                 8                                      1, 2, 5, 7, 8
developing writing processes       6                                      2, 7, 17, 18, 21
awareness-raising                  5                                      2, 7, 21
competing to improve scores        5                                      8, 9, 19
exam preparation                   3                                      4, 11, 10
creating practice opportunities    4                                      2, 3, 11, 19
promoting social interaction       3                                      2, 7, 13
diagnosis of writing problems      2                                      2
developing conceptual knowledge    1                                      2
Grimes and Warschauer make the point that whether AWE dehumanizes or humanizes writing depends on how it is used in the classroom. Regarding writing processes, surprisingly, only a couple of studies mention purposes that relate to revising processes. It seems that there has been relatively little emphasis on using AWE as a tool to actually teach students how to revise. The emphasis seems to have been on revising in order to improve the correctness of particular texts rather than in order to develop generalized revising skills that could be subsequently applied to other texts, and which could enable students to revise their own texts increasingly independently. The only detailed reference to the development of students' revising skills is made by Wade-Stein and Kintsch (2004) in a study of Summary Street in elementary schools. Wade-Stein and Kintsch (2004) state that "rather than trying to keep the learner on an error-free learning path, Summary Street aims to provide just enough guidance to help learners debug and revise problems with their comprehension and writing on their own" (p. 358). They point out that Summary Street intentionally does not provide students with correct answers, so that students can learn to revise independently. A couple of studies speak of purposes of AWE that relate to support for the whole writing process. El Ebyary and Windeatt (2010) examined the use of Criterion in an Egyptian college EFL setting. They point out that AWE encourages students to reflect generally on their writing, and also that it helps students learn to plan. Similarly, Choi (2010), in a study of Criterion in both an ESL and an EFL context, emphasized AWE's role in facilitating the process approach to writing. Choi stated that AWE assisted students in engaging in the writing sub-processes of planning, drafting and revising by continually reminding them of these processes. In particular, the ESL instructors used the Criterion planning tool in class and asked students to submit outlines for their writing. However, Chen and Cheng (2008) counter the notion that AWE supports the process approach, instead expressing a view that the ideology underlying AWE programs is one that views writing, and also learning and teaching, as formulaic. There seemed to be more consensus regarding the capacity of AWE to raise meta-awareness. For example, Chen and Cheng (2008) said that AWE can raise awareness of conventions and mechanics, and Li et al. (2015) made the point that AWE can raise awareness of metalanguage. Steinhart (2001) made the somewhat unusual point that flaws in some of the feedback provide students with the opportunity to evaluate the appropriacy of AWE feedback. Although AWE has been criticized for dehumanizing writing, a few studies have discussed the feasibility of using AWE to promote social interaction in the classroom. Kellogg, Whiteford, & Quinlan (2010), in a study of Criterion in US college classrooms, recommended the use of AWE systems for peer review, as the web-based nature of the sites provides a convenient means of linking readers and writers. Chen and Cheng (2008) described a successful example of classroom integration of AWE in which the teacher combined AWE with peer review. Wade-Stein and Kintsch (2004) pointed out that Summary Street works equally well as an individual tutor or in a collaborative setting. They also commented that the potential for AWE to facilitate a shift to a "teacher-as-coach" role appears to call into question criticisms of AWE as being non-social or dehumanizing.
Grimes and Warschauer (2010) made the point that whether AWE dehumanizes or humanizes writing depends on how it is used in the classroom. Lastly, an unusual purpose that was mentioned was the use of AWE to deepen students' conceptual knowledge. Wade-Stein and Kintsch (2004) evaluated Summary Street as fostering deeper learning of content. Summary Street focuses on developing students' ability to produce summaries of texts. The underlying idea is that summarizing promotes deeper understanding of the text.
5.2. Action
The synthesizing argument for Action is that teachers had varied and creative ways of integrating AWE in their classrooms. However, it should be stressed that by no means all the studies specified how AWE was used in the classroom, whether scaffolding by the teacher was provided, or the manner in which any scaffolding was provided. In general, studies that looked specifically at classroom use provided more information about the integration of AWE than did studies whose objective was to examine the effects of AWE on text quality by, for example, comparing an AWE condition with a teacher feedback condition or a no-feedback condition (see Table 1 for an overview of the studies' focus and design). Table 4 shows the actions carried out by the teachers in the sample research to integrate AWE into their classrooms. The most frequent kind of action was augmenting AWE feedback, in which AWE feedback and teacher feedback are combined.
Table 4
Actions.

Sub-constructs                        Number of contributing translations    Publication numbers (See Table 1)
augmenting teacher feedback           22                                     2, 6, 7, 11, 13, 15, 17, 19, 21
scaffolding AWE                       16                                     5, 8, 9, 11, 21
embedding in classroom instruction    12                                     1, 2, 5, 9, 11, 12, 17, 21
assessment                            10                                     7, 8, 9, 11
collaboration                         3                                      7, 9, 17
exam preparation                      1                                      7
Teachers had diverse ways of combining computer and human feedback: students could keep submitting work to AWE until they reached a threshold score set by the teacher and then submit to the teacher for feedback (Chen & Cheng, 2008); students could receive both AWE feedback and comments inserted by the teacher through the AWE system (Grimes & Warschauer, 2010); teachers could review AWE feedback and then make their own modifications or additions to this feedback (Choi, 2010). Opinions seemed to be unanimous that AWE feedback needs to be used to augment human feedback, rather than as a replacement for human feedback. However, ironically, some of the between-group comparison studies investigated the effects of AWE without the inclusion of any teacher feedback. Chen and Cheng (2008) found that there was a relationship between students' perceptions of AWE feedback and the manner in which it is integrated into classroom instruction. They compared the attitudes of students to AWE feedback in three writing classrooms with different instructional approaches. In two of the classrooms, AWE was used formatively as an instructional tool in the initial drafting and revising phases. In one of these classrooms it was coupled with both teacher and peer feedback in later phases and in the other it was coupled with only teacher feedback. In the third classroom, it was used both formatively and summatively, with students using the program during the whole writing process and with AWE scores accounting for 40% of final grades. It was found that attitudes were most positive when AWE was used in the initial phases only and was coupled with both teacher and peer feedback. Actions taken to scaffold students' use of AWE were the second most common type of action, although only described in five of the studies. Many of the reported actions appeared to be directed towards compensating for limitations of AWE. These actions ranged from giving guidance on how to interpret and use AWE feedback (Dikli, 2008), to turning off analytic scoring categories that did not appear to be accurate (Grimes & Warschauer, 2010), to reminding students that computer-generated feedback is not always accurate (Grimes & Warschauer, 2010), to encouraging students to write a nonsensical essay and allowing AWE to score it (Grimes, 2008). The third most common category was embedding AWE in classroom instruction. Choi (2010) recommended that embedding should be based on learning theories, and, as mentioned, specifically suggested that AWE lends itself to integration into the process approach. This idea was echoed in some of the other studies. For instance, Steinhart (2001) suggested that students be taught by the teacher to revise and then be given unlimited access to AWE. In a study on MY Access! in US middle school classrooms, Grimes (2008) emphasized the role of pre-writing, peer discussion and revision plans as part of the use of AWE. The fourth most common category was assessment. Teachers appear to have rarely used AWE as the sole means of assessing students' writing. They generally either elected not to use it as part of the formal assessment or combined it with their own grading. For example, one teacher used AWE every second week to grade students' writing (Warschauer & Grimes, 2008). Interestingly, given that the IntelliMetric scoring engine used in MY Access! is also used in testing situations, according to Grimes (2008), the MY Access! trainer suggested that no weight at all should be given to MY Access! scores in grading students' writing. Only one study, Chen and Cheng (2008), mentioned a teacher using AWE for exam preparation to give students timed writing experience.
This teacher was planning to use the AWE scores for both formative and summative assessment. However, the students in this class requested the teacher not to include the AWE scores in their final grade, as they had little confidence in the accuracy of the scores, and the teacher granted their request.
5.3. Use
The synthesizing argument for Use is that, although students generally seemed to enjoy using AWE, at the time of writing, there appeared to be many limitations in the feedback provided by AWE systems.
Table 5
Use.

Sub-constructs                Number of contributing translations    Publication numbers (See Table 1)
feedback                      36                                     1, 2, 5, 7, 8, 10, 11, 14, 17, 20, 21
scoring                       18                                     7, 8, 19, 21
technical issues              12                                     6, 8, 9, 10, 14, 17
lack of revision              11                                     1, 2, 6, 11, 14, 19, 21
surface level revision        8                                      5, 8, 11, 15
narrow range of prompts       3                                      11, 12, 15
lack of time                  3                                      8, 11
lack of social interaction    2                                      5, 6
Students and teachers were generally more positive about error feedback provided by AWE than about other aspects of feedback or about scoring. Table 5 shows that the main issues surrounding the use of AWE were difficulties understanding comments, reported inaccuracies in scoring, technical problems, lack of revising by students, and influence on revising behaviors. In particular, the written feedback provided by AWE was reported as being difficult to understand (e.g., Warschauer & Grimes, 2008; Frost, 2008), especially for non-native speakers (e.g., Hoon, 2006; Dikli, 2008). For example, the teacher in Dikli (2008) commented that the program appeared to have been designed for native speakers, and that grammatical and usage errors were very difficult to understand. In addition, the generic nature of some of the content feedback made it difficult to follow (e.g., Dikli, 2008; Warschauer & Grimes, 2008). According to Warschauer and Grimes (2008), few of the middle school students in their study took the time to read through the generic feedback on content and organization, and those who did either failed to understand it or failed to act upon it. Moreover, teachers in Li et al. (2015) reported that the generic content feedback provided by Criterion did not provide meaningful support, but instead encouraged fixed writing patterns that did not allow for creativity. Only one study (Wade-Stein & Kintsch, 2004) reported that content feedback was easy for students to understand. This report was about Summary Street, which provides content feedback in graphic form. A variety of comments were made about the scoring: that it was too sensitive to cues such as transition words, and that longer texts were more likely to receive higher scores (Chen & Cheng, 2008). Grimes and Warschauer (2010) found that analytic scores were less accurate than holistic scores, that teachers and students believed that a five-paragraph formula would produce a higher score, and that older, more competent students quickly learnt how to trick the scoring engine. In a study of MY Access! in a Taiwanese college EFL setting, Fang (2010) reported that less than half the students were satisfied with AWE as an essay grader. It is unclear to what extent AWE stimulated student revision. In a study of Criterion in a US high school setting, Frost (2008) reported that two-thirds of students engaged in redrafting. However, some other studies reported that students carried out little revision. Warschauer and Grimes (2008) pointed out the contradiction that, while all the teachers said that AWE promoted student revision, the data showed that most students submitted texts for scores only once, meaning they did not use the revision function. In a comparison of AWE feedback and peer feedback in a Taiwanese EFL setting, Lai (2010) found that students revised more with peer feedback than with AWE. However, it is unclear whether the lack of revision reported in some studies was linked to difficulties in understanding and using the written feedback or whether it related to the fact that writers in general, particularly novice writers, do not revise much (Faigley & Witte, 1981; Whalen & Ménard, 1995). Steinhart (2001) commented that students did not revise much because they did not know how to revise. Li et al. (2015) found that students were more likely to resubmit texts if AWE was consistently integrated into classroom writing instruction.
In terms of the kinds of revisions that were carried out, these seem to have largely involved the correction of surface-level features such as language errors (Dikli, 2008; Schroeder et al., 2008; Warschauer & Grimes, 2008; Chen & Cheng, 2008), which is not surprising given that this is a major focus in AWE feedback. Warschauer and Grimes (2008) commented that using AWE did not seem to result in stilted writing through error avoidance behaviors. Only two comments were made about lack of social interaction as a drawback of AWE (Dikli, 2008; Lai, 2010). Dikli (2008) compared the revising behaviors of two ESL students who received only AWE feedback with two ESL students who received only teacher feedback. Both students who received only AWE feedback complained that
the interactional element with the teacher was missing. Lai (2010) found that student writers preferred peer evaluation to AWE, because it helped foster interaction and co-construction of knowledge.
6. Discussion and conclusions
As outlined in the introduction and literature review, AWE has been criticized by proponents of new literacies for doing no more than reinforcing or replicating worn out, unimaginative, objectivistic teaching practices. Vojak et al. (2011) examined the scoring and feedback provided by AWE systems by trying out these systems themselves and concluded that these systems generally failed to reflect social, meaning-oriented and multi-modal aspects of writing. The current study has synthesized the findings and interpretations of research that examined, or included some examination of, the actual classroom use of AWE systems. The three synthesizing arguments that have emerged from this study are: that numerous possible pedagogical purposes for AWE are suggested by existing research, some of which do not accord with objectives that are commonly associated with AWE; that teachers had varied and creative ways of integrating AWE in their classrooms; and that, although students generally seemed to enjoy using AWE, at the time when the sample studies were conducted, there appeared to be many limitations in the feedback provided by AWE systems. The findings embodied in these synthesizing arguments certainly appear to corroborate previous criticisms relating to the accuracy and efficacy of AWE scoring and written feedback. However, the findings also provide a more nuanced picture of AWE and its potential in actual classrooms by illustrating that AWE can be integrated into classroom teaching and learning in a variety of ways, that these ways reflect diverse learning purposes, and that some of these ways enable AWE to be embedded in social and meaning-oriented writing instruction. The integration of AWE into the classroom does not take place in a social vacuum. Just as any classroom writing can be conducted either individually or collaboratively, so writing with AWE also lends itself to the incorporation of different interactional patterns and social purposes. There was evidence that teachers embedded AWE in teacher interaction and instruction: in some cases, teachers also provided feedback themselves and had creative ways of combining AWE feedback with teacher feedback; teachers scaffolded the use of AWE; teachers embedded AWE into instruction such as process writing; and there were instances of teachers using AWE for collaborative writing and for peer feedback. There were also very few cases of teachers using AWE as exam preparation. Hence, it could be argued that the dangers of not writing for a human audience may have been exaggerated in the criticisms of AWE, at least in the use of AWE in classroom settings, as students are in effect still writing for the same audiences they generally write for: their teachers or their peers. However, the point needs to be made that it should not be left solely up to teachers to orchestrate interactions and devise social purposes that AWE systems do not themselves incorporate. AWE systems are in the process of transitioning from being summative assessment tools to formative instructional tools, and an important priority in their ongoing development needs to be the incorporation of a more social constructivist view of learning by including affordances such as opportunities for peer feedback and teacher/student dialogue.
A more difficult issue for program developers to address is limitations in the ability of AWE to respond to many of the meaning-making resources inherent in writing. Firstly, the findings of the current study largely corroborate previous criticisms concerning lack of clarity in the written feedback and inaccuracies or anomalies in the scoring. Recurring perceptions and observations in these studies that AWE scoring is not always accurate appear to be at loggerheads with findings from validation research that it compares favorably with human scoring. In fact, AWE classroom use seems to be a story of juggling the constraints on AWE's capabilities at the time the sample studies were carried out. The ability of AWE to provide accurate scoring and feedback, particularly in relation to content, seems to be very much tied to these constraints. However, lack of clarity in the expression of feedback is a point that can and should be addressed, as solving this issue is not dependent on the computational capabilities of AWE systems. Program developers need to keep in mind that many of the users of AWE systems are young and/or of non-English speaking background, so clarity and simplicity are of the essence. Secondly, as explained in the introduction, AWE only models a small part of the writing construct, which was reflected in the fact that a number of the teachers confined the use of AWE to error correction in the initial phases of writing. Also, the range of genres for which AWE can provide feedback is quite narrow (Stevenson & Phakiti, 2014). The programs included in the current study are largely confined to traditional essay-based genres, and as pointed out by Vojak et al. (2011), do not yet have the ability to deal with multi-modal genres, or indeed to deal with a wide range of non-digital genres. Although advances are being made, such as initial developments to build a system that recognizes sentiment and polarity (Burstein, Beigman-Klebanov, Madnani, & Faulkner, 2013b), the accurate simulation of all
aspects of human feedback and the incorporation of a comprehensive range of genres still seems to be a long way off. As Kellogg et al. (2010) pointed out, "the development of AWE capabilities is limited by the state-of-the-art in computational linguistic techniques. Clearly, this involves a trade-off between the ideal and the possible" (p. 189). Consequently, educators need to be aware of limitations in the meaning-making potential of AWE systems and to weigh these up carefully in deciding whether to employ AWE in writing classrooms for instructional purposes. The current study has shown that teachers did appear to be able to integrate AWE into classroom instruction in ways that circumvented some of its limitations, such as using AWE to provide error correction in the initial phases of writing a text, freeing the teacher up to concentrate on higher-level meaning-oriented, genre-oriented and audience-oriented aspects of writing, or using AWE to increase critical awareness of what writing feedback involves. Indeed, this former use of AWE is promoted on the Criterion website as one of the benefits of AWE (http://www.ets.org/criterion/about?WT.ac=criterion 22703 ell about 120819). As such, this is a legitimate, albeit limited, use of technology to develop an aspect of traditional print literacy. Some of the past criticisms of AWE seem to be based on the flawed premise that new forms of technology are only of value insofar as they are directed towards new forms of literacy, and that there is no longer a place for technology to be used in the development of traditional print literacy. However, as Labbo and Reinking (1999) point out, the integration of technology involves the negotiation of multiple realities: technology can and should be used to enhance the goals of traditional literacy instruction, as well as to transform literacy instruction and to prepare students for the literacy of the future. The relationship between literacy and technology is complex and multi-dimensional: technology shapes new literacies, the ability to use technology is itself a form of literacy, and literacy itself shapes technology. AWE embodies the complexity of this relationship, and criticisms of AWE highlight the tension, described by Labbo (2006), between the push forwards of new digital literacies and the pull backwards of traditional print-based literacy. As an instructional tool for the development of traditional print literacy, AWE is an example of how a need to provide students with scores and feedback on their writing has driven the development of a technology. Although still in its infancy, AWE also has the potential to embed new literacies and to incorporate a more socially-oriented vision of writing, and it is to be hoped that the future will see this come to fruition. In addition, AWE itself requires specific kinds of technological literacy: the ability to produce computer-written texts, to interact effectively with the interface, and, most importantly of all, to understand, interpret, and use on-line feedback, in both numeric and written form, generated by a computer. It could even be argued that receiving feedback from a computer rather than a human being is itself a new literacy. Elliot and Klobucar (2013) write of AWE that "to work with students in digital writing is to teach them to build multi-modal worlds" (p. 29).
Perhaps, ultimately, the distinction between technology that is directed towards new literacies and technology for the development of traditional print literacy is one that will blur. Surprisingly few criticisms have been leveled against the ability of AWE to develop students' revising skills, which surely should be a central objective of programs that provide students with multiple drafting opportunities and detailed automated feedback. Yet the findings of this study pointed to lack of revision by some students, few actions being taken by teachers related to using AWE to develop students' revision skills and, amazingly, even little acknowledgement of the development of revision skills as being a possible or desirable objective. As mentioned, lack of revising is not necessarily indicative of shortcomings in AWE, as research shows that novice writers do not readily revise. However, these findings do indicate that further thought is needed about how AWE can stimulate students to revise more and, most importantly, to develop their revision skills in a way that ultimately allows them to revise more independently. A clearer theoretical framework is needed that takes account of both the cognitive and social nature of revision processes, and that articulates the possible influence of AWE on these processes. It would be of benefit for AWE development and research to ally itself more closely with research on writing feedback, writing processes, and, in particular, on the mechanisms underlying revision as a writing sub-process. Promisingly, Deane (2013) suggests that AWE be based on a socio-cognitive framework that pays attention both to the skills, strategies and knowledge associated with writing and to its social contexts and purposes. Deane also suggests incorporating information about writing processes and strategies in AWE systems. For example, different kinds of feedback could be provided during different stages of the writing process, or feedback could be differentiated according to the specific genre. If AWE is to realize its potential as an instructional tool in classroom writing instruction, it needs to continue its transition from being a sophisticated text editor that assists students in correcting their errors to a tutoring system that assists students in developing the skills and strategies needed to revise their texts effectively and independently. As revising is essentially a problem-solving behavior, in order to become an effective tool for revision it seems that AWE may need to move further along the continuum of instructional tools from drill-and-practice and tutorial towards becoming a
problem-solving instructional tool that is firmly grounded in theoretical and pedagogical principles underlying revising as both a cognitive and a social process. The findings indicate the central role played by teachers in the integration of AWE in the classroom. Clearly, it is not sufficient to evaluate the properties of AWE systems, or to consider the effects of AWE systems on students' writing, without considering their contexts of use. Even research that is primarily concerned with the effects of AWE on text quality, in order to have ecological validity, needs to take care to consider and to describe in its methodology the manner in which AWE is embedded into classroom writing instruction. As pointed out by Warschauer and Ware (2006) and Cotos (2014), further research is also much needed in which the major focus is the manner in which AWE is used in the classroom and the effects that different methods of integration have on students' writing and writing processes. AWE, with its origins as an assessment tool for high-stakes testing, may still have the mark of objectivism emblazoned on it, but closer examination of contexts of use in order to embed AWE more firmly in pedagogically sound writing instruction can create an opportunity for AWE to find its balance somewhere between the push and pull of new and old literacies.

Appendix A. APPENDIX 1

MY Access!
MY Access! is a web-based instructional program developed by Vantage Learning that uses the IntelliMetric scoring engine. IntelliMetric was the first AWE program to use artificial intelligence (AI) blended with natural language processing (NLP) and statistical technologies. IntelliMetric can provide both analytical and holistic scores on focus/coherence, organization, development/elaboration, sentence structure, and mechanics/conventions to attain a final score. It has been developed for students at elementary, secondary and college level. It provides holistic scores on a 4- or 6-point scale, as well as analytical scores and written feedback in the areas of focus and meaning; content and development; organization; language use and style; and mechanics and conventions. More information is available at the MY Access! website: http://www.vantagelearning.com/school/products/myaccess/features.html

Criterion
Criterion is a web-based service developed by ETS that uses the E-rater scoring engine and a diagnostic feedback application. E-rater uses a combination of statistical and NLP techniques to identify relevant features in a sample of human-scored essays. Criterion provides a holistic score out of six. It provides general written feedback related to the specific score and trait feedback on organization and development, style, mechanics, usage and grammar. For mechanics, usage and grammar, feedback involves the identification of specific errors in the text. For organization and development and for style, feedback takes the form of comments that point to the presence or absence of certain features in the text. More information can be obtained at: http://www.ets.org/portal/site/ets/menuitem.435c0b5cc7bd0ae7015d9510c3921509/?vgnextoid=b47d253b164f4010VgnVCM10000022f95190RCRD

Summary Street
Summary Street is a web-based reading comprehension and writing instruction tool developed by Pearson Knowledge Analysis Technologies, which uses the Knowledge Analysis Technologies (KAT) scoring engine.
Summary Street

Summary Street is a web-based reading comprehension and writing instruction tool developed by Pearson Knowledge Analysis Technologies that uses the Knowledge Analysis Technologies (KAT) scoring engine. The KAT engine uses latent semantic analysis, an approach that measures the semantic similarity between the words in a particular text and the words in a corpus of representative texts. In Summary Street, students use their own words to write summaries of online academic texts, the underlying principle being that summarization develops both understanding of texts and writing skills. Summary Street compares student summaries to the original text and provides feedback on content coverage on a section-by-section basis: a general comment on the global quality of the summary and specific comments on the quality of each section. In addition, it provides feedback on spelling, redundancy, irrelevant sentences and copying from the original text. More information can be found at: http://kt.pearsonassessments.com/
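As an illustration of the underlying idea, and not of the KAT engine itself, the Python sketch below builds a reduced semantic space from a small reference corpus using truncated SVD over TF-IDF vectors (a common way of approximating latent semantic analysis) and then compares a student summary with each section of a source text by cosine similarity. The corpus, texts and dimensionality are toy assumptions.

from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A tiny reference corpus stands in for the large corpus of representative
# texts from which an operational system would build its semantic space.
reference_corpus = [
    "Photosynthesis converts light energy into chemical energy in plants.",
    "Plants absorb carbon dioxide and release oxygen during photosynthesis.",
    "Chlorophyll in plant leaves captures sunlight for photosynthesis.",
    "Cellular respiration releases the energy stored in glucose.",
]
source_sections = {
    "Section 1": "Photosynthesis uses sunlight, water and carbon dioxide to produce glucose.",
    "Section 2": "Cellular respiration breaks down glucose to release energy for the cell.",
}
student_summary = "Plants use sunlight and carbon dioxide to make glucose for energy."

vectorizer = TfidfVectorizer(stop_words="english")
svd = TruncatedSVD(n_components=2)  # toy dimensionality; real systems use hundreds
svd.fit(vectorizer.fit_transform(reference_corpus))

def to_semantic_space(text):
    """Project a text into the reduced semantic space."""
    return svd.transform(vectorizer.transform([text]))

summary_vec = to_semantic_space(student_summary)
for name, section_text in source_sections.items():
    coverage = cosine_similarity(summary_vec, to_semantic_space(section_text))[0, 0]
    # A low score for a section would trigger feedback to cover that content more fully.
    print(f"{name}: content-coverage score {coverage:.2f}")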
Appendix B.

The following means of identifying research were used:
a) Search engines: Google Scholar, Google
b) Databases: ERIC, MLA, PsychInfo, SSCI, Ovid, PubPsych, Linguistics and Language Behavior Abstracts (LLBA), Dissertation Abstracts International, Academic Search Elite, Expanded Academic, ProQuest Dissertations and Theses Full-text, and Australian Education Index
c) Search terms: automated writing evaluation, automated writing feedback, computer-generated feedback, computer feedback, automated essay scoring, automated evaluation, electronic feedback, and program names (e.g., Criterion, Summary Street, Intelligent Essay Assessor, Write to Learn, MY Access!)
d) Websites: ETS website (ets.org) (ETS Research Reports, TOEFL iBT Insight series, TOEFL iBT research series, TOEFL Research Reports); AWE software websites
e) Journals from 1990-2013: CAELL Journal; CALICO Journal; College English; English Journal; Computer Assisted Language Learning; Computers and Composition; Educational Technology Research and Development; English for Specific Purposes; IEEE Intelligent Systems; Journal of Basic Writing; Journal of Computer-Based Instruction; Journal of Educational Computing Research; Journal of Research on Technology in Education; Journal of Second Language Writing; Journal of Technology, Learning and Assessment; Language Learning and Technology; Language Learning; Language Teaching Research; ReCALL; System; TESL-EJ
f) Reference lists of already identified publications, in particular the Freitag Ericsson and Haswell (2006) bibliography and Elliot et al. (2013).

Marie Stevenson is a senior lecturer at the University of Sydney, Australia. Her research interests include writing, reading, literacy, and discourse, particularly in relation to second language learners. She has published widely in these areas. She also trains teachers at postgraduate level to teach English as a second language.
References (Sample studies)

Chen, Chi-Fen, & Cheng, Wei-Yuan. (2008). Beyond the design of automated writing evaluation: Pedagogical practices and perceived learning effectiveness in EFL writing classes. Language Learning and Technology, 12(2), 94–112.
Choi, Jaeho. (2010). The impact of automated essay scoring (AES) for improving English language learners’ essay writing. Doctoral dissertation, University of Virginia.
Dikli, Semire. (2007). Automated essay scoring in an ESL setting. Doctoral dissertation, Florida State University.
El Ebyary, Khaled, & Windeatt, Scott. (2010). The impact of computer-based feedback on students’ written work. International Journal of English Studies, 10(2), 121–142.
Elliot, Scott, & Mikulas, Cathy. (2004). The impact of MY Access! use on student writing performance: A technology overview and four studies. Paper presented at the Annual Meeting of the American Educational Research Association.
Fang, Yuehchiu. (2010). Perceptions of the computer-assisted writing program among EFL college learners. Educational Technology & Society, 13(3), 246–256.
Franzke, Marita, Kintsch, Eileen, Caccamise, Donna, & Johnson, Nina. (2005). Summary Street: Computer support for comprehension and writing. Journal of Educational Computing Research, 33(1), 53–80.
Frost, Kathie. (2008). The effects of automated essay scoring as a high school classroom intervention. Doctoral dissertation, University of Nevada, Las Vegas.
Grimes, Douglas. (2008). Middle school use of automated writing evaluation: A multi-site case study. Doctoral dissertation, University of California, Irvine.
Grimes, Douglas, & Warschauer, Mark. (2010). Utility in a fallible tool: A multi-site case study of automated writing evaluation. Journal of Technology, Learning, and Assessment, 8(6), 1–43.
Hoon, Tan Bee. (2006). Online automated essay assessment: Potentials for writing development. Retrieved August 9, 2006 from http://ausweb.scu.edu.au/aw06/papers/refereed/tan3/paper.html
Kellogg, Ronald, Whiteford, Alison, & Quinlan, Thomas. (2010). Does automated feedback help students learn to write? Journal of Educational Computing Research, 42, 173–196.
Lai, Yi-Hsiu. (2010). Which do students prefer to evaluate their essays: Peers or computer program. British Journal of Educational Technology, 41(3), 432–454.
Li, Jinrong, Link, Stephanie, & Hegelheimer, Volker. (2015). Rethinking the role of automated writing evaluation (AWE) feedback in ESL writing instruction. Journal of Second Language Writing, 27, 1–18.
Matsumoto, Kahoto, & Akahori, Kanji. (2008). Evaluation of the use of automated writing assessment software. In C. Bonk et al. (Eds.), Proceedings of World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2008 (pp. 1827–1832). Chesapeake, VA: AACE.
Rock, JoAnn. (2007). The impact of short-term use of Criterion on writing skills in 9th grade (Research Report RR-07-07). Princeton, NJ: Educational Testing Service.
Schroeder, Julie, Grohe, Bonnie, & Pogue, Rene. (2008). The impact of Criterion writing evaluation technology on criminal justice student writing skills. Journal of Criminal Justice Education, 19(3), 432–445.
Steinhart, David. (2001). An intelligent tutoring system for improving student writing through the use of latent semantic analysis. Doctoral dissertation, University of Colorado, Boulder.
Tsou, Wenli. (2008). The effect of a web-based writing program in college English writing classes. Washington, DC: IEEE Computer Society. Retrieved December 16, 2014 from http://portal.acm.org/citation.cfm?id=1381740
Wade-Stein, David, & Kintsch, Eileen. (2004). Summary Street: Interactive computer support for writing. Cognition and Instruction, 22(3), 333–362.
Warschauer, Mark, & Grimes, Douglas. (2008). Automated writing assessment in the classroom. Pedagogies: An International Journal, 3, 22–36.
Further reading (Other References)

Buckingham, David. (1993). Towards new literacies: Information technology, English and media education. The English and Media Magazine, Summer, 20–25.
Burstein, Jill, Tetreault, Joel, Chodorow, Martin, Blanchard, Daniel, & Andreyev, Slava. (2013). Automated evaluation of discourse coherence quality in essay writing. In Mark Shermis, & Jill Burstein (Eds.), Handbook of Automated Essay Evaluation: Current applications and new directions. New York and London: Routledge.
Burstein, Jill, Beigman-Klebanov, Beata, Madnani, Nitin, & Faulkner, Adam. (2013). Automated sentiment analysis for essay evaluation. In Mark Shermis, & Jill Burstein (Eds.), Handbook of Automated Essay Evaluation: Current applications and new directions. New York and London: Routledge.
Burstein, Jill, Chodorow, Martin, & Leacock, Claudia. (2004). Automated essay evaluation: The Criterion online writing service. AI Magazine (Fall), 27–36.
Conference on College Composition and Communication. (2004). Position statement on teaching, learning and assessing writing in digital environments. Retrieved from http://www.ncte.org/cccc/resources/positions/digitalenvironments
Conference on College Composition and Communication. (2009). Writing assessment: A position statement. Retrieved from http://www.ncte.org/cccc/resources/positions/writingassessment
Cope, Bill, & Kalantzis, Mary. (2007). New media, new learning. The International Journal of Learning, 14(1), 75–79.
Cotos, Elena. (2014). Genre-based automated writing evaluation for L2 research writing: From design to evaluation and enhancement. Palgrave Macmillan.
Deane, Paul. (2013). Covering the construct: An approach to automated essay scoring motivated by a socio-cognitive framework for defining literacy skills. In Mark Shermis, & Jill Burstein (Eds.), Handbook of Automated Essay Evaluation: Current applications and new directions. New York and London: Routledge.
Elliot, Norbert, & Klobucar, Andrew. (2013). Automated essay evaluation and the teaching of writing. In Mark Shermis, & Jill Burstein (Eds.), Handbook of Automated Essay Evaluation: Current applications and new directions. New York and London: Routledge.
Elliot, Norbert, Ruggles Gere, Anne, Gibson, Gail, Toth, Christie, Whithaus, Carl, & Presswood, Amanda. (2013). Uses and limitations of automated writing evaluation software. WPA-CompPile Research Bibliographies No. 23.
Faigley, Lester, & Witte, Stephen. (1981). Analyzing revision. College Composition and Communication, 32, 400–414.
Fleming, Kate. (2009). Synthesis of quantitative and qualitative research: An example using Critical Interpretative Synthesis. Journal of Advanced Nursing, 66(1), 201–217.
Fleming, Kate. (2010). The use of morphine to treat cancer-related pain: A synthesis of quantitative and qualitative research. Journal of Pain and Symptom Management, 39(1), 139–154.
Freitag Ericsson, Patricia, & Haswell, Richard (Eds.). (2006). Machine scoring of student essays: Truth and consequences. Logan, UT: Utah State University Press.
Labbo, Linda. (2006). Literacy pedagogy and computer technologies: Toward solving the puzzle of current and future classroom practices. Paper drawn from a presentation for The Future Directions in Literacy Conference sponsored by the Primary English Teachers Association (PETA) and Australian Literacy Educators’ Association (ALEA), Sydney, Australia.
Labbo, Linda, & Reinking, David. (1999). Theory and research into practice: Negotiating the multiple realities of technology in literacy research and instruction. Reading Research Quarterly, 34(4), 478–492.
Lankshear, Colin, & Knobel, Michele. (2006). New literacies: Everyday practices and classroom learning (2nd ed.). Maidenhead: Open University Press.
Luke, Allan, & Luke, Carmen. (2001). Adolescence lost/childhood regained: On early intervention and the emergence of the techno-subject. Journal of Early Childhood Literacy, 1(1), 91–120.
Shermis, Mark, & Burstein, Jill (Eds.). (2013). Handbook of Automated Essay Evaluation: Current applications and new directions. New York and London: Routledge.
Shermis, Mark, & Burstein, Jill (Eds.). (2003). Automated essay scoring: A cross-disciplinary perspective. Hillsdale, NJ: Lawrence Erlbaum Associates.
Shermis, Mark, Burstein, Jill, & Bliss, Len. (2004). The impact of automated essay scoring on high stakes writing assessments. Paper presented at the Annual Meeting of the National Council on Measurement in Education.
Stevenson, Marie, & Phakiti, Aek. (2014). The effects of computer-generated feedback on the quality of writing. Assessing Writing, 19, 51–65.
Vojak, Colleen, Kline, Sarah, Cope, Bill, McCarthey, Sarah, & Kalantzis, Mary. (2011). New spaces and old places: An analysis of writing assessment software. Computers and Composition, 28, 97–111.
Wang, Feiwen, & Wang, Shuwen. (2012). A comparative study on the influence of automated evaluation system and teacher grading on students’ English writing. Procedia Engineering, 29, 993–997.
Warschauer, Mark, & Ware, Paige. (2006). Automated writing evaluation: Defining the classroom research agenda. Language Teaching Research, 10(2), 1–24.
Whalen, Karen, & Ménard, Nathan. (1995). L1 and L2 writers’ strategic and linguistic knowledge: A model of multiple-level discourse processing. Language Learning, 44(3), 381–418.