Computers and Composition 54 (2019) 102518
Teaching Writing with Language Feedback Technology
Fei Victor Lim a,∗, Jean Phua b
a National Institute of Education, Nanyang Technological University, Singapore
b Educational Technology Division, Ministry of Education, Singapore
Abstract
Against the current backdrop of controversies and concerns over machine scoring, this paper focuses on one specific, less controversial, aspect of how a machine can be effective in improving students’ writing: identifying and providing timely feedback on language accuracy to students. Through a study conducted in Singapore schools, this paper investigates the use of a Linguistic Feedback Tool (LiFT) to identify and provide feedback on the use of grammar, spelling, and punctuation in students’ compositions, as well as the potential reduction in the teacher’s marking time. Part One of the study explores teachers’ and students’ reception of a LiFT, as well as the students’ experience of using it in their compositions. Part Two of the study investigates the hypothesis that students’ use of a LiFT to review composition drafts before submission to their teachers would reduce the teachers’ marking time. The findings indicate that both teachers and students are receptive to the use of a LiFT to improve students’ English compositions and that there are time savings in marking for the teachers.
© 2019 Elsevier Inc. All rights reserved.
Keywords: Automated writing evaluation; Process writing; Machine marking; Writing feedback; Educational technology
Introduction
Good writing has many aspects. It usually includes quality of content, rhetorical flair, and, at a more fundamental level, an appropriate, competent, and accurate use of language. In the context of the teaching and learning of writing, a student’s ability to write well is developed with practice and supported by meaningful feedback provided by teachers. It can, however, be time-consuming and effort-intensive for teachers to provide detailed feedback for the many students in their classes across the aspects of content quality, rhetorical effectiveness, and accurate language use. More often than not, teachers tend to be distracted by the language errors in students’ compositions and spend much time providing feedback on these mechanical issues of writing. This is often at the expense of focusing on the substance and style of the students’ compositions, which are crucial to good writing. With the advancement of educational technology, particularly in the field of artificial intelligence, the use of technology, such as a Linguistic Feedback Tool (LiFT), to help students improve their English composition writing has gained interest and attention amongst educators and policy makers (Frost, 2008; McCurry, 2010; Wang, 2013). Along with
∗ Corresponding author.
E-mail addresses: [email protected] (F.V. Lim), [email protected] (J. Phua).
https://doi.org/10.1016/j.compcom.2019.102518 8755-4615/© 2019 Elsevier Inc. All rights reserved.
this development, there have also been concerns and criticisms about the perceived lack of reliability and the inadequacy of a machine in grading students’ writing. For example, the National Council of Teachers of English (NCTE) put up a strong position statement against the use of machine scoring in education in 2013. It highlights that “computers are unable to recognize or judge those elements that we most associate with good writing (logic, clarity, accuracy, ideas relevant to a specific topic, innovative style, effective appeals to audience, different forms of organization, types of persuasion, quality of evidence, humor or irony, and effective uses of repetition, to name just a few)” (NCTE, 2013). It is, therefore, important to be realistic about what a LiFT can do to improve students’ writing. This paper acknowledges the position that a machine, presently, may be unable to assess the quality of content and rhetorical flair of students’ writing effectively. Even as we recognise that the use of computers to assess students’ writing remains controversial, we argue that it is useful not to throw out the proverbial baby with the bathwater. We posit that a machine can be effective in identifying and providing feedback on language accuracy in students’ writing, that is, the mechanical aspects of grammar, spelling, and punctuation. While not the only factor in good writing, a proficient and accurate use of the English language in students’ writing is fundamental to it. This study is premised on a machine being able to efficiently identify and provide feedback on students’ language accuracy in their writing, and not on a machine assessing students’ writing in the aspects of content quality and rhetorical effectiveness. This study is also not about machine scoring of writing, but about the use of a LiFT to provide feedback on language errors. The use of a LiFT, such as the familiar Grammar and Spell Checker in Microsoft Word, is prevalent in higher education and in the workplace, both in Singapore and around the world. However, designing for the intentional use of a LiFT as part of writing instruction in the secondary school context in Singapore is novel.

Linguistic Feedback Tool (LiFT)
We introduce the term ‘Linguistic Feedback Tool’ (LiFT) in this paper to distinguish it from the more commonly used term ‘Automated Marking Tool’. The objective of a LiFT is not the mechanised scoring of students’ writing in the aspects of substance and style, but the identification of, and provision of feedback on, language accuracy in students’ writing, that is, the appropriate use of grammar, spelling, and punctuation. Automated marking tools, automated essay scoring, and automated writing evaluation are computer technologies that evaluate written prose based on analytical algorithms (Dikli, 2006; Shermis & Burstein, 2003; Phillips, 2007). LiFTs are a type of automated marking tool but focus only on identifying and providing feedback on language errors in students’ writing, rather than on providing a score on the students’ content quality and rhetorical effectiveness. The LiFT used in this study is the free version of the Ginger Software. Ginger utilises an advanced text analysis engine based on statistical algorithms in conjunction with its patented Natural Language Processing technology to contextually understand text and intention.
According to its website, Ginger differentiates itself by being able to recognize words in the context of complete sentences, and claims that its algorithm can correct written sentences with relatively high accuracy (eliminating up to 95 percent of writing errors) compared with standard spell checkers. Ginger has a grammar checker version that is freely accessible to teachers and students, as well as a premium version that includes additional features such as a sentence rephraser, translator, dictionary, text reader, and a trainer which provides students with personalised practice sessions based on their own errors as analysed by the system. For this study, only the grammar checker in Ginger is trialled, as it is freely accessible to all teachers and students and can be readily scaled up if found effective. The software can be accessed at https://www.gingersoftware.com.
There is, at present, a dearth of studies on the use of the Ginger software to improve students’ writing in the secondary school English classroom context. However, there are studies on another LiFT similar to Ginger: the Grammarly software. Genaro V. Japos (2013) observed a significant reduction in grammar errors in a study examining the effectiveness of using Grammarly on 47 undergraduate theses. Likewise, Michelle Cavaleri and Salb Dianati (2016) reported that students, when asked about their experience with Grammarly, found it useful and easy to use (80%), and agreed that it helped them understand grammar rules (75%). John Milton (2006) and Kathie L. Frost (2008) argued that the immediate feedback feature of such tools is useful as it provides timely corrections to students and encourages them to take greater ownership over their learning. Such tools have also been reported to be more reliable and consistent than human markers in identifying language errors in grammar, spelling, and punctuation. For example, Dougal Hutchison (2007), Jinhao Wang and Michelle Stallone Brown (2007), and Mark D. Shermis and Ben Hamner (2012) have reported that, in terms of reliability and consistency of marking, such tools tended to perform either equivalently to, or better than, human markers.
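To make concrete the kind of language feedback a LiFT surfaces, the minimal sketch below uses the open-source LanguageTool engine via the language_tool_python package. This is an illustration only: it is not Ginger’s proprietary engine and not the tool trialled in this study, and the sample sentence is invented for the example.

# Illustration of LiFT-style feedback with the open-source LanguageTool engine,
# NOT the Ginger software used in this study.
# Requires: pip install language_tool_python (downloads the Java-based backend on first run).
import language_tool_python

tool = language_tool_python.LanguageTool('en-GB')  # British English, as used in Singapore schools

draft = "Yesterday she go to the market and buyed three durian ."

# Each match describes one detected issue: the problem, its context, and suggested fixes.
for match in tool.check(draft):
    print(f"Issue: {match.message}")
    print(f"  Context: {match.context}")
    print(f"  Suggestions: {match.replacements[:3]}")

In classroom use, such output would be surfaced to the student within the editor interface rather than printed, which is broadly what Ginger’s in-text highlighting does.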
Criticisms of the use of automated marking tools to improve students’ writing tend to focus on their inadequacy in assessing a composition’s substance and style, rather than on their ability to identify and provide feedback on the accuracy of language use. For instance, Les Perelman (2012) observed that the measurements used in automated essay scoring do not represent real-world constructs of writing ability. When used independently, without a teacher’s guidance, the tool may reinforce mechanistic and formulaic writing that is disconnected from communication in real-world contexts (Wang, Shang, & Briody, 2013). Colleen Vojak, Sonia Kline, Bill Cope, Sarah McCarthey, and Mary Kalantzis (2011) also argued that students’ writing proficiency cannot be improved without interaction with a teacher. Notwithstanding, Semire Dikli (2006) and Lauren Griffiths and Barbara Nicolls (2010) observed that automated marking tools have been used to bring about a greater assurance of reliability in writing assessment when used together with the teacher. Likewise, Andrew Klobucar et al. (2012) recommended that automated marking tools be used together with the teacher’s guidance as a form of “an early warning system” for the teacher and students. The recognition that a LiFT should be used together with a teacher is consistent with the arguments made by Mark D. Shermis and Jill C. Burstein (2003), Ware (2005), and Warschauer and Ware (2006) that automated marking tools are a supplement to writing instruction rather than a replacement for writing teachers. Ruth O’Neill and Alex M. Russell (2019) report on a study investigating students’ perceptions of Grammarly when used in conjunction with advice from an academic learning advisor. Their findings revealed that students receiving feedback from both Grammarly and the instructor were significantly more satisfied with the grammar advice they received compared with students receiving feedback from the instructor only. The researchers suggested that such a LiFT can complement teachers’ feedback to students and mitigate issues such as the lack of time to address grammatical errors in students’ compositions. In this light, the position adopted in this paper is consistent with these studies: the LiFT should be used to complement the teacher’s marking rather than independently, without the composition subsequently being marked by a teacher. We posit that the use of a LiFT can help to identify and provide feedback on the language accuracy in students’ compositions, thus allowing the teacher to focus on providing feedback on the other aspects of students’ writing, such as the quality of content and rhetorical effectiveness, and, in the process, also reduce marking time.

Process Writing with Feedback from LiFT
Process writing involves a recursive cycle of writing, receiving feedback, and revising. The use of process writing as an instructional approach has been reported to bring about positive development in students’ writing skills (Myles, 2002; Pritchard & Honeycutt, 2006). In particular, the iterative nature of process writing encourages students to revise their drafts and improve their writing. The feedback provided in process writing motivates students to make revisions and move from declarative knowledge of grammar rules to procedural knowledge (Negro & Chanquoy, 2005). The timely feedback also supports students’ growth in taking ownership over their learning, and is in line with assessment for learning principles (Black & Wiliam, 1998).
The provision of feedback on students’ writing is a critical feature of process writing. Within the process writing approach, students usually receive feedback from their teacher in order to improve their writing quality. However, it can be time-consuming and effort-intensive for teachers to provide detailed feedback for the many students in their classes across the aspects of content quality, rhetorical effectiveness, and accurate language use. We posit that the use of a LiFT can complement the teacher by providing timely feedback on the students’ use of language in the drafting and revision stages of process writing. Students then review and correct their errors before submitting the improved version to their teacher. The teacher can then focus on the other aspects of the students’ writing, because the mechanical issues of grammar, spelling, and punctuation have been addressed with the LiFT. The timely feedback also encourages and empowers students to edit and improve their writing on their own, instead of having to wait for their teachers’ feedback, which may only come days after submission.
Internationally, researchers have studied the use of a LiFT to identify and provide feedback on language accuracy in students’ writing. In a study on the use of a LiFT for process writing with three parallel college classes in Taiwan, Chi Fen Emily Chen and Wei Yuan Eugene Cheng (2008) reported that the LiFT was most useful in the early drafting and revision of the writing. Chen and Cheng (2008) also noted that it benefited students with low verbal ability more than students with high verbal ability. Douglas Grimes and Mark Warschauer (2010) conducted a three-year longitudinal study on LiFTs in eight schools in California and concluded that LiFTs motivated students to write and revise
Table 1
Distribution of Students by Grade Level.
Level                        Number (N)   N (%)
Grade Seven (13 years old)   137          31.4
Grade Eight (14 years old)   232          53.2
Grade Nine (15 years old)    67           15.4
Total                        436          100.0
more through learner autonomy. Pei-ling Wang’s (2013) study on the use of a LiFT by 53 Taiwanese college students affirmed the productivity of using a LiFT in process writing. More recently, Jinlan Tang and Changhua Sun Rich (2017) examined the use of a LiFT as part of a study on automated writing evaluation in China. They observed that students were more motivated to revise and rewrite their drafts and that there was a reduction in simple grammatical errors. They reported that “teachers might not need to spend as much time correcting and commenting on the language mistakes, and the writing instruction seemed to witness a shift of focus from language form to content and discourse, from product to process” (Tang & Rich, 2017: 131).
In Singapore, the English Language is the medium of instruction for all subjects. English is taught as the first language to students, and as part of the bilingual policy in Singapore, all students also learn a Mother Tongue based on their racial group, for example, Mandarin for Chinese, Malay for Malays, and Tamil for Indians. The majority race in Singapore is Chinese, followed by Malay, Indian, and others, and most schools reflect this racial composition (Ministry of Education, Singapore, 2019). While more students are speaking the English Language at home with their families, the variety of English spoken in most homes is Singapore Colloquial English (or ‘Singlish’) (Department of Statistics Singapore, 2016). As such, teachers, through the curriculum and professional support from the Ministry of Education, emphasise the importance of learning Standard English so that students are able to use the English Language fluently in formal contexts. The English Language is taught with a focus on communicative competence and language accuracy (Pang, Lim, Choe, Peters, & Chua, 2015).
Our study in Singapore is motivated by the need to provide students with timely feedback on language accuracy in their writing, while freeing up time for teachers to focus their feedback on the other aspects of good writing. Our hypothesis is that the use of a LiFT to provide feedback on students’ language accuracy in their writing will be well received by both teachers and students, and will also reduce marking time for teachers. The mixed methods quasi-experimental study draws on surveys, interviews, and pre- and post-tests. It aims to investigate the following questions. 1) What is the teachers’ and students’ reception towards the use of a LiFT in students’ compositions? 2) Does the use of a LiFT reduce the amount of time teachers spend on marking compositions?

Study Part One
Part One of the study focuses on the attitudes and reception that teachers and students have towards the use of a LiFT by examining the teachers’ and students’ reflections on their experience of using a LiFT to improve students’ English compositions.

Methodology
Participants
The study was carried out in seven schools, involving 14 classes and a total of 436 Grade Seven, Eight, and Nine students (13 to 15 years old) (Table 1).

Design
At each school, students spent approximately an hour on the task. Students were first on-boarded with a familiarisation exercise with Ginger. The students then typed their compositions using Microsoft Word (with the spellcheck feature turned off). The students then revised their compositions based on the feedback from Ginger. The revised compositions were subsequently marked by their teacher.
Table 2
Survey Responses to “Does Ginger help you improve your writing?”.
Response          Number (N)   N (%)
Very Useful       164          37.6
Useful            174          39.9
Somewhat Useful   84           19.3
Hardly Useful     11           2.6
Not Useful        3            0.7
Total             436
Table 3
Coding into Themes of Qualitative Responses from Survey Question: How does Ginger help you to improve your writing?
Themes                          Count (% of Sample)
Correct my mistakes             187 (42.9)
Improve my English              164 (37.6)
Improve my grammar              33 (7.6)
No comment                      22 (5.0)
Improve my spelling             8 (1.8)
Not really helpful              8 (1.8)
Improve my tenses               7 (1.6)
Improve my sentence structure   6 (1.4)
Improve my vocabulary           1 (0.2)
Total                           436 (100.0)
Research Method
A mixed-method approach was adopted, which included surveys with students and interviews with both students and teachers. The post-intervention student survey centred on students’ beliefs about the LiFT, with three questions: 1) Does Ginger help you to improve your writing? (Likert scale); 2) How does Ginger help you to improve your writing? (open-ended); 3) What does Ginger need to improve on? (open-ended). For the open-ended survey questions, open coding was used to create tentative labels for each student response. After the initial open coding process, focused coding was conducted in search of the most frequent codes, and, through constant comparison, themes were derived.
Group interviews with students were undertaken to elicit their experience with Ginger and how they used it to revise and improve their writing. Each group interview consisted of six students (three from the higher ability group and three from the lower ability group) from each class and lasted thirty minutes. The students were identified by their teachers, who had placed them in ability groups within each class based on the students’ performance in previous English Language tests. The 10 teachers involved also reflected on their experiences and provided their views on the usefulness of Ginger. Teacher interviews were conducted in groups and were semi-structured. The teachers responded to three key questions: 1) What is the practice of process writing in your class? 2) Does Ginger help students improve using the process writing approach? 3) Does Ginger help you in your marking of students’ scripts? The interview data were examined through content analysis. Common themes were extracted, discussed, and exemplified to study the teachers’ reception towards the use of a LiFT.

Findings
Students’ Feedback
From the survey of 436 students, 77.5% of students agreed that a LiFT was either “very useful” or “useful” in helping them improve their compositions (Table 2). In the open-ended survey questions, the students’ responses were coded into themes. Each student response was tagged to one theme. The codes were reviewed concurrently by two researchers and a consensus obtained on differing codes. Tables 3 and 4 list the emerging themes from the qualitative findings of the students’ survey. For example, “Ginger helps to define the word whenever I misspell them and it could help me improve in my language.” was categorised under
Table 4
Coding into Themes of Qualitative Responses from Survey Question: What does Ginger need to improve on?
Themes                                  Count (% of Sample)
Improved accuracy                       195 (44.7)
No need for further improvement         125 (28.7)
Not applicable                          50 (11.5)
Better response in speed                23 (5.3)
Not American English                    17 (3.9)
Technical features and User Interface   14 (3.1)
Provide explanation for errors          12 (2.8)
Total                                   436 (100.0)
“Improve my English”. Examples of quotes categorised under “Correct my mistakes” included “It helps me to correct my mistake that I made.”, “Ginger shows us our mistakes and shows us how to do our corrections.” and “Ginger helps me to correct my grammar mistakes.” Examples of quotes categorised under “Improve my English” include “Ginger helps me to improve my English” and “It improves my English.”
In response to the survey question “How does Ginger help you to improve your writing?”, the largest group of students (42.9%) indicated that Ginger helped them correct their errors, while 37.6% felt that it would help them improve their English. Some responses were specific, indicating that Ginger helped them improve their spelling (1.8%), tenses (1.6%), and sentence structure (1.4%). In response to the survey question “What does Ginger need to improve on?”, 44.7% of students would like Ginger to be more accurate. Examples of quotes in this theme included “Ginger must improve on spotting Grammar mistakes because some of my mistakes were not corrected”, “The words that are incorrect.”, “The words in a run-on sentence.”, and “Sometimes it will change the word in the text to another word”.
The feedback from students at the focus group discussions also corroborated the survey responses. Most of the students indicated that a LiFT was useful in improving the quality of their compositions. For example, a student mentioned that “it lists out the wrong things and I can learn the vocab”. Another student explained that “it was very helpful, very resourceful. It will correct your tenses and spelling.” Another student also highlighted that the LiFT helped in improving her “punctuation and tenses”. She added that the LiFT “let us know the mistake that it spot so that we can change”. Students also expressed a preference for a LiFT that could categorise the type of grammatical errors made and could provide a definition of the error. For example, a student mentioned that “it can be improved by making the system more accurate”. Another student also hoped that the LiFT would be able to “list out the reason [for the mistake]”.

Teachers’ Feedback
The emerging themes from the teachers’ interviews are that the LiFT helped students identify their language errors, supported process writing, and had a user-friendly interface. It was also noted that the LiFT was not always accurate.
Identified Language Errors. The majority of the teachers (10 out of 12) observed that a LiFT would help weaker students identify their language errors in writing and would encourage them to take greater ownership over their learning. For example, a teacher commented that, “I definitely believe Ginger supports process writing as students are able to process their writing in bite-sized portions and allow the students, especially those who are weaker, to focus on their mistakes without being overwhelmed by their mistakes, say, when it all came back in a “bloodied” essay [full of annotations from the teachers’ red pen]”.
Supported Process Writing. Another teacher added that the use of a LiFT “definitely supports the process writing approach in terms of the editing stage as students are able to use Ginger in their work and edit it according to the feedback given”.
Table 5
Examples of Inaccuracies by the LiFT.
Problem Area: Non-errors. Example: ‘durians’ identified as a misspelled word. Description: Unable to identify local words.
Problem Area: Lapses. Example: failure to pick out a punctuation error (inappropriate use of capital O) in “...a few Onlookers looked at us.” Description: not all major errors, such as tenses or word choice, are identified.
Problem Area: Misdiagnosis. Example: identifying the word ‘we’ as a frequently repeated word throughout the passage. Description: feedback given may not match the errors identified, resulting in inappropriate suggestions for correction.
User-friendly Interface. The software is also “systematic in a sense that it processes the script line by line, complete with highlighting, such that students can easily spot their mistakes and correct them.”
Not Always Accurate. However, two teachers did not find the LiFT useful because they noted that Ginger was not always accurate in detecting errors. A teacher observed that “there are some mistakes that are not picked up by the Ginger software”. The other teacher also opined that “it is able to identify spelling errors, tenses and verb forms perfectly well. However, it is inconsistent for punctuation errors and misused words.” Table 5 shows examples of inaccuracies in the identification and feedback from Ginger.
Motivated Students to Revise and Redraft. Regardless, some teachers felt that a LiFT could motivate students as it provided quick feedback on their work. The teachers commented that, with the use of a LiFT, lower ability students would have fewer errors in the revised compositions submitted to the teacher. Higher ability students would also be able to sharpen their awareness of the different types of errors they made and improve their editing skills. For instance, a teacher commented that the LiFT “is a good tool for low ability students to be engaged and be motivated to write as they experience more success in their writing process by making fewer errors. As for the high ability students, they should be more aware of the types of errors they make to better improve their editing skills.”
Saves Marking Time. Many teachers reported that they saved marking time as the LiFT would have provided feedback for their students on grammar, spelling, and punctuation errors. The teachers indicated that, with the time saved, they could focus on students’ expressions and grammatical structures that the LiFT did not capture. In addition, teachers found that they could focus on providing feedback to students on content and style. For example, a teacher mentioned that “it can reduce teachers’ workload by cleaning up the students’ work before submission to the teacher for formal grading”. Another teacher added that the use of a LiFT “will definitely help in relieving our workload as this software can help students to pick out and correct common errors before the teachers starts to mark the essays. The scripts will thus be ‘cleaner’ and teachers can focus on correcting their expression errors and provide more feedback on how to improve on more challenging areas like vocabulary”.
The findings from Part One of the study suggest that teachers and students are generally receptive to the use of a LiFT and found it useful for improving students’ writing, particularly in motivating students through the provision of timely feedback on their work. It was also recognised that a LiFT is not always accurate in error detection, and hence the teacher’s review of the final draft of the students’ composition will still be required. Some teachers and students perceived that the benefits were greater for lower ability students than for higher ability students. Many teachers also felt that the use of a LiFT could bring about greater efficiency by reducing the marking time they would need, as most of the grammar, spelling, and punctuation errors would have been addressed with the LiFT.

Study Part Two
The hypothesis on the saving of marking time with the students’ use of a LiFT is the focus of investigation in Part Two of the study. We investigate this claim through a randomised controlled trial. Part Two answers the question “Does the use of a LiFT reduce the amount of time teachers spend on marking compositions?”
Table 6
Random Distribution of Classes within Schools to Design Conditions.
                  High Ability (HA)   Medium Ability (MA)   Low Ability (LA)
School Bravo      Control             MSWord                Ginger
School Charlie    Ginger              Control               MSWord
School Echo       MSWord              Ginger                Control
Table 7
Mode of Composition One and Two for the Three Design Conditions.
          Time One                  Time Two
Control   Handwritten composition   Handwritten composition
MSWord    Handwritten composition   Drafted and typed in MSWord
Ginger    Handwritten composition   Drafted and typed in MSWord with Ginger
Methodology
Participants
Three of the seven secondary schools from Part One of the study volunteered to participate in Part Two. Each school involved three classes of Grade Seven students (13 years old). The three classes from each school were assigned to three ability levels (High, Medium, and Low) based on the students’ English ability as indicated by their English grade in a national examination taken by all Grade Six students in Singapore.

Design
The three classes within each school were randomly assigned to the Control, Ginger, or MSWord group (see Table 6). The Control group would handwrite their composition, the MSWord group would type their composition using Microsoft Word with the Spell & Grammar Checker turned on, and the Ginger group would draft and type their composition using Microsoft Word with Ginger (and with Microsoft Word’s Spell & Grammar Checker turned off). Microsoft Word is bundled with a Spell and Grammar Checker feature, and it is of interest whether Ginger has an added advantage over Microsoft Word.
All students were required to complete two compositions within an interval of about four to five weeks. The two compositions were of the same genre. Composition One was used to establish the baseline measure of students’ writing abilities, and all students handwrote Composition One. After five weeks, students were required to complete Composition Two, either in handwriting, in Microsoft Word with the Spell & Grammar Checker turned on, or in Microsoft Word with Ginger, based on the group to which the class had been assigned (see Table 7). The topics of the compositions were standardised. Students were given 40 minutes to handwrite their composition at Time One and 40 minutes to complete their composition in their allocated mode at Time Two. Both topics at the two intervals were narrative compositions. The topic of Composition One was “Dear Diary: Reflection of My Orientation Experience” and the topic of Composition Two was “Dear Diary: Reflection of My Secondary School Experience”. All handwritten compositions were typed out to ensure parity in the conditions across the three groups. To ensure consistency of essay length, all the typed compositions were standardised at 188 words, as that was the minimum length of the compositions submitted by the students. The final sample size used for analysis is presented in Table 8.

Research Method
For Part Two of the study, a quantitative approach is adopted to study the difference in time taken to mark compositions written in three modes: handwritten, typed in Microsoft Word with the Spell & Grammar Checker, and typed in Microsoft Word with Ginger.
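As an illustration only (not the authors’ analysis script), the short sketch below shows how per-script marking times logged in this way could be tabulated into the means, standard deviations, and counts by condition and ability group reported later in Table 9; the file name and column names are assumptions.

import pandas as pd

# Hypothetical log: one row per marked script, with the recorded marking time in
# seconds, the design condition, the ability group, and the time point.
df = pd.read_csv("marking_times.csv")  # assumed columns: seconds, condition, ability, time_point

summary = (
    df.groupby(["time_point", "condition", "ability"])["seconds"]
      .agg(["mean", "std", "count"])
      .round(2)
)
print(summary)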
Table 8
Distribution of Sample Size.
         Time One (Topic One)                    Time Two (Topic Two)
         Control   MSWord   Ginger   Total       Control   MSWord   Ginger   Total
HA       40        24       33       97          37        28       36       101
MA       33        36       25       94          22        26       19       67
LA       24        37       38       99          17        28       19       64
Total    97        97       96       290         76        82       74       232
Table 9
Descriptive Statistics of Marking Time in Seconds: Mean (SD), N.

Time One
         Control                  MSWord                   Ginger
HA       124.60 (57.74), N = 40   139.29 (57.96), N = 24   124.12 (59.85), N = 33
MA       114.52 (50.68), N = 33   130.25 (57.35), N = 36   120.12 (46.81), N = 25
LA       142.38 (63.36), N = 24   116.70 (54.96), N = 37   116.61 (52.92), N = 38
Total    125.57 (57.32), N = 97   127.32 (56.74), N = 97   120.10 (53.49), N = 96

Time Two
         Control                  MSWord                   Ginger
HA       119.97 (43.97), N = 37   115.11 (49.22), N = 28   112.03 (50.69), N = 36
MA       114.91 (48.57), N = 22   124.12 (47.11), N = 26   109.95 (61.79), N = 19
LA       145.18 (71.63), N = 17   121.93 (52.73), N = 28   121.42 (63.44), N = 19
Total    124.14 (53.08), N = 76   120.29 (49.33), N = 82   113.91 (56.45), N = 74
The composition scripts were compiled separately into two piles: Time One and Time Two. For Time One, the scripts were randomly assigned to eight markers. For Time Two, the scripts were randomly assigned to seven markers. To ensure consistency across the marking conditions, the markers were gathered at the same venue for the marking sessions. The markers also used a timer to note down the start time and end time for the marking of each script. The time taken for marking was then computed and recorded in seconds. For each session, a benchmarking exercise was also carried out with six prepared scripts to standardise the marking.
If there was no significant difference in the time required to mark the compositions across the three groups (Control, MSWord, Ginger) at Time One, it would be inferred that there was no difference in marking time due to differences in the abilities of the students by class, and the Analysis of Variance (ANOVA) method would then be employed to analyse the time required to mark the compositions across the design conditions at Time Two. However, if there was a difference in the marking time amongst the classes at Time One, it could be inferred that the abilities of the students by class differed, and hence the Multivariate Analysis of Variance (MANOVA) method would be employed to analyse the difference in marking time across the experimental conditions at Time Two.

Findings
The descriptive statistics of the marking time are presented in Table 9.

Time One
A two-way ANOVA was performed on the time taken to mark the compositions, with the dependent variable being the time taken to mark the composition and the two independent variables being the three design conditions and the markers. The results are shown in Table 10. The independent variable of the markers was included to take into account the stylistic differences in marking by different markers, despite the benchmarking exercise that was carried out. There is no interaction effect, F(13, 267) = 1.23, p > 0.05. The main effect of the intervention yielded an F ratio of F(2, 267) = 1.84, p = 0.161, indicating that there is no difference in the amount of time that teachers spend marking the compositions. Therefore, the three conditions could be considered equivalent at baseline.
Table 10
Results of Two-Way ANOVA for Intervention (A) and Markers (B) for Composition One.
Source of variation   Df    Sum of Squares   Mean Square   F       p-value
Main effect (A)       2     5321.52          2660.76       1.84    0.161
Main effect (B)       7     417822.47        59688.92      41.22   0.000
Interaction (AB)      13    23174.87         1782.68       1.23    0.256
Error                 267   386643.97        1448.105
Total                 290   5382848.00
Table 11
Results of Two-Way ANOVA for Intervention (A) and Markers (B) for Composition Two.
Source of variation   Df    Sum of Squares   Mean Square   F       p-value
Main effect (A)       2     9549.67          4774.84       3.61    0.03
Main effect (B)       6     341358.05        56893.00      43.04   0.00
Interaction (AB)      12    23104.93         1925.41       1.46    0.14
Error                 211   278910.33        1321.85
Total                 232   3958986.00
Fig. 1. Summary of Key Findings. Mean marking time at Time Two: Control 124.14 seconds; MSWord 120.29 seconds; pooled non-Ginger 122.22 seconds; Ginger 113.91 seconds. The difference of 8.3 seconds between Ginger and the pooled non-Ginger groups is statistically significant, F(1, 218) = 6.93, p < 0.01, a 7% improvement.
The main effect of markers on the marking time was significant, F(7, 267) = 41.22, p = 0.000. While different markers marked differently, this does not affect the interpretation of the results. Based on the results, we note that, in the absence of the use of the computer for typing and drafting the compositions, the three conditions are similar.

Time Two
A two-way ANOVA was performed on the time taken to mark the compositions, with the dependent variable being the time taken to mark the composition and the two independent variables being the three design conditions and the markers (see Table 11). There is no interaction effect, F(12, 211) = 1.46, p > 0.05. The main effect of the intervention yielded an F ratio of F(2, 211) = 3.61, p < 0.05, indicating that there is a significant difference in the amount of time that teachers spent marking the compositions amongst the three conditions. The main effect of markers on the marking time was significant, F(6, 211) = 43.04, p = 0.000, which is similar to the results obtained at Time One. Post-hoc analysis was carried out to find out whether Ginger does indeed reduce marking time. A planned contrast was carried out with the MSWord and Control groups pooled together and contrasted with the Ginger group; the result is statistically significant, F(1, 218) = 6.93, p < 0.01, with a difference in marking time of 8.3 seconds. Fig. 1 summarises the key findings.
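For readers who wish to run this kind of analysis on their own data, the sketch below is an assumed reconstruction (not the authors’ analysis script) of the Time Two two-way ANOVA and a simplified version of the planned contrast, using statsmodels; the data file and the column names 'seconds', 'condition', and 'marker' are hypothetical.

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical per-script data for Time Two: marking time in seconds, design
# condition (Control / MSWord / Ginger), and the marker who scored the script.
df = pd.read_csv("marking_times_time2.csv")

# Two-way ANOVA: intervention and marker as factors, with their interaction (cf. Table 11).
model = ols("seconds ~ C(condition) * C(marker)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))

# Simplified planned contrast: pool Control and MSWord ("non-Ginger") and test
# the pooled group against Ginger. The contrast reported in the paper,
# F(1, 218) = 6.93, may have been specified differently within the full model.
df["ginger"] = (df["condition"] == "Ginger").map({True: "Ginger", False: "Non-Ginger"})
contrast = ols("seconds ~ C(ginger)", data=df).fit()
print(sm.stats.anova_lm(contrast, typ=2))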
The study indicates that the use of a LiFT helps save marking time for teachers. In particular, compositions that were revised based on feedback from Ginger brought about a statistically significant time saving of 7% of the teachers’ total marking time.

Discussion
The harnessing of advances in educational technology for literacy instruction in Singapore is gaining interest and momentum (for example, Lim, O’Halloran, Tan, & Marissa, 2015; Towndrow & Pereira, 2018; Aryadoust, 2019). This study represents an ongoing effort to leverage technology to improve students’ writing. The findings from Part One of the study indicate that teachers and students are receptive to, and saw value in, the use of a LiFT to identify and provide feedback on the language accuracy in students’ writing. Part Two of the study reports that the students’ use of a LiFT, such as Ginger, can reduce teachers’ marking time. The findings are consistent with international findings on the use of a LiFT within the process writing approach to achieve productivity gains (for example, Chen & Cheng, 2008; Tang & Rich, 2017).
While the study has demonstrated value in the use of a LiFT in writing instruction for secondary school students in Singapore, there are other considerations that determine the effectiveness of its use. Grimes and Warschauer (2010) described a paradox in the use of LiFTs: teachers’ positive views of LiFTs did not seem to translate into more frequent use of LiFTs in the classroom. The challenges impeding a more pervasive use of a LiFT in Singapore may contribute insights to Grimes and Warschauer’s (2010) observation of low use of LiFTs despite their perceived usefulness.
A main impediment to the use of a LiFT in the Singapore secondary school classroom is that, presently, students are expected to write their compositions by hand rather than type them with a word processor. When process writing is used in class, students tend to draft their compositions by hand. As the examinations are also in the written mode, teachers generally prefer to maintain the default practice of having students write, rather than type, their compositions. This is likely to remain the situation until the use of computing devices for teaching and learning in the secondary school context becomes more prevalent. Related to this is the need for a corresponding shift in the literacy curriculum to recognise the ‘complementary competencies’ of writing and typing skills as part of the digital literacy students require as they prepare for higher education and the workplace (Kress, 2003; Lim & Hung, 2016).
Another factor in the effective use of the LiFT in the classroom is the teacher’s readiness. Teachers must be able to use the LiFT appropriately and spend time helping students to understand and apply the feedback from the LiFT meaningfully. The feedback requires a working knowledge of grammar, and students would need to have learnt this well in order to make full use of the feedback. Related to this are the student’s readiness and motivation. Students need to be self-directed and mature enough to review the feedback provided by the LiFT and revise their draft with the feedback. Finally, the quality, that is, both the reliability and the capability, of the LiFT in use matters. This is often directly related to the cost involved. For example, the Ginger Software used in this study has a free grammar checker version that is easily available to all interested teachers.
However, the free version is limited in its functions and the extent of feedback that it provides for students. The paid version offers richer functionality as well as a basic analytics report for the teacher, such as the common types of errors made by students in a class. Such useful features come at a price, and schools would have to consider the value of such a paid tool in terms of the frequency of use by their students. As such, while the use of LiFTs to improve students’ writing has clear benefits, factors such as the current practice of writing rather than typing compositions, teachers’ and students’ readiness, and the quality of LiFTs can impede the extent of adoption of this practice.
Notwithstanding these challenges, some Singapore schools have piloted and adopted the use of a LiFT to identify and provide feedback on the language accuracy in students’ writing. Teachers adopted the process writing approach and had their students use a computer to search for information, plan their composition, and eventually draft, edit, and revise their composition on a word processor with a LiFT installed. These teachers, coming from schools across Singapore, have also formed a community of practice on the use of LiFTs to share their lesson ideas and resources with one another (Shaari, Lim, Hung, & Kwan, 2017; Lim, Kwan, & Poh, 2019). With the efforts of these trailblazers, and as the use of computing devices becomes ubiquitous in the Singapore classroom, it is expected that the paradox of perceived value but low use of LiFTs in writing instruction will, in time, evanesce.
Acknowledgements
We would like to thank the schools, teachers and students who participated in the study, including our past and present colleagues: Tay Siu Hua, Rachelle Lee Jun Jiao, Tan Xiao Ting, Oei Hun Ling, Grace Dong En Ping, June Lee Wei and Ng Boon Sin. We also consulted with Catherine Yeung, associate professor of marketing at the NUS Business School at the National University of Singapore, in 2013 and 2014.

Fei Victor Lim is Assistant Professor at the English Language and Literature Academic Group, National Institute of Education, Nanyang Technological University, Singapore. Victor is interested in the use of digital technology to improve learning. His work has appeared in Cambridge Journal of Education, Journal of Adolescent and Adult Literacy, Semiotica, Social Semiotics, and Visual Communication.

Jean Phua is Lead Specialist/Technologies for Learning, Educational Technology Division at the Singapore Ministry of Education. Jean is passionate about the use of technology for teaching, learning and assessment. Her own quest in lifelong learning has led her to extensively explore, facilitate, and implement innovative technological practices in schools with school teachers.
References
Aryadoust, Vahid. (2019). Dynamics of item reading and answer changing in two hearings in a computerized while-listening performance test: An eye-tracking study. Computer Assisted Language Learning. Online first.
Black, Paul, & Wiliam, Dylan. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7.
Cavaleri, Michelle, & Dianati, Salb. (2016). ‘You want me to check your grammar again?’ The usefulness of an online grammar checker as perceived by students. Journal of Academic Language & Learning, 10(1), 22–2365.
Chen, Chi Fen Emily, & Cheng, Wei Yuan Eugene. (2008). Beyond the design of automated writing evaluation: Pedagogical practices and perceived learning effectiveness in EFL writing classes. Language Learning & Technology, 12(2), 94–112.
Department of Statistics Singapore. (2016). General Household Survey 2015. Republic of Singapore: Department of Statistics, Ministry of Trade and Industry. Retrieved from https://www.singstat.gov.sg/publications/ghs/ghs2015 on 7th February 2019.
Dikli, Semire. (2006). An overview of automated scoring of essays. Journal of Technology, Learning, and Assessment, 5(1).
Frost, Kathie L. (2008). The effects of automated essay scoring as a high school classroom intervention. Unpublished doctoral dissertation. Las Vegas, USA: University of Nevada.
Griffiths, Lauren, & Nicolls, Barbara. (2010). E-Support4U: An evaluation of academic writing skills support in practice. Nurse Education in Practice, 10(6), 341–348.
Grimes, Douglas, & Warschauer, Mark. (2010). Utility in a fallible tool: A multi-site case study of automated writing evaluation. The Journal of Technology, Learning and Assessment, 8(6), 1–44.
Hutchison, Dougal. (2007). An evaluation of computerised essay marking for national curriculum assessment in the UK for 11-year-olds. British Journal of Educational Technology, 38(6), 977–989.
Japos, Genaro V. (2013). Effectiveness of coaching interventions using Grammarly software and plagiarism detection software in reducing grammatical errors and plagiarism of undergraduate researches. JPAIR Institutional Research, 1(1), 97–109.
Klobucar, Andrew, Deane, Paul, Elliot, Norbert, Ramineni, Chaitanya, Deess, Perry, & Rudniy, Alex. (2012). Automated essay scoring and the search for valid writing assessment. In Charles Bazerman, Chris Dean, Jessica Early, Karen Lunsford, Suzie Null, Paul Rogers, & Amanda Stansell (Eds.), International advances in writing research: Cultures, places, measures (pp. 103–119). Parlor Press.
Kress, Gunther. (2003). Literacy in the new media age. London & New York: Routledge.
Lim, Fei Victor, & Hung, David. (2016). Teachers as learning designers: What technology has to do with learning. Educational Technology, 56(4), 26–29.
Lim, Fei Victor, O’Halloran, Kay L., Tan, Sabine, & Marissa, Kwan Lin E. (2015). Teaching visual texts with multimodal analysis software. Educational Technology Research and Development, 63(6), 915–935.
Lim, Fei Victor, Kwan, Yew Meng, & Poh, Meng Leng. (2019). Spreading educational technology innovations: Cultivating communities. In David Hung, Shu Shing Lee, Yancy Toh, & Azilawati Jamaludin (Eds.), Innovations in educational change: Cultivating ecologies for schools (pp. 65–83). Singapore: Springer.
McCurry, Doug. (2010). Can machine scoring deal with broad and open writing tests as well as human readers? Assessing Writing, 15(2), 118–129.
Milton, John. (2006). Resource-rich web-based feedback: Helping learners become independent writers. In Ken Hyland, & Fiona Hyland (Eds.), Feedback in second language writing: Context and issues (pp. 123–139). New York: Cambridge University Press.
Ministry of Education, Singapore. (2019). Education Statistics Digest 2018 Feb 06. Retrieved from https://www.moe.gov.sg/about/publications/education-statistics
Myles, Johanne. (2002). Second language writing and research: The writing process and error analysis in student texts. TESL-EJ, 6(2). Retrieved from http://tesl-ej.org/ej22/a1.html
NCTE. (2013). NCTE position statement on machine scoring [Web log post] Apr 20. Retrieved Oct 10, 2018, from http://www2.ncte.org/statement/machine_scoring/
Negro, Isabelle, & Chanquoy, Lucile. (2005). Explicit and implicit training of subject-verb agreement processing in 3rd and 5th grades. Educational Studies in Language and Literature, 5(2), 193–214.
O’Neill, Ruth, & Russell, Alex M. (2019). Stop! Grammar time: University students’ perceptions of the automated feedback program Grammarly. Australasian Journal of Educational Technology, 35(1).
Pang, Siokhuay Elizabeth, Lim, Fei Victor, Choe, Kee Cheng, Peters, Charles Matthew, & Chua, Lai Choon. (2015). System scaling in Singapore: The STELLAR story. In Chee Kit Looi, & Laik Woon Teh (Eds.), Scaling educational innovations (pp. 105–122). Singapore: Springer.
Perelman, Les. (2012). Construct validity, length, score, and time in holistically graded writing assessments: The case against automated essay scoring (AES). In Charles Bazerman, Chris Dean, Jessica Early, Karen Lunsford, Suzie Null, Paul Rogers, & Amanda Stansell (Eds.), International advances in writing research: Cultures, places, measures (pp. 121–131). Parlor Press.
Phillips, Susan M. (2007). Automated essay scoring: A literature review. Society for the Advancement of Excellence in Education.
Pritchard, Ruie J., & Honeycutt, Ronald L. (2006). The process approach to writing instruction: Examining its effectiveness. In Charles A. MacArthur, Steve Graham, & Jill Fitzgerald (Eds.), Handbook of Writing Research (pp. 275–290). New York: The Guilford Press.
Shaari, Imran, Lim, Fei Victor, Hung, David, & Kwan, Yew Meng. (2017). Cultivating sustained professional learning within a centralised education system. School Effectiveness and School Improvement: An International Journal of Research, Policy and Practice, 29(1), 22–42.
Shermis, Mark D., & Hamner, Ben. (2012). Contrasting state-of-the-art automated scoring of essays: Analysis. Paper presented at the National Council on Measurement in Education.
Shermis, Mark D., & Burstein, Jill C. (Eds.). (2003). Automated essay scoring: A cross-disciplinary perspective. Routledge.
Tang, Jinlan, & Rich, Changhua Sun. (2017). Automated writing evaluation in an EFL setting: Lessons from China. JALT CALL Journal, 13(2), 117–146.
Towndrow, Peter A., & Pereira, Andrew. (2018). Reconsidering literacy in the 21st century: Exploring the role of digital stories in teaching English to speakers of other languages. RELC Journal, 49(2), 179–194.
Vojak, Colleen, Kline, Sonia, Cope, Bill, McCarthey, Sarah, & Kalantzis, Mary. (2011). New spaces and old places: An analysis of writing assessment software. Computers and Composition, 28(2), 97–111.
Wang, Pei-ling. (2013). Can automated writing evaluation programs help students improve their English writing? International Journal of Applied Linguistics and English Literature, 2(1), 6–12.
Wang, Jinhao, & Brown, Michelle Stallone. (2007). Automated essay scoring versus human scoring: A comparative study. The Journal of Technology, Learning, and Assessment, 6(2), 1–29.
Wang, Ying-Jian, Shang, Hui-Fang, & Briody, Paul. (2013). Exploring the impact of using automated writing evaluation in English as a foreign language university students’ writing. Computer Assisted Language Learning, 26(3), 234–257.
Ware, Paige. (2005). Automated writing evaluation as a pedagogical tool for writing assessment. In Ambigapathy Pandian, Gitu Chakravarthy, Peter Well, & Sarjit Kaur (Eds.), Strategies and practices for improving learning and literacy (pp. 174–184). Selangor, Malaysia: Universiti Putra Malaysia Press.
Warschauer, Mark, & Ware, Paige. (2006). Automated writing evaluation: Defining the classroom research agenda. Language Teaching Research, 10(2), 157–180.