Reactivity of concurrent verbal reporting in second language writing

Chengsong Yang (School of International Studies, Xi'an Jiaotong University, China; Faculty of Education, University of Auckland, New Zealand), Guangwei Hu (National Institute of Education, Nanyang Technological University, Singapore), and Lawrence Jun Zhang (Faculty of Education, University of Auckland, New Zealand)

Journal of Second Language Writing 24 (2014) 51-70. http://dx.doi.org/10.1016/j.jslw.2014.03.002

Abstract

This paper reports an empirical study designed to explore whether concurrent verbal reporting has a reactive effect on the process of second language writing. Ninety-five Chinese EFL learners were randomly assigned to an argumentative writing task under three conditions: metacognitive thinking aloud (MTA), nonmetacognitive thinking aloud (NMTA), and no thinking aloud (NTA), after they completed a similar baseline writing task. Their essays were analyzed in terms of linguistic fluency, complexity, accuracy, and overall quality to examine whether there were any significant between-group differences that could be taken as evidence of reactivity. After controlling for baseline differences, analyses revealed no traces of reactivity on the majority of measures, except that (a) the two think-aloud conditions significantly increased dysfluencies in participants' essays; (b) they also tended to reduce the syntactic variety of the essays; and (c) MTA significantly prolonged time on task and retarded the speed of written production. These negative effects are interpreted in light of Kellogg's (1996) cognitive model of writing as suggesting no serious interference with L2 writing processes and are taken as cautions for, rather than counterevidence against, the use of the think-aloud method to obtain L2 writing process data.

© 2014 Elsevier Inc. All rights reserved.

Keywords: Reactivity; Think-aloud; Second language acquisition (SLA); L2 writing; Argumentative writing; Chinese EFL writers

Introduction

In psychology, education, and cognitive science, concurrent verbal protocols have been regarded as a major source of data or evidence through which the human mind can be indirectly read. In first language (L1) and second language (L2) research, verbal reports have been collected, coded, and analyzed to unveil what underlies visible learner performance, behaviors, and habits and thereby to develop both theoretical and pedagogical insights. Specifically, in both L1 and L2 writing research, concurrent verbal reports have been gathered to construct writing models (e.g., Flower & Hayes, 1980), investigate the subprocesses of writing (e.g., Roca de Larios, Marín, & Murphy, 2001; Manchón, Roca de Larios, & Murphy, 2009; Zellermayer & Cohen, 1996), distinguish skilled from unskilled writers (e.g., Bereiter & Scardamalia, 1987), compare the cognitive demands of different texts (e.g., Durst, 1987), explore the relations between cognitive
activities and text quality (e.g., Breetvelt, Bergh, & Rijlaarsdam, 1994), study attention processes and the role of noticing (e.g., Armengol & Cots, 2009), compare L1 and L2 writing strategies (e.g., Chenoweth & Hayes, 2001), and investigate the role/use of L1 in L2 writing (e.g., Wang & Wen, 2002). Its widespread use notwithstanding, the validity of thinking aloud (TA) as a data-elicitation method has been clouded by controversy over its potential reactivity, that is, whether the act of simultaneous reporting might serve as an additional task altering the very thinking processes it is supposed to represent and keep intact (Payne, Braunstein, & Carroll, 1978; Russo, Johnson, & Stephens, 1989; Smagorinsky, 1989; Wilson, 1994). The issue of reactivity is perhaps even more intriguing in language research, which involves ‘‘ill-defined tasks’’ like reading and writing, when ‘‘subjects must specify partly or completely their own goals,’’ and ‘‘may generate many equally satisfactory ‘solutions’’’ (Stratman & Hamp-Lyons, 1994, p. 92). The presence of reactivity, if confirmed, would have profound ramifications for previous L2 and L1 studies whose findings were based on TA data. For example, writing models based on reactive verbal reporting would have questionable fidelity; writing strategies elicited with TA could be nonexistent or artifactual; expert-novice differences identified in verbal reports, as Hayes, Flower, Schriver, Stratman, and Carey (1987) speculated, might be confounded with individuals’ different levels of flexibility in handling the constraint of concurrent verbalization; and conclusions concerning correspondence between cognitive processes and text quality would also be discounted if TA acted as a confounding variable. In contrast to the large body of empirical studies conducted in cognitive psychology, empirical research on reactivity in second language acquisition (SLA) is scanty. In her comprehensive search for SLA studies to be included in a meta-analysis, Bowles (2010) was able to identify only a total of 9 research reports on reactivity studies prior to February 2009 using L2 verbal tasks (Bowles, 2008; Bowles & Leow, 2005; Leow & Morgan-Short, 2004; Polio & Wang, 2005; Rossomondo, 2007; Sachs & Polio, 2007; Sachs & Suh, 2007; Sanz, Lin, Lado, Bowden, & Stafford, 2009; Yoshida, 2008), in addition to just a very limited few involving L1 verbal tasks. To our knowledge, three empirical studies have updated this list (Barkaoui, 2011; Goo, 2010; Yanguas & Lado, 2012), and for our interest’s sake, another four are not to be missed that used L1 writing/revision tasks (e.g., Janssen, Waes & Bergh, 1996; Levy & Ransdell, 1995; Ransdell, 1995; Stratman & Hamp-Lyons, 1994). Notably, the majority of the reactivity studies conducted in SLA used L2 reading tasks (Bowles & Leow, 2005; Goo, 2010; Leow & Morgan-Short, 2004; Polio & Wang, 2005; Rossomondo, 2007; Yoshida, 2008). Only one study (i.e., Sachs & Polio, 2007) touched upon L2 writing, focusing on use of teacher feedback in L2 revision, and one (i.e., Yanguas & Lado, 2012) investigated writing in the heritage language of Spanish. Given the preponderant use of TA as a data collection method in L2 writing research and a tangible deficiency in research on reactivity of TA herein, there is a clear need for more studies examining how TA might affect various aspects of L2 writing. The present study was designed to address this gap. 
Since TA reactivity in L2 writing is a little traversed area, the following sections will situate our research questions first in the broad context of TA reactivity research and then with respect to reactivity studies of TA in L1 and L2 writing. Previous research Verbal reporting, reactivity, and its potential reactivity on writing Verbal reporting involves bringing thoughts into consciousness, (re)coding the thoughts verbally before verbalizing them (Ericsson & Simon, 1993). Ericsson and Simon distinguished three levels of verbalization. Level 1 verbalization requires no verbal recoding or other intermediate processes but ‘‘simply the vocalization of covert articulatory or oral encodings’’ (p. 79). Level 2 verbalization involves verbal recoding of thoughts and mere explication or labeling of ‘‘information that is held in a compressed internal format or in an encoding that is not isomorphic with language’’ (p. 79). Level 3 verbalization requires not only recoding of thoughts but also an explanation of thoughts or thought processes. Ericsson and Simon predicted that nonmetacognitive verbal protocols (i.e., Level 1 and Level 2 verbalization) would reflect the nature of cognitive processes fairly accurately, though increasing time on task slightly, and that metacognitive verbal protocols (i.e., Level 3 verbalization) would change the structure of thoughts or the sequence of heeded information and increase time on task. They synthesized extensive empirical evidence in psychological and cognitive research that was consistent with their predictions and explained the presence and absence of reactivity in terms of an information processing model of cognitive processes which posits that ‘‘only information in focal attention can be verbalized’’ (p. 90; emphasis original). Based on this hypothesis, nonmetacognitive verbalization causes no reactivity in that it does not inhibit sequences of information otherwise duly heeded and

processed under a silent condition (i.e., suppress development of on-going thoughts), neither does it in any fashion bring new information to attention and processing (i.e., nourish new thoughts). Metacognitive verbalization, in contrast, is unequivocally reactive for the simple reason that the additional act of justification demands activation of new information, which might in turn provoke new thoughts as well, not to mention its possible threat to ‘‘normal’’ information processing given its working memory (WM) consumption. Further thought on Ericsson and Simon’s taxonomical accounts of reactivity based on the involvement of recoding and justification has sparkled concerns that ‘‘the causes of reactivity are not general but due jointly to the demands of the task and to verbalization’’ (Russo et al., 1989, pp. 762–763). Doubt has been cast on whether their predictions of (non)reactivity could readily apply to verbal tasks, given the multiple routes involved in these tasks and their open ends (e.g., Smagorinsky, 1989; Stratman & Hamp-Lyons, 1994). Following Ericsson and Simon’s hypothesis, two issues concerning verbal tasks might undermine their predictions: difficulty in verbalization and accordingly, independence of task completion to verbalization, both pointing to impairment of heeded information. Stratman and Hamp-Lyons (1994), for example, questioned Flower and Hayes’s (1981, 1984) assumption that subjects’ short-term memory contents in a text-revision or text-analysis task are orally compatible and easily verbalizable. They shared Ericsson and Simon’s (1984) concern that there are complex mental operations for which descriptive terms are elusive, and that subjects may present descriptions of such operations, only to contaminate nonmetacognitive protocols with metacognitive elements. In SLA research, before Leow and Morgan-Short’s (2004) cornerstone study on reactivity employing a reading task, there had been a few anecdotal precautions. With specific reference to use of NMTA in L2 writing, Jourdenais (2001) warned that ‘‘the think aloud data collection method itself acts as an additional task which must be considered carefully when examining learner performance’’ (p. 373). As regards task independence, nonverbal tasks might not interact with concurrent verbalization as much as verbal tasks do due to lack of task proximity, but verbal tasks, even of the same type, may also vary in their robustness, contingent upon their demands. There is evidence that the effects of TA may be more acutely felt in fulfilling complex verbal tasks. Most relevantly, in L1 writing research, Janssen et al. (1996) reported that TA caused greater disturbance in writing business letters than simpler explanatory texts. Kellogg’s (1996) cognitive model of writing further suggests that TA might have reactive effects on writing if it were too cognitively demanding and/or if contention between different writing processes for WM resources were already keen. Kellogg’s model differentiates three superordinate systems of text production each comprising two basic level processes, namely, formulation, subdivided as planning and translating; execution, composed of programming and executing; and monitoring, incorporating reading and editing. The model further delineates what components of WM that these processes tap based on Baddeley’s (1986) tripartite distinction of WM into the two slave systems of the phonological loop and the visuospatial sketchpad, and a central executive system. 
While the two slave systems store and process auditory and verbal information and visual and spatial information respectively, the central executive is a multipurpose, limited-capacity system which assists when the slave systems are overwhelmed, is called on in controlled processing involved in tasks demanding sustained effort, and regulates competing behaviors. Kellogg suggests that planning involves the visuospatial sketchpad, both translating and reading require the phonological loop, but virtually all writing processes place demands on the central executive system, except for executing. Kellogg’s model assumes contention for the limited WM resources among writing processes (also see Kellogg, 2001). With regard to how verbal reporting might act on writing processes, taking it as ‘‘the articulation of relevant information during composition,’’ Kellogg (1996) foresaw mutual and therefore possibly rival claims for the phonological loop by both the act of the articulation and translating, and cautiously predicted that ‘‘verbal protocols should at a minimum load the phonological loop and disrupt the quality and fluency of translation’’ (p. 69). His prediction left open the possibility that TA could require the central executive as well, which, as has been discussed, would depend on the demands and difficulty of the concurrent verbalization. That requirement being the case, the effects of TA would possibly spread to all writing processes that operate on the central executive. Ellis and Yuan (2004) applied Kellogg’s model to L2 writing research, and rightly argued that all L2 writing processes draw on the central executive, including execution, since L2 learners may not have an adult native-like automaticity writing in the target language (see also Ong & Zhang, 2013, for more recent application of a working memory model in L2 writing). Given more demands placed for WM processing in L2 writing due to L2 learners’ low proficiency, greater reactive effects may be expected of TA due to the already fierce competition for WM resources among L2 writing processes. Apart from the above caveats associated with its load on WM, concerns have not been dispersed that TA might interact with verbal tasks by triggering generation of new information; that is, it might promote learning or boost

performance by increasing verbalizers’ awareness of and reflection on the processes under verbalization. Stratman and Hamp-Lyons (1994), for example, assumed that TA might work as ‘‘auditory feedback’’ that could facilitate surface error correction in a complex revision task (p. 98). Jourdenais (2001), citing Swain’s (1985) Output Hypothesis, suggested that TA could act as an additional input helping L2 learners to notice gaps in their interlanguage. The beneficial roles of TA were even presupposed when it was used to aid L1 teaching of writing (e.g., Scardamalia, 1984) and reading (e.g., Wilhelm, 2001). Empirically, Sanz et al. (2009) reported a facilitative effect of NMTA when 24 college students completed a computerized lesson learning the Latin case system independently. Studies on reactivity in L2 reading often planted target forms in reading materials to examine if verbalization could promote or impair system learning and/or item learning, but only one study reported learning facilitated by NMTA (Rossomondo, 2007). When learning opportunities were more explicitly given in a computerized L2 problem-solving task, Bowles (2008) found no role for NMTA, but impairment of item learning by MTA. In bilingual writing, Yanguas and Lado (2012) recently recorded facilitative reactivity on writing accuracy. They supported their finding by speculating that verbalizing (nonmetacognitively) afforded more opportunities for learners to be aware of linguistic forms under production and to ‘‘monitor their own writing processes and acquire helpful strategies’’ (p. 393). Were such effects factual, verbalizing metacognitively could be even more beneficial, where chances are greater for monitoring to happen since learners need to take a look back at and then justify what they have written. In light of our review above, the issue of reactivity of TA in L2 writing may be examined from an information processing perspective resulting primarily from a couple of counterbalancing factors that might affect heeded information (i.e., the cognitive load imposed by verbalization, and increased critical attention). Carefully designed and controlled research on reactivity of both NMTA and MTA in L2 writing is evidently warranted to examine the respective magnitude of both factors and their tradeoffs, in interaction with the factor of time on task, to confirm or disconfirm Ericsson and Simon’s synthesis of (non)reactivity drawn from studies overwhelmingly employing L1 nonverbal tasks and testify or falsify Kellogg’s prediction of the reactive effects of TA on L1 writing processes based on his cognitive model of writing. Extant empirical studies in writing, both L1 and L2 ones included, are pitifully just an insufficient few. It is to these studies we turn next.

Reactivity studies in L1 writing Several empirical studies have explored the effects of TA on L1 writing processes or L1 writers’ performance (e.g., Janssen et al., 1996; Levy & Ransdell, 1995; Ransdell, 1995; Stratman & Hamp-Lyons, 1994). Stratman and HampLyons (1994), in an exploratory study, reported mixed results when they examined the effects of thinking aloud on various aspects of L1 revision under two broad categories of error detection/removal and content changes. They found that TA slightly boosted detection of faulty pronoun references, but seemed to slightly inhibit detection of information organization errors, and did not affect detection of phrase-level redundancies and word-level errors. They also noted that TA greatly inhibited meaning changes by word or phrase additions, deletions, and substitutions, and had very little bearing on complex meaning changes and macrostructural changes. Given the small sample (N = 12) of participants involved in the study, no statistical analysis conducted, and the salience given to revision in an experimenter-designed revision task, their findings may not easily apply to on-line revision as a normal process integrated into the recursive process of writing. Three studies involved students writing in L1. Ransdell (1995) reported an effect on speed measures when he operationalized TA reactivity in terms of observed differences in rate, quantity, and syntactic complexity of composition. In his experiment, 38 students were asked to write a letter on a computer within 12 minutes to a close friend about their first days of school in each of the following three conditions: silent, TA, and retrospective report based on watching a replay of the original composition. Among the dependent variables of words composed per minute, total number of words, mean clause length, total number of clauses, and clauses composed per minute, significant differences were found only for words per minute and clauses per minute, with the TA condition yielding lower rates. Levy and Ransdell’s (1995) study further confirmed that the effects of TA could be trivial when they measured the reactive impact of TA on L1 writing processes by comparing 10 undergraduates’ interference response times (IRTs) in a silent writing session and the following TA sessions. They found that for planning and text generation, the IRTs were rather close, indicating that no extra effort was exerted on these two subprocesses as a result of TA. They also reported

that during the time allocated to text generation, the participants produced as many words while thinking aloud as they did silently. These findings led the researchers to conclude that the reactive effects of thinking aloud were negligible. In contrast, Janssen et al. (1996) concluded that TA disturbed normal writing processes when they examined pause duration in combination with pause location in the text and in the composing process. Two experiments were included in their study, when participants all wrote on a computer word processor, with data captured by a resident software program. In Experiment 1, 20 students were asked to complete one of two business reports under TA conditions, and then were assigned to writing with their task and condition alternated. Experiment 2 adopted a similar design, in which 28 students completed two simple explanatory texts. Both experiments measured pauses between sentences and between paragraphs, but Experiment 2 looked more finely at pauses within and between word groups instead of pauses within sentences examined in Experiment 1, and Experiment 2 also defined shorter minimum pause length. The results from both experiments showed that in the TA condition, pause time increased significantly on almost all levels. By relating pauses in different locations to different levels of planning or processes of monitoring and planning to revise, the researchers further concluded that TA altered writing activities and was reactive. The three reactivity studies using L1 writing tasks do not appear to yield consistent results, subject to miscellaneous research design, task type, and especially operationalization of reactivity. However, taken together, the results of Ransdell’s and Janssen et al.’s studies seem to suggest that TA was likely to slow down L1 written production temporarily, which was measured respectively by speed of production and pause length in these two studies. Given the small number of studies and some of the limitations these studies were subject to (e.g., notably, the small sample size in Ransdell and his colleague’s studies, the yet-to-be-clarified relevance of pause features to interference in Janssen et al.’s study), more research is clearly needed to infer the reactivity of TA based on performance indexes, following a tradition in reactivity research both inside and outside SLA. Ransdell’s study was a start, but more dimensions of writing performance (e.g., complexity, accuracy) should have been included. Reactivity studies in L2/bilingual writing Two empirical studies of this category are identified that either involved the issue of reactivity on L2 learners’ use of teacher reformulations in revision (Sachs & Polio, 2007), or explored the reactive effects of TA in bilingual writing (Yanguas & Lado, 2012). Sachs and Polio reported two experiments: Experiment 1, a repeated-measures design and Experiment 2, a non-repeated measures design. They found negative reactivity in Experiment 1, since analysis conducted with a Wilcoxon signed rank test indicated that 15 English-as-L2 participants revised significantly more errors after they had compared their writings with teacher reformulations silently than they did when they had thought aloud during their comparison. 
However, they noted no reactivity in Experiment 2, since a comparison between the reformulation and reformulation + TA conditions with a Mann-Whitney test indicated that the 16 ESL learners randomly assigned to the TA condition revised no fewer errors than the 11 learners in the silent condition. In what is perhaps the closest analogue to research on reactivity in L2 writing to date, Yanguas and Lado (2012) recently reported a quasi-experimental study in which bilinguals of English and Spanish completed a writing task in their heritage language, Spanish.1 The participants came from two classes attending a Spanish course. One class formed a +TA group (N = 20) and the other a -TA group (N = 17). They completed the semi-guided task within 25 minutes, with the intervention group thinking aloud while they wrote. All the writing samples were then analyzed in terms of fluency, accuracy, and lexical complexity. Fluency was measured by number of words and number of words per T-unit, accuracy by error-free T-units, and lexical complexity by lexical variety, which was gauged by the Uber Index, a transformation of the type-token ratio computed as Uber Index = (log tokens)^2 / (log tokens - log types) (Jarvis, 2002). A series of one-way ANOVAs was run to compare scores on these four dependent variables across groups. Results indicated that the TA group outperformed the silent group in accuracy (p = .005) and lexical complexity (p = .052), with medium effect sizes (d > 0.50), but that both groups performed equally well in terms of fluency, pointing to positive reactivity overall. The notable benefits reported in this study are incompatible with the findings of previous studies in L1 writing (Janssen et al., 1996; Levy & Ransdell, 1995; Ransdell, 1995), which reported detriments to fluency or nonreactivity, and with Ericsson and Simon's (1984, 1993) conclusion for NMTA. Though bilinguals writing in their non-dominant language may have been affected differently, the findings of this study should be viewed with some caution, given its quasi-experimental design, the unavailability of pretest scores, and its sample size. Moreover, the researchers' justification of the finding, which appealed to verbal accounts seemingly facilitative to writing performance, appears somewhat arbitrary, since it remains unknown whether the thinking processes underlying those accounts were idiosyncratic to the TA group or were shared by the silent group as well. Given the questionable findings reported in this study, there is an even stronger case for further inquiry in L2 writing, preferably involving ESL or EFL learners, given the extreme scarcity of such research.

1. Spanish was considered their heritage language because they spoke it at home but did not have formal command of it.

Research questions

To bridge the gaps identified above in previous research on TA reactivity in SLA in general and L2 writing in particular, the present study was conducted to investigate whether MTA and NMTA have reactive effects on L2 writing. Reactivity was operationalized as significant differences across the writing conditions in L2 learners' writing performance, measured in terms of writing fluency, linguistic complexity, formal accuracy, and overall writing quality. From these performance indexes, we expected to infer the extent of alterations caused by TA, if there were any, in L2 writing processes. Specifically, the following research questions were formulated for the study:

1. Does concurrent verbal reporting, be it metacognitive or nonmetacognitive, have any effects on the writing fluency, linguistic complexity, formal accuracy, and overall writing quality of texts produced by L2 learners?
2. Do MTA and NMTA differ in their effects, if there are any, on the aforementioned measures of texts written by L2 learners?

Method

Participants

All 95 participants were first-semester non-English-major sophomores recruited from around 20 English classes at a state-key university in Northwest China. They had uniformly followed an English foundation curriculum during their first tertiary year and, at the time of the study, were taking two English elective courses such as American Culture and English Newspaper Reading. Seventeen to twenty years of age (M = 19.96, SD = 0.86), these undergraduates had learnt English for approximately 9 years on average (M = 8.94, SD = 1.99). None of them had visited an English-speaking country, except for one who had spent a month in Singapore. Their National Matriculation English Test (NMET) scores ranged from 105 to 141 (M = 126.75, SD = 8.20), and their College English Test (CET) Band 4 scores fell between 387 and 628 (M = 540.43, SD = 45.88).2 All of them, together with the dropouts, were compensated on a task basis.

The original pool comprised 107 sufficiently briefed registrants who showed up for the baseline writing task. Four of them were excluded after this first round of writing, once all participants' personal information forms and essays had been collected and examined, and were instead assigned to the piloting of the TA tasks and experimenter training: one was preparing for IELTS, one spoke a minority language unintelligible to the experimenters, one wrote only around half the required number of words, and one did not hand in his essay upon completion. Of the 103 participants left in the pool, seven were randomly retained as backups, and the remaining 96 were randomly assigned to one of three conditions, NTA, NMTA, and MTA, with 32 in each, to complete the main task. During main data collection, three participants withdrew from the think-aloud tasks, and their vacancies were filled by three randomly drawn backups. The remaining four backups each undertook an MTA task as possible substitutes in case some MTA protocols, once examined, lacked sufficient justifications and were therefore ineligible. Eventually, three of them were randomly drawn and included in the final analysis in lieu of two participants whose recordings were inaudible and one who was unable to report sufficiently metacognitively. One participant in the silent group was also removed because she erased reformulated words, making it impossible to count dysfluencies. The final number of participants was thus 95 (NTA = 31, NMTA = 32, MTA = 32; male = 64, female = 31).

2. Some participants reported their CET-4 scores after the experiment, when their scores from a forthcoming CET-4 test became available.

Instruments The following writing task from Cambridge IELTS 3 (2002, p. 126) was selected for baseline data collection. ‘‘It is generally accepted that families are not as close as they used to be. Give some reasons why this change has happened. Suggest how families could be brought closer together.’’ Another similar task from Cambridge IELTS 4 (2005, p. 101) was selected for main data collection: ‘‘In many countries schools have severe problems with student behavior. What do you think are the causes of this? What solutions can you suggest?’’ The word limit for both essays was set as at least 400 words, and there were no time constraints. Both tasks were supposed to be fair to every participant in terms of topic familiarity, given that none of these participants had prepared and were preparing for IELTS, and that the topics covered herein concerned everyday issues. However, the tasks may be demanding to these sophomores since they had been exposed to much shorter outline-prompted College English Test (CET) essays. It was assumed that if no negative reactivity was found on a demanding L2 writing task, the same finding would apply to less demanding ones as well. Procedure All participants completed the baseline writing task silently in a classroom in three successive batches. They started uniformly at a time indicated on a clock and were required to write down the exact time of their completion on their papers and submit their essays immediately afterwards. The whole process was closely monitored. The piloting and experimenter training were conducted in an office. The first author and two research assistants (RAs) discussed and practiced with both NMTA and MTA before they administered the piloting alternatively. The RAs were experienced English-major postgraduate students and were paid. Two different sets of instructions had been developed for both TA conditions before piloting and were refined during and after piloting and experimenter training. The major difference between the two sets of think-aloud instructions was that no justification was required of participants thinking aloud nonmetacognitively. The instructions were given in Chinese at the time of data collection (see Appendixes 1 and 2 for both their Chinese and English versions).3 In main task completion, all 32 participants assigned to the NTA condition finished the second task silently in a classroom, just as they did the baseline task. To avoid mutual interference from verbalization, and to ensure group representativeness, the think-aloud tasks were conducted in three separate offices individually, where 64 participants in the TA groups met with the three experimenters one by one, who would sit at the right rear of the participants throughout task completion. Appointments were made so that one participant reporting nonmetacognitively would be followed by one reporting metacognitively. Prior to writing the main task, these participants were first guided through a training procedure which included five steps: 1) read the think-aloud instructions; 2) listen to two think-aloud audio clips4; 3) Q & A; 4) trial write and think aloud a short passage of about 30 words on two topics (‘‘My Favorite Food’’ and ‘‘An Unforgettable Person’’) alternated evenly among both think-aloud groups; and 5) Q & A. During task completion, in both TA conditions, if participants paused for more than 9 seconds without verbalization, they would be reminded to continue to think aloud with a brief request of ‘‘qing shuohua’’ in Chinese (‘‘please speak’’). 
In the NMTA condition, the experimenters made sure that participants did not drift into extra justification. In the MTA condition, the experimenters prompted participants with a quick reminder of "wei shenme?" in Chinese ("Why?") when they noticed that a participant had failed to provide even a single reason for a sentence.5 Throughout task completion, all participants wore microphones, and their verbal reports were recorded on a digital recorder.

3. It should be noted that in our instructions for the MTA condition, participants were given the option to justify before, while, or after writing. While-writing justification was virtually impossible, and only in very rare cases did participants justify in advance what they were about to write.
4. One of the clips used for NMTA training concerned solving an equation and the other related to writing an English sentence after some planning. One of the clips used for MTA showed a chess player making a move while reporting the reasons for his decision, and the other recorded a writer justifying a sentence he was writing, with specific reasons left blank (where he paused).
5. Bowles (2010) suggested an important "50 percent" demarcation in coding MTA versus NMTA protocols: the former should contain justifications for "50 per cent or more of the paths chosen" to be distinguished from the latter (p. 123), a benchmark derived from her practice in her 2008 study, which utilized a problem-solving task. However, owing to the difficulty of applying the notion of "paths" to an L2 writing task, for ease of operationalization, the three experimenters were required to ensure, with discretion, that at least one reason was reported per sentence in the MTA protocols.

Table 1
Data collection procedure.

Baseline data (1 day): informed consent form; confidentiality form; personal information form; the first writing task.
Piloting of think-aloud tasks (3 days): experimenter training & practice; trial of TA tasks with 4 participants; discussions; experimenter training & practice.
Main data collection (about 7 days): participants completed the second writing task in one of the three experiment conditions.

Based on experimenter feedback, prior to quantitative analysis, the first author conducted a selective listening of the recordings from the NMTA group. He then listened carefully and repeatedly to the recordings of the MTA group while looking at their essays to further confirm the eligibility of their reports for that condition.6 All 190 essays kept for final analysis were typed into Microsoft Word 2010, and the resultant files were checked meticulously against the manuscripts. Table 1 summarizes the entire data collection procedure.

Data coding and analysis

Dependent variables comprised fluency, complexity, and accuracy measures and a measure of the overall quality of writing. The first three categories incorporated the measures used by Ellis and Yuan (2004), except that a different measure of lexical variety, D (Malvern & Richards, 2002), was adopted in place of the Mean Segmental Type-Token Ratio (MSTTR). Additionally, two more metrics of complexity, general complexity and subclausal complexity via phrasal elaboration, were employed to gauge multiple dimensions of complexity (Norris & Ortega, 2009).

The fluency measures consisted of time on task, syllables per minute, and dysfluencies (i.e., the total number of words that a participant crossed out or reformulated divided by the total number of words produced). In syllable counting, for wrong forms (e.g., misspellings, coinages, misinflected words), syllables were counted in reference to applicable pronunciation rules (e.g., *socity was counted as having three syllables and *putted as having two); for any initialism, one syllable was counted (e.g., TV, MSN). In word counting, hyphenated and slash-divided words were counted as two, but an affix wrongly hyphenated to a word root (e.g., *mis-understanding) was still counted as one. Numbers and Chinese characters were not counted in either syllable or word counting.

General complexity was calculated as words per T-unit, syntactic complexity as the number of finite and nonfinite clauses per T-unit, and subclausal complexity as words per finite clause. Nonfinite clauses were counted in reference to Foster, Tonkyn, and Wigglesworth's (2000) definition of subordinate clauses. For a nonfinite verb plus one other nonfinite verb as its clause element (e.g., Students should be encouraged to learn to be independent), two nonfinite clauses were counted. Syntactic variety counted the number of different verb forms pertaining to tense, modality, and voice, indicative of learners' performance in low-level complexity (Ellis & Yuan, 2004). Different nonfinite verb forms and modality forms that have not acquired official status (e.g., be likely to) were also counted. Wrong forms (e.g., must be do, will went) were attributed to their nearest appropriate categories. D is a measure of lexical diversity recommended by Malvern and Richards (2002) (see also Richards & Malvern, 1997, for a detailed explanation and McKee, Malvern, & Richards, 2000, for a full rationale). In their modeling, D is the third variable in the equation that relates the type-token ratio (TTR) to token size N: TTR = (D/N)[(1 + 2N/D)^(1/2) - 1].
D values were calculated after all participants' essays had been transformed into the standard CHAT format (Codes for the Human Analysis of Transcripts) of the CHILDES project (Child Language Data Exchange System) (MacWhinney & Snow, 1990), with lemmas retained and misspelt words corrected, by running the Vocd command, which was written by Gerard McKee and is freely available to other researchers as part of the CLAN (Computerized Language Analysis) programs (MacWhinney, 2000a, 2000b) from the CHILDES web site (http://childes.psy.cmu.edu). The accuracy measures included error-free clauses and correct verb forms. Errors were counted in accordance with the scheme developed by Polio (1997). Attention was paid to distinguishing verb form errors from plural form errors. For example, in "Some student spend most of their time entertaining themselves," no verb form error was counted.

6. A ratio index was calculated by dividing the number of T-units in a participant's essay by the total number of instances of reasons roughly identified. The ratios for all MTA protocols ranged from 1.08 to 5.27 (M = 2.7852, SD = 1.18).
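To make the fluency and lexical-diversity measures above concrete, the following Python sketch shows how the dysfluency ratio, the Uber Index mentioned earlier in connection with Yanguas and Lado (2012), and a rough vocd-style estimate of D might be computed from a tokenized essay. It is a minimal illustration under our own simplifying assumptions: the function names, the whitespace tokenization, the sampling settings, and the grid search are ours, and this is not the CLAN Vocd implementation actually used in the study.

```python
import math
import random

def dysfluency_ratio(words_crossed_out_or_reformulated, total_words):
    """Dysfluencies as defined above: words crossed out or reformulated,
    divided by the total number of words produced."""
    return words_crossed_out_or_reformulated / total_words

def uber_index(tokens):
    """Uber Index = (log tokens)^2 / (log tokens - log types) (Jarvis, 2002)."""
    n_tokens = len(tokens)
    n_types = len(set(tokens))
    return (math.log(n_tokens) ** 2) / (math.log(n_tokens) - math.log(n_types))

def model_ttr(n, d):
    """Malvern and Richards's curve: TTR = (D/N)[(1 + 2N/D)^(1/2) - 1]."""
    return (d / n) * (math.sqrt(1 + 2 * n / d) - 1)

def estimate_d(tokens, sample_sizes=range(35, 51), trials=100, seed=1):
    """Rough vocd-style D: average the empirical TTRs of random 35-50-token
    subsamples, then grid-search the D whose model curve best fits them
    (Vocd itself uses least-squares curve fitting on CHAT-formatted data)."""
    rng = random.Random(seed)
    empirical = {}
    for n in sample_sizes:
        ttrs = [len(set(rng.sample(tokens, n))) / n for _ in range(trials)]
        empirical[n] = sum(ttrs) / trials
    best_d, best_err = None, float("inf")
    for d in (x / 100 for x in range(100, 20001)):  # candidate D: 1.00-200.00
        err = sum((empirical[n] - model_ttr(n, d)) ** 2 for n in sample_sizes)
        if err < best_err:
            best_d, best_err = d, err
    return best_d

if __name__ == "__main__":
    # A short invented sample text (not from the study's data).
    text = ("in many countries schools have severe problems with student behavior "
            "one cause is that students spend most of their time playing games "
            "another cause is that parents are too busy to talk with their children "
            "schools and parents should work together so that students can learn "
            "to behave well and to respect their teachers and their classmates")
    tokens = text.split()
    print("dysfluency ratio:", round(dysfluency_ratio(12, 450), 3))  # hypothetical counts
    print("Uber Index:", round(uber_index(tokens), 2))
    print("estimated D:", estimate_d(tokens))
```

In the study itself, D was obtained by running the Vocd command on CHAT-formatted, lemma-retained transcripts, which fits the curve by least squares rather than the coarse grid search shown here.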

The overall quality of the essays was scored with Jacobs, Zingraf, Wormuth, Hartfiel, and Hughey's (1981) ESL Composition Profile. The scheme was analytic and measured five aspects of performance with uneven weights, namely, content (30%), organization (20%), vocabulary (20%), language use (25%), and mechanics (5%), with each component having four bands indicating different levels of mastery. Except for the measure D, an RA and the first author independently coded the data for 24 anonymous essays chosen randomly and evenly from the three groups and from both writing tasks. Pearson product-moment correlation coefficients between the two coders' scores on all items reached above .90. Discussions were held before the coding schemes were revised. The first author then coded the rest of the data except for two measures (syllables per minute and dysfluencies), which were coded by an RA. An experienced English teacher studying in Hong Kong participated in essay rating for the calculation of interrater reliability. She and the first author first studied and discussed Jacobs et al.'s scheme, then, having reached consensus, practiced with 11 argumentative essays written by the first author's former sophomore students in two rounds of scoring and discussion, and finally scored the 24 selected essays. A reliability coefficient of .932 was achieved. The first author then scored the remaining 166 essays in anonymous printouts.

Statistical analysis was conducted following the procedures below:
(a) the normal distribution of all three groups' scores on both the baseline and main tasks was first checked in terms of skewness and kurtosis;
(b) a series of one-way ANOVAs was performed on all baseline scores, together with post hoc Games-Howell tests, to examine baseline group differences;
(c) where baseline scores were not clearly normally distributed (i.e., dysfluencies), the Kruskal-Wallis test was run, followed by independent t-tests, to check the results obtained in (b) for that measure;
(d) the correlations between all baseline and corresponding main task scores were checked;
(e) all main task scores were then entered into ANCOVAs and compared pairwise through the Bonferroni procedure, using the corresponding baseline scores as covariates, to examine main task group differences while taking baseline differences into account;
(f) where main task scores did not meet the assumptions of normality and homogeneity of variance (i.e., dysfluencies), the ANCOVA and pairwise comparisons were conducted with the outliers in the baseline and main task data removed,7 and a Kruskal-Wallis test and independent t-tests were also run on all main task data to cross-check the ANCOVA results.
The alpha level was set at .05. For the main task pairwise comparisons, effect sizes were calculated by dividing the adjusted mean difference by the square root of MS′error.

Results

Results from analyses of the baseline data

The results of the ANOVAs and post hoc Games-Howell tests run on the baseline data indicated no significant difference between the three groups on any dependent variable except time on task (i.e., the MTA group spent significantly longer on the task, p = .017, but also wrote more words, p = .148, than did the NTA group).
The Kruskal-Wallis test and the series of independent t-tests run on the dysfluencies scores showed that the three groups did not differ significantly on this measure (all p > .05), thereby corroborating the ANOVA result for the measure (F = .603, p = .549). Taken together, the statistics for the baseline task demonstrate that, on all the measures of writing performance concerned, the three groups were highly comparable at the start. Perhaps less importantly, this homogeneity of the groups was corroborated when their NMET and CET-4 scores were also compared (p = .175 and p = .217).

Results from analyses of main task data

When the correlations between participants' baseline and main task scores were checked, Pearson's correlation coefficients were all significant (i.e., .561 for syllables per minute, .489 for dysfluencies, .485 for clauses per T-unit, .568 for general complexity, .494 for subclausal complexity, .389 for different verb forms, .478 for D, .624 for error-free clauses, .402 for correct verb forms, .460 for time on task, and .549 for overall quality). The medium-to-high levels of correlation indicated that the prerequisite for using the baseline task scores as covariates in the ANCOVA analyses was met (see Goo & Mackey, 2013).

7. Removed were two outliers in the main task dysfluencies data (0.701 and 0.537, both from the NMTA group) and three in the baseline task data (0.300, 0.237, and 0.215, from the NTA, NMTA, and MTA groups, respectively), making the total number of participants entering this analysis 90.
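As a rough illustration of analytic steps (d)-(f) and of the effect-size computation described above, the sketch below runs a one-way ANCOVA with the baseline score as covariate and derives covariate-adjusted group means, partial eta squared for the group factor, and a pairwise effect size defined as the adjusted mean difference divided by the square root of the error mean square. The DataFrame columns (group, baseline, main), the simulated data, and the use of the ANCOVA residual mean square as a stand-in for MS′error are our own assumptions; the original analyses were not necessarily run this way, and the Bonferroni-adjusted pairwise tests are omitted.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

def ancova_with_baseline(df):
    """One-way ANCOVA: main ~ baseline + group (group = NTA/NMTA/MTA)."""
    model = smf.ols("main ~ baseline + C(group)", data=df).fit()
    table = sm.stats.anova_lm(model, typ=2)          # Type II sums of squares
    ms_error = table.loc["Residual", "sum_sq"] / table.loc["Residual", "df"]

    # Adjusted group means: predictions with the covariate held at its grand mean.
    grand_mean = df["baseline"].mean()
    adjusted = {}
    for g in sorted(df["group"].unique()):
        new = pd.DataFrame({"group": [g], "baseline": [grand_mean]})
        adjusted[g] = float(np.asarray(model.predict(new))[0])

    # Partial eta squared for the group factor.
    ss_group = table.loc["C(group)", "sum_sq"]
    ss_error = table.loc["Residual", "sum_sq"]
    eta_p2 = ss_group / (ss_group + ss_error)

    # Pairwise effect size, as in the paper: adjusted mean difference
    # divided by the square root of the error mean square.
    def pairwise_d(g1, g2):
        return (adjusted[g1] - adjusted[g2]) / np.sqrt(ms_error)

    return table, adjusted, eta_p2, pairwise_d

if __name__ == "__main__":
    # Simulated data with hypothetical group effects, for illustration only.
    rng = np.random.default_rng(0)
    n = 32
    demo = pd.DataFrame({
        "group": ["NTA"] * n + ["NMTA"] * n + ["MTA"] * n,
        "baseline": rng.normal(12, 3, 3 * n),
    })
    effect = demo["group"].map({"NTA": 0.0, "NMTA": -1.0, "MTA": -3.5})
    demo["main"] = 4 + 0.6 * demo["baseline"] + effect + rng.normal(0, 2, 3 * n)

    table, adjusted, eta_p2, pairwise_d = ancova_with_baseline(demo)
    print(table)
    print("adjusted means:", {k: round(v, 2) for k, v in adjusted.items()})
    print("partial eta squared:", round(eta_p2, 3))
    print("d (NTA vs. MTA):", round(pairwise_d("NTA", "MTA"), 2))
```

The same logic applies measure by measure (e.g., with syllables per minute or dysfluencies as the dependent variable).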

Table 2
Descriptive statistics and results of ANCOVAs and pairwise comparisons for main task dependent variables. For each measure, cells give M (SD, adjusted M) for the NTA, NMTA, and MTA conditions, followed by the ANCOVA F, p, and partial eta squared (η²p), and the Bonferroni-adjusted p (with effect size d in parentheses) for each pairwise comparison. * p < .05.

Fluency
Time on task (min.): NTA 59.226 (13.889, 63.856); NMTA 62.594 (15.330, 62.183); MTA 84.938 (26.996, 82.691). ANCOVA F = 11.488, p = .000*, η²p = .205. NTA vs. NMTA 1.000 (0.092); NTA vs. MTA .001* (1.032); NMTA vs. MTA .000* (1.124).
Syllables per minute: NTA 12.245 (3.595, 11.848); NMTA 11.153 (2.626, 11.233); MTA 8.501 (2.902, 8.606). ANCOVA F = 15.850, p = .000*, η²p = .263. NTA vs. NMTA .956 (0.254); NTA vs. MTA .000* (1.339); NMTA vs. MTA .000* (1.084).
Dysfluencies: NTA 0.069 (0.042, 0.065); NMTA 0.092 (0.067, 0.091); MTA 0.095 (0.048, 0.100). ANCOVA F = 5.863, p = .004*, η²p = .122. NTA vs. NMTA .050* (0.581); NTA vs. MTA .004* (0.760); NMTA vs. MTA 1.000 (0.201).

Complexity
General complexity: NTA 13.223 (2.050, 13.417); NMTA 14.308 (2.451, 14.112); MTA 13.379 (2.468, 13.324). ANCOVA F = 1.533, p = .222, η²p = .033. NTA vs. NMTA .496 (0.357); NTA vs. MTA 1.000 (0.048); NMTA vs. MTA .334 (0.405).
Syntactic complexity: NTA 1.975 (0.280, 1.995); NMTA 2.126 (0.339, 2.088); MTA 1.938 (0.340, 1.951). ANCOVA F = 1.820, p = .168, η²p = .039. NTA vs. NMTA .636 (0.323); NTA vs. MTA 1.000 (0.153); NMTA vs. MTA .193 (0.476).
Subclausal complexity: NTA 8.834 (1.288, 8.904); NMTA 8.995 (1.228, 8.999); MTA 9.084 (0.994, 9.030). ANCOVA F = .124, p = .884, η²p = .003. NTA vs. NMTA 1.000 (0.092); NTA vs. MTA 1.000 (0.122); NMTA vs. MTA 1.000 (0.299).
Syntactic variety: NTA 12.225 (2.704, 12.759); NMTA 11.562 (2.154, 11.406); MTA 11.406 (2.861, 11.283). ANCOVA F = 3.471, p = .035*, η²p = .072. NTA vs. NMTA .093 (0.576); NTA vs. MTA .055 (0.628); NMTA vs. MTA 1.000 (0.052).
Lexical variety: NTA 66.125 (15.713, 67.520); NMTA 66.507 (12.578, 66.455); MTA 66.774 (12.898, 66.146). ANCOVA F = .112, p = .894, η²p = .003. NTA vs. NMTA 1.000 (0.089); NTA vs. MTA 1.000 (0.115); NMTA vs. MTA 1.000 (0.026).

Accuracy
Error-free clauses: NTA 0.531 (0.114, 0.546); NMTA 0.590 (0.102, 0.573); MTA 0.581 (0.123, 0.584). ANCOVA F = 1.460, p = .238, η²p = .032. NTA vs. NMTA .745 (0.302); NTA vs. MTA .296 (0.425); NMTA vs. MTA 1.000 (0.123).
Correct verb forms: NTA 0.857 (0.081, 0.869); NMTA 0.881 (0.067, 0.878); MTA 0.899 (0.080, 0.898). ANCOVA F = 1.383, p = .256, η²p = .030. NTA vs. NMTA 1.000 (0.141); NTA vs. MTA .323 (0.410); NMTA vs. MTA .822 (0.269).

Overall quality: NTA 67.581 (9.528, 68.887); NMTA 68.844 (10.925, 69.157); MTA 72.656 (9.093, 70.919). ANCOVA F = .501, p = .608, η²p = .011. NTA vs. NMTA 1.000 (0.032); NTA vs. MTA 1.000 (0.239); NMTA vs. MTA 1.000 (0.207).

Table 2 presents the results of the ANCOVA and Bonferroni computations. The results for the measure of dysfluencies are based on outlier-free data; the follow-up Kruskal-Wallis test and independent t-tests yielded a similar pattern of results. As shown in Table 2, the three groups did not differ in four of the five complexity measures (i.e., general complexity, syntactic complexity, subclausal complexity, and lexical variety), in either accuracy measure (i.e., error-free clauses and correct verb forms), or, above all, in overall quality of writing. When the groups were compared pairwise, no significant difference between any two groups was found on these dependent variables either.

Significant differences across the three condition groups were found in the three fluency measures (i.e., time on task, F = 11.488, p = .000; syllables per minute, F = 15.850, p = .000; dysfluencies, F = 5.863, p = .004) and in one complexity measure (i.e., syntactic variety, F = 3.471, p = .035). However, the effect sizes for task condition, as indexed by partial eta squared, were greater for the fluency measures (η²p = .205 for time on task, η²p = .263 for syllables per minute, and η²p = .122 for dysfluencies) than for the affected complexity measure (η²p = .072). The pairwise comparisons showed a highly significant (p = .001) and very large (d = 1.032) difference in time on task and an even greater difference (p = .000, d = 1.339) in the rate of production between the MTA group and the NTA group, with the former spending much more time writing much less productively. Although the NMTA group completed the task slightly faster than the NTA group, as indicated by the adjusted means for time, they were less productive than the NTA group when the total numbers of syllables produced were considered; the NTA-NMTA difference in this measure, however, was insignificant (p = .956) and very small (d = 0.254). Both the NMTA group and the MTA group produced significantly more dysfluencies than the NTA group (p = .050 and p = .004), and the effect sizes for the NTA-NMTA and NTA-MTA comparisons were around upper medium (d = 0.581 and d = 0.760). Also importantly, the two TA groups used fewer different verb forms than their peers under the silent condition. The decrease for the MTA group approached statistical significance (p = .055), with a medium effect size (d = 0.628); similarly, the decrease for the NMTA group also recorded a medium effect size (d = 0.576), although it did not reach significance (p = .093).

Table 3 summarizes the effects of MTA and NMTA on L2 writing performance. As is manifest, except for time on task and rate of production, both forms of verbalization appeared to react upon, or leave unaffected, the same aspects of performance: both aggravated dysfluencies and tended to suppress syntactic variety, but conserved general complexity, syntactic complexity, subclausal complexity, lexical variety, accuracy, and overall quality. In all these shared aspects, as the pairwise comparisons showed, no significant differences existed between the two TA groups.
That noted, however, MTA affected two more aspects of fluency, namely time on task and rate of production, on which the NMTA-MTA differences were highly significant (both p = .000), with very large effect sizes (d = 1.124 and d = 1.084). Moreover, MTA consistently recorded greater effect sizes than NMTA on all aspects of L2 writing performance examined, except for general complexity and syntactic complexity.

Discussion

The research question concerning whether NMTA or MTA has any effects on L2 learners' writing performance in terms of fluency, complexity, accuracy, and overall writing quality does not receive a sweeping answer. The impact of verbalization was felt on only a limited set of measures: both NMTA and MTA incurred a decline in fluency and tended to threaten one aspect of complexity, but neither affected accuracy, most aspects of complexity, or overall writing quality, either negatively or positively.

Most noticeably, the act of TA, be it NMTA or MTA, slowed down L2 writing. MTA caused a substantial decline in the number of syllables produced per minute and a marked increase in time on task, while NMTA incurred a slight decrease in productivity, though it did not prolong time on task. The significant increase in time found for MTA is consistent with Bowles's (2008) and Bowles and Leow's (2005) findings that MTA prolonged time on task. The slightly lower rate of L2 production under NMTA follows Ransdell's (1995) similar finding that NMTA reduced the rate of L1 writing, and echoes Janssen et al.'s (1996) finding that NMTA caused a considerable increase in pause length in L1 written production. Taken together, these findings support Bowles's (2010) meta-analysis, which reported quite consistent effects of TA on time on task, and confirm Ericsson and Simon's (1993) conclusions in this regard.

Table 3
Reactive effects of MTA and NMTA on writing performance.

NMTA: Fluency: increased dysfluencies (−). Complexity: decreased syntactic variety (tendency) (−). Accuracy: no difference. Overall quality: no difference.
MTA: Fluency: increased time on task (−), decreased rate of production (−), increased dysfluencies (−). Complexity: decreased syntactic variety (tendency) (−). Accuracy: no difference. Overall quality: no difference.

Note. − indicates a negative effect.

Also evidently, both NMTA and MTA caused significantly more dysfluencies in L2 writing. When engaged in the double acts of writing and concurrent reporting, L2 learners did not write as smoothly as when writing single-mindedly but generated more false-starts and self-corrections. Based on Kellogg’s model, it may be surmised that when the extra verbal (re)coding and articulation processes involved in TA joined the contention for WM resources, some L2 writing monitoring processes that could have happened prior to final executing were crowded out, so that the quality of some corresponding pre-executing (planning or translating) processes was not maintained, only to be compensated by more post hoc editing processes to keep up this quality (see also Rijlaarsdam & Van den Bergh, 1996 for a compensatory system account of writing processes). This finding, therefore, may offer support to Kellogg’s (1996) prediction that verbal reporting may interfere with the quality of translating. The adjustment of writing behavior engendered herein could be actualized possibly because the contention for WM resources was presumably less keen in post-executing monitoring. It may also be attributed to the peripheral status of dysfluencies as a measure of fluency, which, externalized as the extent of neatness and legibility in handwriting, is very marginally, if not rarely, included in a scoring scheme. L2 learners would make as many cross-outs and reformulations as they thought necessary and sacrifice this aspect of performance for what they thought were central concerns, for example, meaning conveyance and formal accuracy. As regards complexity, both types of verbalization showed a tendency to decrease syntactic variety, but left undisturbed other aspects of complexity (i.e., general complexity, syntactic complexity, subclausal complexity, and lexical variety). In reference to Kellogg’s model, it may be deduced that due to the cognitive demand of verbal reporting, certain translating processes might have been inhibited that functioned to verbalize planned propositions in relation to tense, modality, or voice, whereas the central translating processes involving word choice, clausal composition, and modification retained their normal shares of attentional control. The overall somewhat detrimental effect of TA on L2 translating confirms Kellogg’s (1996) prediction that verbal reporting may impair the quality of translation. However, the varying degrees of independence from TA disturbance displayed by different functions of translating processes may suggest that some functions enjoyed priority in WM resource allocation over others. The priority given to lexical retrieval has been suggested by Ellis and Yuan (2004), who stated that ‘‘given the importance of locating the relevant vocabulary to encode the propositional content . . ., all the writers prioritized lexical search during on-line assembly’’ (p. 78). By this notion Ellis and Yuan emphasized the close relevance of this aspect of performance to transmission of meaning, which enjoys top priority generally (VanPatten, 1990). It is perhaps equally reasonable to assume priority given to general, syntactic, and subclausal complexity in light of the indispensability of clausal, subclausal, and phrasal construction to conveying meaning. The comparative marginalization of verb form variety may be attributable primarily to its less substance in this regard. 
Loaded with TA, L2 learners were likely to stick to prototypical verb forms and give up peripheral ones (e.g., may in lieu of is likely to), which were less automatized and demanded more control (e.g., Hu, 2002). The sacrifice in syntactic variety is also related to the participants’ feeble metalinguistic knowledge concerning this aspect, partly due to its relatively little application as one facet of writing performance. Evidence might be found that they may think of justifying their syntactic use (e.g., a subordinate clause of concession), and their word choice, but very scarcely did they mention their intention to diversify verb use. Neither NMTA nor MTA improved or impaired L2 writing accuracy. This finding is consistent with the findings of nonreactivity for NMTA reported in a majority of previous SLA studies measuring accuracy in reading comprehension (Bowles, 2008; Bowles & Leow, 2005; Leow & Morgan-Short, 2004; Rossomondo, 2007) and with Ericsson and Simon’s (1993) conclusion for NMTA based on empirical studies measuring accuracy of performance. Nonetheless, it
seems inconsistent with the findings of detrimental reactivity on accuracy for MTA documented in the previous SLA literature (e.g., Bowles, 2008; Bowles & Leow, 2005) and with Ericsson and Simon’s (1993) prediction of reactivity for this type of verbalization. The robustness of the accuracy measures may indicate that, with part of the working memory resources diverted to verbalization, a normal share of the central executive was still preserved for attending to form in the translating and monitoring processes. Presumably, this preservation was achieved at the cost of other aspects of performance or processes being less attended to or partially inhibited (i.e., dysfluencies and verb-form translating). The unimpaired formal accuracy could be attributed to the testing-like writing environment in this experiment, where participants were believed to attach great importance to accuracy naturally (Ellis, 2009; Wigglesworth, 1997), and, more importantly, to an exam-oriented English learning setting in China (e.g., Zhang, 2010). Noticeably, the more time spent per unit of product, as indicated by the speed measure of fluency, also contributed to the stability of the accuracy measures: it may have relieved the pressure exerted by TA and helped retain the accuracy of real-time production. This is especially true of MTA.
The finding on accuracy also reveals that TA may not work significantly to benefit translating processes or to activate effective editing processes. The nonreactivity for NMTA in this regard supports much SLA research on reactivity that reported no facilitative effects of NMTA on learning target forms incidentally (Bowles, 2008; Bowles & Leow, 2005; Leow & Morgan-Short, 2004; Polio & Wang, 2005), but contrasts with Yanguas and Lado’s (2012) finding that NMTA significantly increased HL writing accuracy. Arguably, more credibility should be given to our finding for NMTA considering that even in the case of MTA, which coerces L2 learners into retrospectively reading what has been written (on-line reading processes that, more often than not, should be peculiar to the MTA condition because they are generally ignored by L2 learner writers) and offering a ‘‘critique’’ of it, no improvement in accuracy was detected either. Indeed, in response to any suspicion of facilitative effects of verbalization effectuated by increased critical attention, the role of NMTA and MTA in improving the formal aspects of L2 production could be negligible. Raising structural complexity, lexical variety, and accuracy could be highly controlled behavior for L2 writers, so that under the dual claim on WM resources by task completion and TA, little capacity of the central executive was left for this aim. This is especially the case for the NMTA condition, in which the writing task did not take more time than in the silent condition. In the metacognitive verbal reports, it was not uncommon for syntactic structures, word choice, and other formal aspects to serve as points of justification, in addition to content. However, such justification may not nurture serious consideration of form. Reasons involving statements of functions or purposes (‘‘[I use] ‘Besides’ [here], [to make] a transition’’), comments (‘‘I feel this sounds like Chinglish, but it is vivid’’), or reference to authorities (‘‘I use this clause because my teacher told me I should write complex sentences’’) were not likely to trigger editing processes. Even when relevant metacognitive knowledge was retrieved and reported, it tended to justify, rather than call into doubt, what had just been written.
Also, whether such knowledge would take effect was constrained by the L2 learners’ linguistic repertoire (e.g., the vocabulary available to them as substitutes, and correct forms). Overall, perhaps owing to the aforementioned constraints of WM resources, contents of reasoning, orientation of justification, and language resources, only very sporadic cases of editing triggered by MTA were observed, and these were directed at easily discernible points of doubt.
Neither type of TA disturbed the overall quality of L2 writing. This finding conforms to Ransdell’s (1995) report of no impact of NMTA on the overall quality of L1 writing and makes NMTA somewhat comparable to the secondary memory-load task used in Ransdell, Arecco, and Levy’s (2001) study, which did not change the quality of writing either. This finding of intact overall quality should be expected because, under the NMTA condition, major content changes, including macro-structural changes, were unlikely to happen even in a dedicated revision task (Stratman & Hamp-Lyons, 1994), and were still less likely to happen in on-line revision, if such revision processes were ever initiated. Furthermore, the key formal determinants of writing quality, namely syntactic complexity, vocabulary, and accuracy, remained robust, indicating no significant detriment to the quality of translating. The impairing effects were only felt in those peripheral aspects that did not count for much when the overall quality of L2 writing was measured, namely, speed of production, dysfluencies, and perhaps syntactic variety as well.
As an answer to the second research question, MTA did differ from NMTA in that its impact appeared larger in scope and greater in magnitude. That is, MTA affected one more aspect of fluency (i.e., speed of production), and its effects on most aspects of performance recorded larger effect sizes. The greater reactive effects and more time on task
attributed to MTA have been predicted by Ericsson and Simon (1993). These findings also echo the analogous contrast afforded by Bowles and Leow (2005), who reported that MTA impinged upon L2 reading comprehension whereas NMTA did not, and by Bowles (2008), who found that MTA significantly impaired item learning and increased time on task in an L2 learning task while NMTA exerted no noticeable effects on these two aspects. The greater negativity incurred by MTA can be largely attributed to the additional requirement of justification, a process that demands WM resources. It is also noteworthy that MTA and NMTA affected, and left undisturbed, the same aspects of performance, and that, except for the measure of speed of production, none of the differences between their effects reached significance. These similarities may imply that NMTA and MTA did not actually differ greatly in their reactivity in L2 writing, as reflected in performance differences, given that much of the pressure exerted by the additional justification may have been diluted by the longer time on task under the MTA condition. This conclusion may hold if the MTA condition can be viewed as NMTA processes periodically interspersed with justification processes, which, although their insertion apparently altered L2 writing processes, do not seem to have interacted significantly with the L2 writing processes per se.
In a nutshell, the overall effects of TA on L2 writing processes appeared insubstantial when such effects were operationalized in terms of L2 writing performance. TA did not seem to affect the central processes key to normal L2 writing; it impaired L2 writing only marginally, by crowding out some compensable pre-executing monitoring processes and by inhibiting certain peripheral translating processes. Alterations of L2 writing processes to this degree should serve better as cautions for, rather than as counterevidence against, the use of TA to elicit process data. The reactive effects of TA might be reduced or minimized in future use if L2 writers could complete the TA task with more ease. If MTA is to be used to collect real-time metacognitive data or explicit knowledge concerning L2 writing, learners might be allowed to write a sentence silently and then begin to justify it, to ease their burden of verbalization.
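Before turning to the conclusions, the following sketch illustrates how performance measures of the kind discussed above (a dysfluency rate per 100 words, speed of production in words per minute, and a crude verb-form variety index) can be computed once essays have been hand-coded. It is a minimal sketch under assumed coding conventions: the data structure, tag set, and example values are invented for illustration and do not reproduce the instruments actually used in this study.

```python
# Illustrative sketch (not the study's instruments): computing three performance
# measures of the kind discussed above from hypothetical hand-coded essay data.
# Assumptions: dysfluencies (false starts, cross-outs, reformulations) have been
# coded as counts, verb forms have been tagged as strings, and writing time is
# known in minutes.

from dataclasses import dataclass
from typing import List


@dataclass
class AnnotatedEssay:
    word_count: int            # total words in the final text
    dysfluency_count: int      # coded false starts, cross-outs, reformulations
    verb_form_tags: List[str]  # one tag per verb phrase
    minutes_on_task: float     # total writing time


def dysfluencies_per_100_words(essay: AnnotatedEssay) -> float:
    """Dysfluency rate, normalized per 100 words of the final text."""
    return 100 * essay.dysfluency_count / essay.word_count


def words_per_minute(essay: AnnotatedEssay) -> float:
    """Speed of written production."""
    return essay.word_count / essay.minutes_on_task


def verb_form_variety(essay: AnnotatedEssay) -> int:
    """A crude index of syntactic variety: number of distinct verb-form tags."""
    return len(set(essay.verb_form_tags))


if __name__ == "__main__":
    essay = AnnotatedEssay(
        word_count=312,
        dysfluency_count=14,
        verb_form_tags=["present simple", "present simple", "modal + base",
                        "past simple", "present perfect"],
        minutes_on_task=38.5,
    )
    print(round(dysfluencies_per_100_words(essay), 2))  # 4.49
    print(round(words_per_minute(essay), 2))            # 8.1
    print(verb_form_variety(essay))                     # 4
```

The point is only to make the measures concrete; any real operationalization stands or falls with the coding scheme behind the counts.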

Conclusion
Several major conclusions can be drawn from this study. First, both forms of TA appear to be somewhat detrimental to L2 writing performance and reactive to L2 writing processes, but their overall effects may not be strong. Significant effects seem to touch upon rather minor and easily neglected aspects of writing performance, such as dysfluencies and potentially syntactic variety, but do not impinge on the core formal concerns in measuring L2 writing (i.e., general complexity, syntactic complexity, subclausal complexity, lexical variety, and accuracy) or, above all, on the overall quality of writing. It can be inferred that verbal reporting may displace certain monitoring processes and inhibit some translating processes, in accordance with Kellogg’s prediction of a disruptive effect of verbal reporting on translating, but that the bulk and core of L2 writing processes remain undisturbed.
Second, MTA may cause greater effects than NMTA, but given the similarities of, and the insignificant differences between, their effects, MTA may not affect L2 writing processes in a greatly different way from NMTA, suggesting limited interaction between the additional justification processes and L2 writing processes.
Third, the negative overall effects of TA on L2 writing indicate that the act of TA imposes some load on WM, owing to its demand on the phonological loop and possibly the central executive as well (judging from the range of its effects on writing processes, especially on processes that do demand the central executive, i.e., L2 monitoring), and that keener competition for the limited WM resources is the dominant factor behind these effects, overwhelming other factors concomitant to TA that might be facilitative to L2 writing. In relation to Ericsson and Simon’s (1984, 1993) hypothesis, this would mean that TA is more likely to constrain ongoing information processing in the completion of an L2 writing task than to trigger the generation of new thoughts.
Finally, the impacts of TA are not evenly shared among different aspects of performance and, by inference, among different L2 writing processes, or even among the different functions of the same process, indicating an order of priority in the allocation of WM resources in real-time production. This order may follow a possible principle of ‘‘sacrifice of the most easily ignored and the most cognitively demanding,’’ as has been discussed.
One limitation of this study lies in the possibility that the experimenters might have missed prompting MTA participants owing to the intensive and prolonged work of supervision. It is also possible that participant fatigue might have confounded the results. Another limitation concerns the crudeness of the accuracy measures. An error count, ‘‘a more fine-grained measure of accuracy’’ (Polio, 1997, p. 117), could also have been used to capture any subtle effects of the act of verbalization.
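As a concrete illustration of the error-count alternative mentioned above, the sketch below computes errors per 100 words together with a coarser error-free-clause ratio from hypothetical hand-coded counts. The coding categories and numbers are invented for illustration and do not reproduce Polio’s (1997) procedures or those of this study.

```python
# Hypothetical illustration of an error-count accuracy measure: errors per 100
# words and the ratio of error-free clauses. The counts are assumed to come from
# hand-coding of each essay; nothing here reproduces the study's actual scheme.

def errors_per_100_words(error_count: int, word_count: int) -> float:
    """Fine-grained accuracy: coded errors normalized per 100 words."""
    return 100 * error_count / word_count


def error_free_clause_ratio(error_free_clauses: int, total_clauses: int) -> float:
    """Coarser accuracy: proportion of clauses containing no coded error."""
    return error_free_clauses / total_clauses


if __name__ == "__main__":
    # A hypothetical essay: 280 words, 31 coded errors, 24 of 40 clauses error-free.
    print(round(errors_per_100_words(31, 280), 2))    # 11.07
    print(round(error_free_clause_ratio(24, 40), 2))  # 0.6
```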

Further research might conduct detailed analyses of the contents of verbal reports to triangulate our present statistical results. Given the scarcity of studies on reactivity in L2 writing, more studies in this area are clearly needed; they may employ other types of writing tasks, compare groups of different proficiency or working memory capacity (WMC), or high and low verbalizers, restrict the reporting language to either the L1 or the L2, engage the same learners in L1 and L2 writing tasks, give more focused instructions, and so on. To support some of the explanations in our discussion, there is also a need for more research into the mechanisms of WMC allocation that relate prioritization to sociocultural contexts.
Acknowledgements
We would like to acknowledge the very constructive feedback from all three reviewers and the statistician. We are particularly indebted to the editor, Professor Rosa Manchón, for her helpful comments, which, alongside the reviewers’ recommendations, have greatly helped to improve the clarity of our paper. Any faults that remain are the authors’ responsibility.

Appendix A. Instructions for the MTA Task
A.1. Report your thoughts while you write
In this experiment, we are interested in what you think in completing the writing task. For this purpose, we ask you to report your ongoing thoughts while you write. Your contribution is of great significance to studying Chinese university students’ English writing processes. You can begin to tell aloud everything that occurs to you, from the moment when you see the writing topic. But at the same time, you need to explain why you write the way you do, including why you make a particular writing decision. You may resort to your linguistic knowledge, knowledge about writing, or any other knowledge for such reporting. Your reasons can be given before, while, or after you write something, depending on your convenience. Just write as you usually do. There is nothing special except that you need to report and justify. You are free to choose your reporting language, Chinese, English, or Chinese mixed with English, whatever you feel comfortable with. The experimenter will not communicate with you. He/She will only remind you duly when you forget to speak, or when you forget to justify. His/Her reminder may cause some interference, but please take your ease. There is no time constraint.

Appendix B. Instructions for the NMTA Task
B.1. Report your thoughts while you write
In this experiment, we are interested in what you think in completing the writing task. For this purpose, we ask you to report your ongoing thoughts while you write. Your contribution is of great significance to studying Chinese university students’ English writing processes. You can begin to tell aloud everything that occurs to you, from the moment when you see the writing topic. Report whatever you think until you submit your essay. Just write as you usually do. There is nothing special except that you need to report. You don’t need to explain why you write. You are free to choose your reporting language, Chinese, English, or Chinese mixed with English, whatever you feel comfortable with. The experimenter will not communicate with you. He/She will only remind you duly when you forget to speak. His/Her reminder may cause some interference, but please take your ease. There is no time constraint.

Appendix C. A sample essay written under the NMTA condition
In many countries schools have severe problems with student behavior. Many causes lead to this phenomenon. I want to introduce my view on the issue. Firstly, students are far more enough to be an adult. They are in a period of rebellion in their lives. They won’t obey lots of frustrating rules, nor can they obey. They may deal with some things only by their emotions in the certain moment. The bad result is not in their consideration. So schools have severe problems with student behavior. Secondly, Nowadays, our society is an opened society. People are much more outgoing than before, schools’ students must be included in it. Naturally, their behavior is associated with the society behind them more or less. Since many crimes, robs and other bad behavior usually appear in the television and newspaper, they behavior will be linked with the bad behavior they have seen in the TV. At last, a large number of them behave badly at school. Last but not least, their behavior is much decided by their school’s environment. As the saying goes, nothing can be reached without norms. Suppose a school without good study environment, students of it will not and they also can not work hard. Since their classmates, roommates and other peers spend little time on study, how can they keep a quiet mood to study. The teachers will not be satisfied with their bad behavior, and the schools have severe problems with student behavior. I want to suggest some solutions to the cause above. In my opinion, what we should do in the first place is to enhance students’ ability of controlling themselves. This can be called a fundamental measure. Most of students’ bad behavior is caused by an impulse. If given more
time do careful consideration, the result may be the oppisite. Schools are able to arrange some special teachers to correct students’ faults. These teachers are required to have much experiences and they are able to persuade students to do what they have instructed. Schools may also deliver some pamphlets to make more influence on students. Some students never think of controlling themselves specially, so they are in a bad control of their behavior. Once they are told the importance of this ability, they may try their best to improve this certain ability. Teachers meet with students very often. Their behavior will cause great influence on students. How can a teacher with bad behavior teach a good student? So schools could also improve teachers’ behavior. Teachers are supposed to be given some rights to punish these students. Students will have to take it seriously about the punishment when they want to behave badly. Schools can also inform parents of their children’s bad behavior at school. Most time of a student is spent at school or at home, and if both the school and parents take effective measures, problems are easier to cope with. It is a general phenomenon that schools have severe problems with student behavior. I suppose that we can change the situation if the solutions above are taken.

Appendix D. A sample essay written under the MTA condition
Last week, I heard a news from father that Bob, an old friend of mine, was arrested. I was shurked. In my memory, Bob was a good boy. Bob grew with his grandparents, because his parents worked in another city. In our primary school, Bob was a clear boy and very popular. I was even envied him at that time. However, when we were in our high school, he changed. He started to be absent in our class, be addicted in internet and learn to smoke and drink. His grade was worse and worse. When we was in our Grade 2 in high school, he droped his study. Since he left school, I heared little about him. My father told me the whole story. After leaving from school, he had learned to be a cook for a year. However, he did not become cook. He stayed with his friends every day. For the lack of money, they started to rob. No soon, they were all arrested. I thought about it for long time. What causes it that a good boy become a robber? There are many reasons, I think. Family educations, influences from school, from friends, and the attitudes of themselves are all the reasons. Family educations have a big influence on students’ behavior. A good family education can make students understand the importance of study better. Furthermore, it can prevent students from those bad habits, like smoking, drinking. On the other hand, a bad family education can not give students the love they want. Therefore, those students may not support themselves under the impressure study, and give up. Family educations are important to students’ behavior. School education is another reason. Student spend many hours in their school getting education every day. Teachers is the most important role in school education. A teacher’s smile may give a student confidence. So, teachers’ attitude will effect a lot to students. To a good student, teachers’ care may be the power of progress. And to a not so good student, teachers’ careless can kill his hope. As the ones who know most about a student, friends may decide the futher of him. As friends, they can teach him how to study well, how to keep healthy, how to be popular, they can tell him to drop chasses to play computer games with them. So, friends can make students better or worse. Fanilly, their own attitudes are most significant. Attitudes decide everything. A student can keep behave well though he dose not have good environment. For those students who have problems in their school behavior, the environment is the excuse for them to make those trouble. Family, teachers, friends all effect a lot to a student’s behavior, but the attitude is the biggest one. As far I am concerned, the problems all can be solved. For family, parents should care more about their children. Parents can communicate with students, know how they think, try to understand the pressure from study and help them. For school, I think, teachers ought to pay more attention to those students who do not so well in school. On the other hand, the pressure of student should be decreased. To students themselves, they must recognize what they want most. If they want to be somebody, they need keep going. I hope every student will have a nice futher.

References
Armengol, L., & Cots, J. (2009). Attention processes observed in think-aloud protocols: Two multilingual informants writing in two languages. Language Awareness, 18, 259–276.
Baddeley, A. (1986). Working memory. Oxford, UK: Oxford University Press.
Barkaoui, K. (2011). Think-aloud protocols in research on essay rating: An empirical study of their veridicality and reactivity. Language Testing, 18, 51–75.
Bereiter, C., & Scardamalia, M. (1987). The psychology of written composition. Mahwah, NJ: Erlbaum.
Bowles, M. (2008). Task type and reactivity of verbal reports in SLA: A first look at a L2 task other than reading. Studies in Second Language Acquisition, 30, 359–387.
Bowles, M. (2010). The think-aloud controversy in second language research. New York, NY: Routledge.
Bowles, M., & Leow, R. P. (2005). Reactivity and type of verbal report in SLA research methodology: Expanding the scope of investigation. Studies in Second Language Acquisition, 27, 415–440.
Breetvelt, I., Bergh, H., & Rijlaarsdam, G. (1994). Relations between writing processes and text quality: When and how? Cognition and Instruction, 12, 103–123.
Cambridge IELTS 3. (2002). Examination papers from the University of Cambridge Local Examinations Syndicate. Cambridge, UK: Cambridge University Press.
Cambridge IELTS 4. (2005). Examination papers from the University of Cambridge ESOL Examinations: English for Speakers of Other Languages. Cambridge, UK: Cambridge University Press.
Chenoweth, N. A., & Hayes, J. R. (2001). Fluency in writing: Generating text in L1 and L2. Written Communication, 18, 80–89.
Durst, R. K. (1987). Cognitive and linguistic demands of analytic writing. Research in the Teaching of English, 21, 347–376.
Ellis, R. (2009). The differential effects of three types of task planning on the fluency, complexity, and accuracy in L2 oral production. Applied Linguistics, 30, 474–509.
Ellis, R., & Yuan, F. (2004). The effects of planning on fluency, complexity, and accuracy in second language narrative writing. Studies in Second Language Acquisition, 26, 59–84.
Ericsson, K. A., & Simon, H. A. (1984). Protocol analysis: Verbal reports as data. Cambridge, MA: MIT Press.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data (rev. ed.). Cambridge, MA: MIT Press.
Flower, L., & Hayes, J. (1980). The dynamics of composing: Making plans and juggling constraints. In L. Gregg & E. Steinberg (Eds.), Cognitive processes in writing (pp. 31–50). Mahwah, NJ: Erlbaum.
Flower, L., & Hayes, J. (1981). A cognitive process theory of writing. College Composition and Communication, 32, 365–387.
Flower, L., & Hayes, J. (1984). Images, plans, and prose: The representation of meaning in writing. Written Communication, 1, 120–160.
Foster, P., Tonkyn, A., & Wigglesworth, G. (2000). Measuring spoken language: A unit for all reasons. Applied Linguistics, 21, 354–375.
Goo, J. (2010). Working memory and reactivity. Language Learning, 60, 712–752.
Goo, J., & Mackey, A. (2013). The case against the case against recasts. Studies in Second Language Acquisition, 35, 127–165.
Hayes, J. R., Flower, L., Schriver, K., Stratman, J., & Carey, L. (1987). Cognitive processes in revision. In S. Rosenberg (Ed.), Advances in applied linguistics. Vol. 11: Reading, writing and language processing (pp. 176–240). Cambridge, UK: Cambridge University Press.
Hu, G. W. (2002). Psychological constraints on the utility of metalinguistic knowledge in second language production. Studies in Second Language Acquisition, 24, 347–386.
Jacobs, H., Zingraf, S., Wormuth, D., Hartfiel, V., & Hughey, J. (1981). Testing ESL composition: A practical approach. Rowley, MA: Newbury House.
Janssen, D., Waes, L., & Bergh, H. (1996). Effects of thinking aloud on writing processes. In C. Levy & S. Ransdell (Eds.), The science of writing (pp. 233–250). Mahwah, NJ: Erlbaum.
Jarvis, S. (2002). Short texts, best-fitting curves, and new measures of lexical diversity. Language Testing, 19, 57–84.
Jourdenais, R. (2001). Cognition, instruction and protocol analysis. In P. Robinson (Ed.), Cognition and second language instruction (pp. 354–375). New York, NY: Cambridge University Press.
Kellogg, R. (1996). A model of working memory in writing. In C. Levy & S. Ransdell (Eds.), The science of writing (pp. 57–71). Mahwah, NJ: Erlbaum.
Kellogg, R. (2001). Competition for working memory among writing processes. The American Journal of Psychology, 114, 175–191.
Leow, R. P., & Morgan-Short, K. (2004). To think aloud or not to think aloud: The issue of reactivity in SLA research methodology. Studies in Second Language Acquisition, 26, 35–57.
Levy, C. M., & Ransdell, S. (1995). Is writing as difficult as it seems? Memory and Cognition, 23, 767–779.
MacWhinney, B. (2000a). The CHILDES project: Tools for analyzing talk. Volume I: Transcription format and programs (3rd ed.). Mahwah, NJ: Erlbaum.
MacWhinney, B. (2000b). The CHILDES project: Tools for analyzing talk. Volume II: The database (3rd ed.). Mahwah, NJ: Erlbaum.
MacWhinney, B., & Snow, C. (1990). The Child Language Data Exchange System: An update. Journal of Child Language, 17, 457–472.
Malvern, D., & Richards, B. (2002). Investigating accommodation in language proficiency interviews using a new measure of lexical diversity. Language Testing, 19, 85–104.
Manchón, R. M., Roca de Larios, J., & Murphy, L. (2009). The temporal dimension and problem-solving nature of foreign language composing processes: Implications for theory. In R. M. Manchón (Ed.), Writing in foreign language contexts: Learning, teaching, and research (pp. 102–129). Bristol, UK: Multilingual Matters.
McKee, G., Malvern, D., & Richards, B. (2000). Measuring vocabulary diversity using dedicated software. Literary and Linguistic Computing, 15, 323–337.
Norris, J., & Ortega, L. (2009). Towards an organic approach to investigating CAF in instructed SLA: The case of complexity. Applied Linguistics, 30, 555–578.
Ong, J., & Zhang, L. J. (2013). Effects of the manipulation of cognitive processes on EFL writers’ text quality. TESOL Quarterly, 47, 375–398.
Payne, J. W., Braunstein, M. L., & Carroll, J. S. (1978). Exploring predecisional behavior: An alternative approach to decision research. Organizational Behavior and Human Performance, 22, 17–44.
Polio, C. (1997). Measures of linguistic accuracy in second language writing research. Language Learning, 47, 101–143.
Polio, C., & Wang, J. (2005, October). Another look at the reactivity of concurrent verbal protocols in second language reading research. Paper presented at the Second Language Research Forum.
Ransdell, S. (1995). Generating thinking-aloud protocols: Impact on the narrative writing of college students. American Journal of Psychology, 108, 89–98.
Ransdell, S., Arecco, M. R., & Levy, C. M. (2001). Bilingual long-term working memory: The effects of working memory loads on writing quality and fluency. Applied Psycholinguistics, 22, 113–128.
Richards, B., & Malvern, D. D. (1997). Quantifying lexical diversity in the study of language development. Reading, UK: Faculty of Education and Community Studies, The University of Reading.
Rijlaarsdam, G., & Van den Bergh, H. (1996). An agenda for research into an interactive compensatory model of writing: Many questions, some answers. In C. Levy & S. Ransdell (Eds.), The science of writing (pp. 107–126). Mahwah, NJ: Erlbaum.
Roca de Larios, J., Marín, J., & Murphy, L. (2001). A temporal analysis of formulation processes in L1 and L2 writing. Language Learning, 51, 497–583.
Rossomondo, A. E. (2007). The role of lexical temporal indicators and text interaction format in the incidental acquisition of the Spanish future tense. Studies in Second Language Acquisition, 29, 39–66.
Russo, J. E., Johnson, E. J., & Stephens, D. L. (1989). The validity of verbal protocols. Memory & Cognition, 17, 759–769.
Sachs, R., & Polio, C. (2007). Learners’ uses of two types of written feedback on an L2 writing revision task. Studies in Second Language Acquisition, 29, 67–100.
Sachs, R., & Suh, B. (2007). Textually enhanced recasts, learner awareness, and L2 outcomes in synchronous computer-mediated interaction. In A. Mackey (Ed.), Conversational interaction in second language acquisition: A collection of empirical studies (pp. 197–227). Oxford, UK: Oxford University Press.
Sanz, C., Lin, H.-J., Lado, B., Bowden, H. W., & Stafford, C. A. (2009). Concurrent verbalizations, pedagogical conditions, and reactivity: Two CALL studies. Language Learning, 59, 33–71.
Scardamalia, M. (1984). Teachability of reflective processes in written composition. Cognitive Science, 8, 173–190.
Smagorinsky, P. (1989). The reliability and validity of protocol analysis. Written Communication, 6, 463–479.
Stratman, J. F., & Hamp-Lyons, L. (1994). Reactivity in concurrent think-aloud protocols. In P. Smagorinsky (Ed.), Speaking about writing: Reflections on research methodology (pp. 89–112). London, UK: Sage.
Swain, M. (1985). Communicative competence: Some roles of comprehensible input and comprehensible output in its development. In S. Gass & C. Madden (Eds.), Input in second language acquisition (pp. 235–256). New York, NY: Newbury House.
VanPatten, B. (1990). Attending to content and form in the input: An experiment in consciousness. Studies in Second Language Acquisition, 12, 287–301.
Wang, W. Y., & Wen, Q. F. (2002). L1 use in the L2 composing process: An exploratory study of 16 Chinese EFL writers. Journal of Second Language Writing, 11, 225–246.
Wigglesworth, G. (1997). An investigation of planning time and proficiency level on oral test discourse. Language Testing, 14, 85–106.
Wilhelm, J. D. (2001). Improving comprehension with think-aloud strategies. New York, NY: Scholastic Professional Books.
Wilson, T. D. (1994). The proper protocol: Validity and completeness of verbal reports. Psychological Science, 5, 249–252.
Yanguas, I., & Lado, B. (2012). Is thinking aloud reactive when writing in the heritage language? Foreign Language Annals, 45, 380–399.
Yoshida, M. (2008). Think-aloud protocols and type of reading task: The issue of reactivity in L2 reading research. In M. Bowles (Ed.), Selected proceedings of the 2007 Second Language Research Forum (pp. 199–209). Somerville, MA: Cascadilla.
Zellermayer, M., & Cohen, J. (1996). Varying paths for learning to revise. Instructional Science, 24, 177–195.
Zhang, L. J. (2010). A dynamic metacognitive systems account of Chinese university students’ knowledge about EFL reading. TESOL Quarterly, 44, 320–353.

Chengsong Yang is a PhD candidate in Applied Linguistics at the School of Curriculum and Pedagogy, Faculty of Education, The University of Auckland, New Zealand. He holds an MA in Foreign Linguistics and Applied Linguistics from Xidian University, China and an MA in Applied Linguistics from Nanyang Technological University, Singapore. His research interests lie mainly in the validity of concurrent verbal reporting as a methodology in second language acquisition and China English. He has published articles in Chinese local journals and given presentations at international conferences.

Dr. Guangwei Hu is Associate Professor in the English Language and Literature Academic Group, National Institute of Education, Nanyang Technological University, Singapore. His current research covers academic discourse, bilingualism and bilingual education, home (bi)literacy practices and acquisition, metalinguistic awareness, and second language acquisition. His research articles have appeared in many edited volumes as well as international journals such as Studies in Second Language Acquisition, British Journal of Educational Psychology, Instructional Science, Journal of Multilingual and Multicultural Development, Journal of Pragmatics, Language and Education, Language Awareness, Language Learning, Language Teaching Research, Research in the Teaching of English, Review of Educational Research, System, Teachers College Record, and TESOL Quarterly.

Dr. Lawrence Jun Zhang is Associate Professor and Associate Dean, Faculty of Education, University of Auckland, New Zealand. He has published widely on topics related to language learning and teaching in British Journal of Educational Psychology, Journal of Psycholinguistic Research, Instructional Science, Language Awareness, Language and Education, RELC Journal, System, TESOL Quarterly, and Journal of Second Language Writing. His interests lie in learner metacognition in biliteracy, reading and writing development, and representations of lexical and syntactic knowledge in bilingual and second language acquisition. He is the recipient of the TESOL Award for Distinguished Research 2011 for his article ‘‘A dynamic metacognitive systems account of Chinese university students’ knowledge about EFL reading,’’ published in TESOL Quarterly (2010). He is currently an Associate Editor of TESOL Quarterly and an editorial board member of Applied Linguistics Review, Chinese Journal of Applied Linguistics, RELC Journal, System, and Metacognition and Learning.