Journal of English for Academic Purposes 14 (2014) 84–94
Formulaic language in L1 and L2 expert academic writing: Convergent and divergent usage
Carmen Pérez-Llantada*
University of Zaragoza, Faculty of Arts, Department of English and German Studies, 50009 Zaragoza, Spain
Keywords: Academic writing; English for academic purposes; Research article genre; Lexical bundles; Register analysis; Second language acquisition; Interlanguage development

Abstract
Formulaicity (i.e., knowledge of conventionalised multi-word combinations) in academic writing is not part of the native writer’s innate language ability and is thus far from being a linguistic universal skill (Kachru, 2009; Wray, 2008). It can therefore be assumed that L2 academic writers find it particularly difficult to acquire native-like formulaic sequences. Building on this assumption, I use a 5.7 million-word corpus of expert academic writing to compare convergent and divergent usage of lexical bundles in three language variables, L1 English, L2 English and L1 Spanish. I identify core bundles (i.e., bundles shared by the three variables) and contend that writers’ usage of these bundles is determined by register. I also compare the structures and functions of bundles specific to one or to two language variables to exemplify how these distinctive bundles build different pragmatic meanings in the texts. In identifying phraseological norms implicitly recognised by L1 writers, I argue that the use of bundles by the L2 writers deviates from L1 norms and conclude that, although they are expert writers, their formulaicity is ‘hybrid’, that is, largely, but not completely, native-like. I also discuss implications regarding L2 expert writers’ interlanguage development and propose areas for pedagogical intervention.
© 2014 Elsevier Ltd. All rights reserved.
1. Introduction

Over the past decades, formulaicity (i.e., knowledge of conventionalised multi-word combinations) in academic writing has been investigated by several influential research strands. Taking a frequency-based approach, the North-American corpus linguistics school has taxonomised formulaic sequences (called ‘lexical bundles’) in both academic speech and writing. It has been consistently argued that each academic genre displays “a distinct set of lexical bundles, associated with [its] typical communicative purposes” (Biber & Barbieri, 2007, p. 265). Structural and functional descriptions of lexical bundles have also served to describe English academic writing in terms of grammatical compression, syntactic elaboration and degree of explicitness (Biber, 2009; Biber, Conrad, & Reppen, 1998; Biber & Gray, 2010; Biber, Johansson, Leech, Conrad, & Finegan, 1999). Formulaic language in American and British academic English varieties, as well as in other varieties of academic writing such as Argentinean and Peninsular Spanish or Philippine English, has also been typified within this research tradition (Biber, Conrad, & Cortes, 2004; Cortes, 2004; Liu, 2012; Pérez-Llantada, 2012; Salazar, 2010).

Formulaicity has also been investigated with a view to examining non-native English speakers’ lexical bundle use and proposing pedagogical interventions for the teaching of English as a Foreign Language. Drawing on learner corpora, a number of lexico-grammatical, pragmatic and stylistic features in L2 English have been reported as deviant from L1 norms
* Fax: +34 976 76 15 19.
1475-1585/$ – see front matter © 2014 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.jeap.2014.01.002
(Granger, 1998; Granger & Meunier, 2008; Howarth, 1998; Meunier & Granger, 2008). There is also little dispute that advanced English learners use fewer formulaic sequences than their native-speaker counterparts (Granger, 1998) and that their language production exhibits “lack of register awareness, phraseological infelicities, and semantic misuse” (Gilquin, Granger, & Paquot, 2007, p. 319). In the field of English for Academic Purposes, phraseology research also maintains that L2 English learners with different proficiency levels overuse, underuse and misuse L1 English bundles and fail to understand their pragmatic functions according to L1 conventions (Ädel & Erman, 2012; Chen & Baker, 2010; Salazar, 2010; Staples, Egbert, Biber, & McClair, 2013).

Formulaicity has also been approached from the perspectives of psycholinguistics and language acquisition. It has been claimed that knowledge of academic formulas facilitates fluent language processing and that mastery of bundles equates to successful language production (Ellis, Simpson-Vlach, & Maynard, 2008; Schmitt, 2004; Wray, 2002). Research has also offered evidence that lexical phrases in L1 are learnt ‘as wholes’ and not as strings of individual words, that formulaic language is learnt incrementally and that fluent language users exhibit an ample repertoire of memorised language sequences (Ellis, 2008; Li & Schmitt, 2009). Li and Schmitt (2009, p. 86) further note that the absence of formulaic sequences in language production signals the “lack of mastery of a novice writer in a specific disciplinary community”.

Formulaicity in academic writing is not a universal language skill (Kachru, 2009; Wray, 2008). Both L1 and L2 academic writers may acquire formulaic sequences not only through formal instruction but also through non-formal incidental learning, e.g., extensive academic reading and repeated usage of patterns through extensive writing (Ellis, 2008; Li & Schmitt, 2009). Supporting Warren’s (2005, p.
38) claim that native-like mastery of idiomaticity is difficult to attain by foreign language learners, research with small-scale monolingual and multilingual corpora has demonstrated that formulaic language is associated with expert, and not novice, academic writing production (Cortes, 2004, 2008; Durrant & Mathews-Aydɪnlɪ, 2011; Neff, 2008; Römer, 2009). To my knowledge, no corpus-driven studies to date have systematically contrasted the formulaicity of L2 English published writing vis-à-vis that of L1 English published writing using large-scale corpora. In an attempt to fill this gap, here I compare lexical bundles across three language variables of academic writing (L1 English, L2 English written by Spanish scholars and L1 Spanish). The aim is to ascertain to what extent formulaic language in L2 expert writing is native-like. The following questions helped to focus the investigation:

1. Which are the high-frequency lexical bundles in each language variable? What are the defining features of these bundles? Is choice of bundles determined by register/genre?
2. Which are core bundles shared by the three language variables? Which bundles are shared only by L1–L2 English and only by L2 English–L1 Spanish? Are these bundles similar or different structurally and functionally? How do these bundles build discourse meanings?
3. Finally, which bundles are distinctive to L1 English, L2 English and L1 Spanish? Do these bundles involve distinctive structures and functions? How do these bundles build discourse meanings?

In identifying the phraseological norms implicitly recognised by L1 writers, I discuss several possible reasons why the L2 English writers’ bundle usage deviates from L1 norms. It is argued here that, although the L2 English writers are published (and hence, expert) authors, their formulaic language is ‘hybrid’, that is, largely, but not fully, native-like.

2. Methodology

The corpus used is the Spanish–English Research Article Corpus (SERAC 2.0), a 5.7-million word compilation of 1056 research articles (RAs) that comprises three sets of texts, each of them representing a ‘language’ variable. The first set of texts includes 360 L1 English RAs written by scholars from Anglophone-based contexts and published in peer-reviewed English-medium journals from different disciplinary fields. The second set comprises 336 L2 English original (not translated) RAs written by Spanish scholars and published in the same journals in which the L1 English texts were published. The third set includes 360 L1 Spanish RAs published by Spanish scholars in peer-reviewed Spanish journals targeted at a national-based scholarly readership. Selecting peer-reviewed journals was expected to guarantee that writers had experience in journal publications and thus familiarity with register/genre and style conventions in research writing. Table 1 shows the overall statistics.

Table 1. SERAC 2.0 statistics.
                                   L1 English   L2 English   L1 Spanish
Tokens (running words) in text      2,146,347    1,771,727    1,811,071
Types (distinct words)                 54,184       51,020       70,190
Type/token ratio (TTR)                   2.65         3.04         4.03
Standardised TTR                        37.44        37.7         39.21
Standardised TTR std. dev.              62.42        62.75        61.40
Sentences                              87,390       66,903       60,085
  Mean (in words)                       23.36        25.12        29.01
  Std. dev.                             15.13        16.01        19.15
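As a rough illustration of the lexical-diversity indexes reported in Table 1, the following Python sketch computes a type/token ratio and a standardised type/token ratio averaged over consecutive 1000-word chunks. It is not the WordSmith Tools implementation: the whitespace tokeniser and the function names `tokenise`, `ttr` and `sttr` are assumptions made for illustration only, so running it on the SERAC texts would not necessarily reproduce the figures above.

```python
def tokenise(text):
    """Naive tokeniser (an assumption): lowercase and split on whitespace."""
    return text.lower().split()


def ttr(tokens):
    """Type/token ratio as a percentage: distinct words over running words."""
    if not tokens:
        return 0.0
    return 100.0 * len(set(tokens)) / len(tokens)


def sttr(tokens, chunk=1000):
    """Standardised TTR: mean TTR over consecutive full chunks of `chunk` tokens.

    A trailing partial chunk is discarded. Returns None when the text is
    shorter than one chunk, since no standardised value can be computed.
    """
    chunks = [tokens[i:i + chunk]
              for i in range(0, len(tokens) - chunk + 1, chunk)]
    if not chunks:
        return None
    return sum(ttr(c) for c in chunks) / len(chunks)


if __name__ == "__main__":
    sample = tokenise("in the case of the study in the case of the results")
    print(ttr(sample))            # plain TTR over the whole sample
    print(sttr(sample, chunk=6))  # chunk size shrunk only to fit the toy sample
```

Because the plain ratio falls as texts get longer, the standardised version is the more comparable of the two indexes across subcorpora of different sizes.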
SERAC was designed for contrastive rhetoric descriptive research and thus the three subsets of texts are comparable insomuch as they share the same textual and contextual attributes: text form (scientific exposition), genre (research article), mode (written language), participants (an international scholarly readership), situational variety (formal), communicative purposes (sharing results from research and persuading audience) and level of expertise (expert writers) (see Pérez-Llantada, 2012, p. 74 for further details). SERAC represents 12 subdisciplinary fields of research: applied linguistics, information science, literature, sociology, business management, geography, urology, haematology, oncology, mechanical engineering, food technology and earth sciences. The representativeness of these subdisciplines is homogeneous across the three sets of texts so as to guarantee comparability. The study took a ‘radical corpus-driven approach’ (Biber, 2009, p. 281) and made no prior assumptions on the use of the language. I used automatic extraction to retrieve three frequency lists of 4-word bundles, one from each set of texts (Table S1, Supplementary material). To identify the core bundles in L1 and L2 English and L1 Spanish I compared the three lists manually and selected those pairs of bundles which had an equivalent translation (as in, e.g., of this type of/de este tipo de) or the words were cognates (as in, e.g., in the present study/en el presente trabajo; in the context of/en el ámbito de). The same procedure was used to identify the bundles shared by L2 English and L1 Spanish. It should be noted that the bundle by means of a, with no equivalent bundle in the L1 Spanish list, was paired with the bundle a través de un. The extended context of this Spanish bundle indicated that its predominant discourse function was to convey an instrumental/procedural meaning. This was the function of all by means of a occurrences in L2 English. 
Finally, lists of bundles specific to each set of texts were also retrieved manually (Table S2, Supplementary material).

The four-word scope was selected because it is “the most researched length for writing studies” (Chen & Baker, 2010, p. 32). Further, it is argued that 4-word bundles often subsume 3-word bundles and are much more frequent (over 10 times more frequent) than five-word bundles. As Cortes (2004) notes, this length presents a wider variety of structures and functions to analyse than 3- and 5-word bundles. Identification of bundles was based on the actual word form (i.e., not on lemmas) and did not take into consideration the grammatical/syntactic status of the word forms. In the case of Spanish, inflected variants of the same lemma were treated separately.

Biber et al. (1999, pp. 992–999) consider lexical bundles to be those word combinations that recur at a relatively low frequency cut-off, 10 times in a million words, and spread across at least five different texts in the sample so as to exclude individual writer idiosyncrasies (see also Biber & Barbieri, 2007; Biber, Conrad, & Cortes, 2004). For this study the conservative cut-off point of 20 times per million words and a dispersion criterion of at least 10% of the texts were selected to ensure the representativeness of the bundles identified. This cut-off point proved useful to filter out combinations of content bundles, for example, those that incorporated proper nouns (e.g. in the United States) or bundles containing nouns indicating topic-specificity within a given discipline (e.g. the Korean native speakers, garden of forking paths). This procedure was used by Chen and Baker (2010) and Ädel and Erman (2012), who sorted out bundles manually to exclude these word combinations. The present study also excluded bundles with the hash # symbol representing any number represented by digits (e.g.
fig # shows the) and bundles with the hash # symbol specific to some but not to all the disciplines represented in SERAC (e.g. stored at ## c). This ensured that the extracted bundles were all lexico-grammatical combinations of function words plus content words, that is, “multi-word formulaic sequences” (Biber, 2009, p. 277).

Although some studies merge overlapping bundles “to guard against inflated numbers” (Ädel & Erman, 2012, p. 84), in this study they were considered separately. Hyland (2008a), for instance, does not group overlapping bundles in his comparison of bundles across disciplines. Not grouping the overlapping bundles was a suitable procedure to retrieve bundles in an inflected language such as Spanish and identify plausible syntactic and lexical transfer of those bundles from L1 Spanish to L2 English.

Fletcher’s (2002–2007) phraseological search engine kfNgram was used to automatically retrieve the three frequency lists of 4-word bundles. WordSmith Tools (Scott, 2008) helped compute tokens (running words) and types (the number of distinct words in each subset of texts). Type/token ratios (TTR) and standardised type/token ratios (STTR) (computed every 1000 running words) were computed to identify the lexical profile of the texts. Log-likelihood values were tested using Rayson’s online calculator (http://ucrel.lancs.ac.uk/llwizard.html) to identify statistical differentiation of the bundles used only by L1–L2 English and the bundles used only by L2 English–L1 Spanish. Statistical significance served to discern the extent to which the choice of bundles was constrained by ‘genre’ or by ‘language’ factors. Further, it proved apposite to make a number of assumptions on the L2 writers’ interlanguage development.

Bundles bridge structural units in the discourse, framing semantic meanings.
To investigate how formulaic sequences frame semantic meanings across language variables, bundles were classified structurally and functionally drawing on previous taxonomies (Biber, 2009; Biber, Conrad, & Cortes, 2004; Cortes, 2004, 2008; Hyland, 2008b). Following Biber (2009), the structural taxonomisation involved identification of types of structural units (phrasal/clausal) and types of words forming the bundle (content/function words). The discourse functions of the bundles were identified through qualitative, context-sensitive analysis and classified into three major categories: referential bundles, text organisers and stance bundles (and their corresponding subcategories). Referential bundles were considered to be those helping writers “structure their experience and determine their way of looking at things” and text organisers were those “used to express textual functions which are concerned with the meaning of the sentence as a message in relation to the surrounding discourse” (Cortes, 2004, p. 401). Formulaic expressions of attitudes towards propositions were classified as stance bundles. Ädel and Erman (2012) validate this qualitative, context-sensitive approach for the identification of those bundles that are not used in the same way by L1 and L2 English speakers.

Reservations may be expressed about the claim that frequency is not in itself an adequate guide for examining formulaic language and that communicative functions should rather be employed to examine formulaic expressions from the start.
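The frequency and dispersion criteria described in this section (a cut-off of 20 occurrences per million words and a spread over at least 10% of the texts) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reconstruction of kfNgram: the function name `extract_bundles`, the whitespace tokeniser and the decision to let n-grams run across sentence boundaries are all hypothetical simplifications, and the manual filtering of proper nouns, topic-specific nouns and # bundles is not modelled.

```python
from collections import Counter


def extract_bundles(texts, n=4, per_million_cutoff=20.0, min_text_share=0.10):
    """Return {bundle: (raw frequency, number of texts)} for qualifying n-grams.

    texts              -- list of strings, one per research article
    per_million_cutoff -- minimum normalised frequency (occurrences per million words)
    min_text_share     -- minimum share of texts in which the bundle must occur
    """
    frequency = Counter()   # total occurrences of each n-gram
    text_count = Counter()  # number of distinct texts containing each n-gram
    total_tokens = 0

    for text in texts:
        tokens = text.lower().split()  # naive tokeniser (an assumption)
        total_tokens += len(tokens)
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        frequency.update(grams)
        text_count.update(set(grams))  # count each text once per n-gram

    # Convert the per-million cut-off into a raw count for this corpus size.
    min_raw = per_million_cutoff * total_tokens / 1_000_000
    min_texts = min_text_share * len(texts)

    return {
        " ".join(gram): (count, text_count[gram])
        for gram, count in frequency.items()
        if count >= min_raw and text_count[gram] >= min_texts
    }


if __name__ == "__main__":
    toy_corpus = ["in the case of"] * 5 + ["on the other hand"]
    # A 50% dispersion threshold is used here only because the toy corpus is tiny.
    print(extract_bundles(toy_corpus, min_text_share=0.5))
```

Word forms are compared as they appear, without lemmatisation, which mirrors the decision reported above to treat inflected variants separately.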
Durrant and Mathews-Aydɪnlɪ (2011) support a function-first approach to the study of formulaicity consisting in tagging the corpus first for communicative functions and then retrieving the recurrent patterns for each formula. This procedure has been shown to be unproblematic for the study of the discourse and rhetorical functions of formulaicity in student essays (Wray, 2002) or introductions of student essays and research article introductions (Durrant & Mathews-Aydɪnlɪ, 2011). However, the procedure does not seem easy to apply to large-scale contrastive corpora representative of long text-exemplars (the average number of words per text in the L1 and L2 English and L1 Spanish sets of texts in SERAC is 5961, 5272 and 5030 words respectively). In the spirit of Biber et al. (1999) and Biber (2009), frequency-first was the decisive principle guiding the interpretation of corpus data in the present study.

3. Results and discussion

3.1. Overall findings

The corpus search retrieved a total of 56 bundles in L1 English, 77 in L2 English and 114 in L1 Spanish (Table S1, Supplementary material), which supports the view that formulaicity and fixedness are key features of the academic written register both in English and Spanish. Hyland’s (2008b) study of clusters in a 3.5 million word corpus of RAs, doctoral dissertations and master’s theses includes many of the clusters appearing in the SERAC lists. Cortes (2004) reports 54 target bundles in history and 109 in biology writing, and several 4-grams in Römer’s (2010) phraseological analysis of academic book reviews overlap with the bundles found in SERAC.

L2 English and L1 Spanish show a broader repertoire of bundles compared to L1 English. The L1 Spanish variable displays the largest stock of lexical bundles, twice as many as L1 English.
Prima facie, this might be attributed to the fact that the corpus search in the L1 Spanish set of texts included word modification expressing the grammatical categories of number and gender (e.g. la mayoría de las/los, en la(s) que se, a la/los que se, a partir de la/las/los) (see also Cortes, 2008), discarding overlapping bundles. However, had these overlapping bundles been grouped together to make the comparison more reliable, the result of this process would have yielded 94 bundles, still a higher number of bundles than those retrieved from the two sets of English texts. Given that inflected bundles only amount to 1.7% of the total number of bundles, it is perhaps more sensible to interpret the broader repertoire of bundles in terms of lexical variety. L1 Spanish shows the highest number of types (70,190 distinct words), TTR (4.03) and STTR (39.21). L1 English scores lowest in these three indexes of lexical diversity. L2 English TTR and STTR indexes lie between those of L1 English and L1 Spanish (Table 1). The three lists illustrate the defining features of bundles described by the literature. Regarding frequency, Hyland (2008a, p. 6) states that the most frequent strings feature “over 200 times per million words”. In SERAC, L1 English does not include bundles occurring at this frequency, but a few four-word bundles do occur with very high frequencies in L2 English and L1 Spanish. The bundle in the case of in L2 English and its equivalent bundle in Spanish en el caso de occur over 200 times. Though not reaching this high frequency, this bundle ranks second in the L1 English list. Other bundles in L2 English (on the other hand, the end of the, with respect to the) and L1 Spanish (a lo largo de la, a la hora de, el hecho de que, a través de la, la mayoría de los, con el fin de, cada uno de los, en el que se) occur over 100 times. 
Decreasing the dispersion criterion to bundles occurring in at least 5% of the texts would probably yield a higher number of bundles featuring over 100 times. All the SERAC bundles are semantically non-idiomatic, that is, their meaning is “fully retrievable from the meaning of the individual words that make up the bundle” (Cortes, 2004, p. 400). In many cases, 2 or 3 of the words forming the bundle are function words constituting “parts of noun phrases and prepositional phrases” (Biber, Johansson, Leech, Conrad, & Finegan, 1999, p. 991) that extend across structural units. Complete structural forms (on the other hand, at the same time, in the present study in L1 English, on the other hand, at the same time, on the one hand, for the first time, in the present study and as can be seen in L2 English, and el punto de vista, en todos los casos, en los últimos años, un punto de vista, cada una de ellas and en el presente trabajo in L1 Spanish) represent circa 5% of the total number of bundles in each set of texts. As previously described (Cortes, 2004; Hyland, 2008a), many 4-word bundles in the three lists incorporate 3-word bundles and some 4-word bundles are incorporated into longer lexical bundles. This is the case of at the end of (the), in the case of (the) or it should be noted (that) in the two English sets of texts or en el caso de (la/los/las), (desde) el punto de vista (de) or (en) la mayoría de los (casos), among others, in L1 Spanish. The L1 and L2 English lists include 4-word bundles that incorporate many of the most frequently-used 3-word constructions found in the academic writing components of the Corpus of Contemporary American English (COCA) and the British National Corpus (BNC) (Liu, 2012). In the three language variables, the majority of the bundles are phrasal and clausal fragments with new fragments embedded. 
Bundles bridge two structural units and the last word of the bundle is the first element of the second structure; as an example, a noun phrase begins a prepositional phrase (e.g. a function of the, the rest of the). The most common ending word of the bundles is a function word (articles a/the, prepositions such as of and de/del or complementisers and relativisers such as to, that and que). When the bundle is structurally complete, it is generally a prepositional phrase in L1 and L2 English (e.g. on the other hand, at the same time, in the present study, for the first time) and either a prepositional phrase or a noun phrase in L1 Spanish (e.g. el punto de vista, en todos los casos, en los últimos años, en el presente trabajo). Biber et al. (1999) state that circa 50% of the high-frequency bundles of academic prose are phrasal sequences. In SERAC, phrasal bundles outnumber clausal bundles (60%, 71% and 73% in L1 English, L2 English and L1 Spanish, respectively). Regardless of the language variable, all formulaic sequences reflect the syntax of the academic written register, grammatically compressed, “employing embedded phrases rather than fuller dependent clauses” (Biber & Gray, 2010, p. 7). Both
nouns with prepositional phrase post-modifiers (e.g. the presence of, the end of the, el caso de la) and prepositional phrases post-modified by other prepositional phrases (e.g. at the beginning of, in the context of, en el caso de, en el proceso de) reflect grammatical compression. Finite and non-finite complement clause fragments (e.g. it is possible that, should be noted that, el hecho de que, it is important to, can be used to, tener en cuenta que) characterise the discourse style of the three language variables as syntactically elaborated. In L1 Spanish, formulaic sequences such as phrasal bundles embedding relative clause fragments (e.g. en el que se, a los que se) also contribute to structural compression and syntactic elaboration (see also Section 3.3).

3.2. Convergent usage of bundles

The comparison of the three lists yielded a common inventory of 12 core bundles. It can be assumed that, given their high frequency, these core bundles are extremely useful to both English and Spanish expert writers in order to express referential meanings and indicate text organisation. Core referential bundles serve to indicate time (e.g. at the same time/a la vez que, at the time of/en el momento de), to signal textual reference (e.g. in the present study/en el presente trabajo, are shown in table/se muestran en la) or to quantify entities (is one of the/es uno de los, the rest of the/el resto de los). The core bundles indicating text organisation either frame propositional meanings (with respect to the/con respecto a la, in the case of/en el caso de(l), in the context of/en el ámbito de), reflect relationships between preceding or upcoming text (the fact that the/el hecho de que) or convey identification-focus and inferential meanings (the results of the/los resultados de la, as a result of/como consecuencia de la). Cortes (2004) explains that the use of core bundles in history and biology writing reflects proficient language use.
Supporting Ellis (2008), SERAC’s core inventory might indicate that the writers have memorised these language sequences and routinised them in their writing practices. With the empirical data gathered for the present study it is not clear what the process of memorisation is like and how those expressions turn into routines. Whether the writers have acquired these core bundles through explicit instruction or through usage, e.g., extensive reading and writing of similar texts (along the lines discussed by Ellis, 2008; Li & Schmitt, 2009; Pérez-Llantada, 2012; Schmitt, 2004), or possibly both, is an issue for further investigation. Another interesting finding is the amount of overlapping found when comparing the remaining (non-core) bundles. A total of 24 bundles overlap in L1 and L2 English, representing 43% of the L1 English bundles and 31% of the L2 English bundles. If we add to these 24 bundles the 12 core bundles, L1 and L2 English share a total of 36 bundles (47% of the L2 English bundles). Log likelihood values show no statistical differences regarding the use of 16 core/overlapping bundles (a function of the, a wide range of, as a function of, as a result of, at the same time, at the time of, in the context of, in the form of, in the present study, it is difficult to, it is important to, it should be noted, should be noted that, the results of the, the size of the, the total number of). The L2 English writers thus exhibit “competent idiomatic production” (Wray, 2002, p. 88) only in these bundles. This supports Pawley and Syder’s (1983) claim that few non-native learners ever fully accumulate the native repertoire of formulaic sequences. Aligning with Granger (1998) and Granger and Rayson (1998), it is likely that the remaining 20 bundles, overused at various levels of statistical significance (Table 2), are sequences that the writers are more familiarised with. 
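The log-likelihood comparison behind these statements can be sketched in Python as follows. The formula is the standard two-corpus log-likelihood statistic used by Rayson’s calculator, with the conventional critical values for one degree of freedom. The function names are hypothetical, and the bundle frequencies in the usage example are invented purely for illustration. They are not SERAC counts; only the two corpus sizes are taken from Table 1.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-corpus log-likelihood for one item.

    freq_a, freq_b -- raw frequency of the bundle in corpus A and corpus B
    size_a, size_b -- total running words in corpus A and corpus B
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2.0 * ll


def significance_band(ll):
    """Map a log-likelihood value to the bands used in Tables 2 and 3."""
    for critical, label in ((15.13, "p < 0.0001"), (10.83, "p < 0.001"),
                            (6.63, "p < 0.01"), (3.84, "p < 0.05")):
        if ll >= critical:
            return label
    return "no significant difference"


def usage_direction(freq_a, size_a, freq_b, size_b):
    """'O' if corpus A overuses the item relative to corpus B, otherwise 'U'."""
    return "O" if freq_a / size_a > freq_b / size_b else "U"


if __name__ == "__main__":
    L1_ENGLISH_TOKENS = 2_146_347  # from Table 1
    L2_ENGLISH_TOKENS = 1_771_727  # from Table 1
    # Invented frequencies for a hypothetical bundle, for illustration only.
    ll = log_likelihood(150, L1_ENGLISH_TOKENS, 260, L2_ENGLISH_TOKENS)
    print(round(ll, 2), significance_band(ll),
          usage_direction(150, L1_ENGLISH_TOKENS, 260, L2_ENGLISH_TOKENS))
```

Read this way, a bundle marked (U) in a table of L1 English relative to L2 English is one that the L2 writers use more often than the L1 writers do.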
Statistical significance tests also suggest that not all the core and overlapping bundles are learnt by the L2 writers at the same time. In agreement with Li and Schmitt (2009), these bundles appear to be learnt incrementally, covering several interlanguage stages. Consider the bundles can be used to and the nature of the, overused in L1 English (Table 2). The statistically significant underuse of these two bundles in L2 English suggests that several interlanguage stages need to be covered until the L2 writers attain native-like usage of these bundles (i.e. usage with no statistical significance). In contrast, the overuse of certain bundles at high levels of significance (e.g. are shown in table, at the end of the, in the case of) (Table 2) may indicate that the L2 writers are more familiarised with these bundles and are thus closer to achieving full proficiency of these bundles in the target language.

Table 2. Statistical significance (L1 English relative to L2 English).
p < 0.05:    can be used to (O); in the presence of (U); on the basis of (U)
p < 0.01:    a large number of (U); as well as the (U); in terms of the (U); is one of the (U); it is possible to (U); the nature of the (O)
p < 0.001:   the use of the (U)
p < 0.0001:  are shown in table (U); at the end of (U); in the case of (U); on the other hand (U); one of the most (U); the end of the (U); the fact that the (U); the rest of the (U); to the fact that (U); with respect to the (U)
O = overuse; U = underuse.

L2 English formulaicity also consists of a small stock of 13 bundles shared with L1 Spanish (representing 17% of all L2 English bundles), which indicates that L1 Spanish cross-linguistic influence is very small. Only four out of the 13 bundles show no statistical differences (be taken into account/que tener en cuenta, by the fact that/por el hecho de, in the development of/en el desarrollo de, the development of the/el desarrollo de la), which means they are used in a native-like manner. The remaining bundles, with the exception of by means of a, are underused in L2 English at various levels of significance (Table 3). The L2 writers may have overused these L1 Spanish bundles, or used them with no statistically significant difference, in earlier interlanguage development stages.

Table 3. Statistical significance (L2 English relative to L1 Spanish).
p < 0.05:    the aim of this/el objetivo de este (U)
p < 0.01:    by means of a/a través de un (O); the study of the/el estudio de la (U)
p < 0.0001:  by means of the/a través de la/los (U); in relation to the/en relación con el/la (U); of this type of/de este tipo de (U); the analysis of the/el análisis de la/los (U); the case of the/el caso de las/los (U); the existence of a/la existencia de un/una (U)
O = overuse; U = underuse.

At this juncture, what seems clear is that the L2 writers’ bundle usage reflects a ‘hybrid’ formulaic profile, not fully native-like. These writers exhibit an advanced level of proficiency in some bundles (namely, the core and overlapping bundles shared with L1 English), although they also show an almost indiscernible L1 transfer (the four bundles mentioned above showing no statistical differences represent a meagre 5% of the total L2 English bundles). A relevant issue for further investigation would be to attest Kim and McDonough’s (2008) claim that usage of bundles in L2 according to L1 norms is mentally primed and/or primed through, e.g., explicit instruction and/or incidental learning. Both types of priming might play a role in SERAC L2 writers.

The structures of the core bundles shared by the three sets of texts and the structures of the bundles shared only by L1–L2 English and by L2 English–L1 Spanish shed further light on the L2 writers’ interlanguage development. Consistent with previous corpus studies (Biber et al., 1999; Hyland, 2008a, 2008b), the core and overlapping bundles of L1–L2 English are mainly formed by NPs + of-phrase fragments (the nature of the, the total number of, the use of the) and PrepP embedding of-phrase fragments (in the case of, at the end of, in the presence of), representing circa 60% of these common bundles (Table 4). Other Prep-phrases (on the other hand, at the same time, with respect to the) and the bundles using the structure (pronoun) + (modal) V in passive + clause fragment (can be used to, it should be noted, should be noted that) make up an additional 25%.
The remaining bundle structures include Vbe þ NP (is one of the), NP þ post-nominal clause fragments (the fact that the) and it þ Vbe þ adj. þ (clause fragment) (it is difficult to, it is possible to, it is important to). Hewings and Hewings (2002, p. 368) state that the anticipatory it-pattern causes problems for non-native speakers because it has no counterpart in many languages. The fact that this structure has a similar counterpart in Spanish may explain why the L2 English writers use this structure with the same frequency the L1 English writers do. NPs þ of-phrase fragments account for almost half of the core and overlapping bundles in L2 English–L1 Spanish. Prepphrases with embedded of-phrase fragments (e.g. in the development of/en el desarrollo de, of this type of/de este tipo de) and other Prep-phrases (e.g. in relation to the/en relación con el/la, by the fact that/por el hecho de, in spite of the/a pesar de que) represent a further 46%. The recurrence of these structures in Peninsular Spanish is not an unexpected finding. As Cortes (2008) explains, Spanish nouns generally take postmodification and are rarely premodified. Convergent discourse functions of core bundles and overlapping bundles confirm that bundle usage by the L2 writers exhibits almost native-like competence, but also a small amount of L1 transfer. Fig. 1 plots the distribution of bundles across functional subcategories. The bundles shared by L1–L2 English perform referential and text-organising functions (amounting to 44% and 45% of all bundles respectively). Aligning with Biber (2009), referential bundles serve to provide identification of new information and are represented by the following subcategories: time, place, size, amount, temporal relations and quantification of entities (e.g. at the end of the, in the present study, the nature of the, a large number of, one of the most). 
As for the subcategories of text-organising bundles, circa 30% are framing bundles that connect preceding and forthcoming discourse (e.g. as well as the, in terms of the, in the case of, in the form of, in the presence of, the fact that the, to the fact that). Identification-focus expressions introducing discussion (it is difficult to, it is important to, the results of the), a comparison/contrast bundle (on the other hand) and inferential bundles that derive logical conclusions from premises known or assumed to be true (as a result of, on the basis of) are other subcategories of text-organising bundles. As also shown in Fig. 1, the expression of stance (conveyed by the epistemic bundle it is possible to and other stance bundles such as can be used to and should be noted that), aiming at guiding readers to particular interpretations (Biber, Conrad, & Cortes, 2004; Hyland, 2008a), is somewhat limited. Stance, though, will turn out to be a prominent function of the distinctive bundles of L1 English (see Section 3.3).

Table 4. Structures of core bundles and overlapping bundles (%).

Structure                                                     L1–L2 English   L2 English–L1 Spanish
Noun phrase + of-phrase fragment                              31              46
Noun phrase + post-nominal clause fragment                    3               0.0
Prep phrase + of-phrase fragment                              28              23
Other Prep phrases                                            14              23
(pronoun) + (modal) V in passive + (PrepP/clause fragment)    11              8
It + Vbe + adj. + (clause fragment)                           8               0.0
V(be) + complement (NP)                                       3               0.0
Other expressions                                             3               0.0

Fig. 1. Functions of core and overlapping bundles (%).

The bundles shared by L2 English and L1 Spanish mainly perform text-organising functions (61% of all common bundles in these two language variables). This category of bundles includes framing expressions (in relation to the/en relación con el/la and the case of the/el caso de la/los/el) and identification/focus bundles that introduce discussion (be taken into account/que tener en cuenta, the analysis of the/el análisis de la(s)/los, the study of the/el estudio de la). The subcategory of inferential bundles was represented by the bundle by the fact that/por el hecho de que. It is worth noting, though, that inferential bundles will play a prominent role in the list of bundles specific to L2 English (see Section 3.3). The subcategory of comparison/contrast bundles was not represented in the list of common bundles shared by these two language variables. The category of referential bundles, amounting to almost 25% of all common bundles in these two language variables, includes time, place and descriptive expressions (the development of the/el desarrollo de la, the aim of this/el objetivo de este, the existence of a/la existencia de un/una). No common stance bundles were found in the comparison of the L2 English and the L1 Spanish lists of bundles. Finally, a total of 15% of all common bundles that did not fit in any of the three main functional categories were included in the category of ‘Others’ (by means of a, by means of the/a través de la/los, in spite of the/a pesar de que, of this type of/de este tipo de).
3.3. Divergent usage of bundles

Corpus findings also showed distinctive formulaic sequences in each language variable: a total of 20 distinctive bundles in L1 English (36% of the total bundles of this variable), 28 in L2 English (36% of the total bundles) and 83 (73%), 66 after refinement, in L1 Spanish, the language with the highest lexical diversity index (see Supplementary material, Table S2). Of note here is the fact that more than half of the bundles specific to L1 English are clausal fragments, whereas 70% of the bundles distinctive to L2 English and the majority of the L1 Spanish distinctive bundles are phrasal (Table 5), which again suggests language variability in terms of grammatical compression and syntactic elaboration. Table 5 shows that anticipatory it-clause fragments embedding probability/possibility adjectives (it is likely/possible that), and finite and non-finite complement clause fragments, many of them embedding probability and evaluative markers (that there is a, more likely to be, is/are likely to be, are more likely to, is important to note) stand out as the most distinctive L1 English bundle structures. Another, though much less frequent, structure is the passive bundle has been shown to, also reported in COCA and the BNC (Liu, 2012). As discussed later, these distinctive patterns perform interpersonally for the attainment of persuasion (Hyland, 1998, 2008b).

Table 5. Structures of distinctive bundles (%).

Structure                                                              L1 English   L2 English   L1 Spanish
Noun phrase + of-phrase fragment                                       10           31           19.7
Noun phrase + post-nominal clause fragment/relative clause fragment    5            0            6.6
Other noun phrases                                                     5            4.9          1.6
Prep phrase + embedded of-phrase                                       25           17           41
Prep phrases + relative/complement clause fragment                     0            0            6.6
Other Prep phrases                                                     0            24.4         6.6
(modal) V + (PrepP/clause fragment)                                    5            14.6         1.6
(It) + Vbe + adj. + (clause fragment)                                  35           2.4          0
(relative pronoun) + quasi-passive se                                  0            0            6.6
Other expressions                                                      15           4.9          9.8

Table 5 also shows that the most recurrent structure specific to L2 English is that of ‘Other Prep-phrases’ (e.g. by the fact that, for the first time, in accordance with the, in order to obtain, in relation to the, of the most important, of this paper is, on the one hand). Along with the clausal pattern (modal) V + PrepP fragment (e.g. be due to the, be taken into account, can be found in, can be seen in, is based on the, is related to the), ‘Other Prep-phrase’ structures embed intersentential reason–result relationships with which writers build up discoursal argumentation.

The most distinctive structure of L1 Spanish is the Prep-phrase followed by relative/complement clause fragments introducing a periphrastic passive (quasi-passive se) (e.g. a la/los que se, en el/la/las/lo/los que se, por lo que la, por lo que se). This structural pattern does not occur in the two sets of English texts (Table 5). The most likely reason, as Cortes (2008, p. 49) explains, is that in English the relativiser does not take an article, as it does in Spanish. The structure (complementiser) + quasi-passive se (e.g. que se trata de, se trata de un/una), a transitive clause fragment whose agent is not explicitly stated, deploys a depersonalised style (see also Navarro, Hernández, & Rodríguez-Villanueva, 1994). On the other hand, the recurring postmodification across the distinctive structures of L1 Spanish indicates a high level of grammatical compression and might explain why this language variable shows the highest average sentence length, while L1 English scores lowest (Table 1).
If we turn to the discourse functions of the distinctive bundles of each language variable, it seems clear that they convey completely different interpersonal pragmatic meanings. Almost half of the distinctive bundles of L1 English fall into the category of stance (Fig. 2). Of these, 35% are epistemic/impersonal expressions that express degrees of truth commitment (see also Hyland, 2008a). Hyland (1998, p. 439) explains that “gaining acceptance of academic claims involves both rational exposition and the manipulation of rhetorical and interactive features”, one of them being anticipatory it-patterns. L1 English bundles such as are likely to be, is/are more likely to, it is likely/possible that and more likely to be hedge the discourse, making authorial claims non face-threatening. The epistemic bundle is important to note also performs pragmatically as it places the thematic constituents at the end of the sentence (Ghadessy, 1995; Hewings & Hewings, 2002). The subcategory of ‘Other stance’ bundles includes the bundle is shown in fig, that serves to identify previous findings or known information and the passive bundle has been shown to, that helps authors distance themselves from the claims made in the texts. From Fig. 2 it is evident that overt expression of stance is not a prevalent function of the L2 distinctive bundles. Overall, the paucity of stance meanings builds a potentially face-threatening discourse. The fact that stance is kept at a minimum recalls the infrequent use of bundles such as it is possible/likely/unlikely by advanced Spanish ELF university students writing in English, an underuse that Neff (2008) attributes to L1 transfer. It also needs to be stressed here that the subcategory of ‘Other stance’ bundles does not only serve to hedge the discourse (e.g. the bundle it is necessary to), as this subcategory of bundles does in L1 English, but also to introduce evidential data and by this means attest the claims made in previous sentence(s) (e.g. 
This reality can be seen in a plethora of debates; This can be seen in the following contribution by FG). Pragmatic mismatch compared to L1 norms can also be seen in the use of the bundle can be found in, employed to refer to sources of evidential data of various sorts, from tangible examples to citations (e.g. The journal’s entire contents can be found in databases; Recent examples of masonry studies can be found in Anthoine (1995)). As may be deduced, the writers have acquired the bundles as fully lexicalised fixed expressions but they do not seem to use them in a pragmatically appropriate way. Inappropriate usage of bundles containing the English modal verb can has also been reported by Vázquez (2010, p. 84). This author explains that Peninsular Spanish has a scant repertoire of modal verbs and that the Spanish verb poder (the equivalent of can) is polysemous, encompassing both deontic and epistemic meanings. It can be asserted that SERAC’s L2 English writers have accommodated L1 modality in their L2 texts. Underuse and misuse of stance bundles in L2 English also aligns with previous claims regarding L2 learners’ difficulties in establishing a proper tenor in English (Granger, 1998; Howarth, 1998). Pragmatic mismatches have also been reported in Philippine scholars, who exhibit “a restricted use of participant-oriented bundles” and “limited awareness of the usage and importance of this particular function” (Salazar, 2010, p. 193), in L2 Chinese students, who do not show control of hedging “as diversely and robustly as native writers do” (Chen & Baker, 2010, p. 43) and in Finnish undergraduates, who show less reliance on, and less variation in, stance bundles than their L1 English counterparts (Ädel & Erman, 2012).

Fig. 2. Functions of distinctive bundles (%).

As can be seen in Fig. 2, L2 English mainly relies on inferential and identification-focus bundles (together representing almost half of all the L2 English bundles). The presence of inferential bundles (e.g. as a consequence of, due to the fact, be due to the, the effect of the, the basis of the, by the fact that) suggests that the L2 writers tend to use these bundles to construct a reason–result argumentative discourse. Interestingly, it is the identification-focus bundles distinctive to L2 English, and not the stance bundles, as was the case in L1 English, that embed evaluative adjectives (e.g. an important role in, of the most important) or guide readers’ interpretation persuasively (be taken into account, taking into account the). As a result, the discourse becomes stylistically verbose and unqualified. Chen and Baker (2010) report similar findings with Chinese L2 English students. Whether or not L2 authors could be expressing stance using linguistic items that do not constitute formulaic language remains an issue for further investigation. In the case of SERAC, preliminary observations of bundles and their discoursal co-text suggest that the inferential bundles due to the fact, be due to the and by the fact that tend to be accompanied by a repertoire of individual stance markers (e.g. epistemic adverbs, epistemic verbs and, above all, modal verbs) in their surrounding text. The following are some examples:

1. This is probably due to the fact that the lipids were not affected by oxidation during culinary treatment
2. We believe that an important role is played here by the fact that the majority of girls obtain .
3. The differences between them could be due to the fact that the criteria enumerated in Section 4.2 give sufficient but not necessary conditions for stability.
4. This may be due to the fact that the magazines analysed are addressed to experienced computer users.
5. This preference might be due to the fact that the Spanish direct translation of “listen” (“oye”) is not .
6. This very high precision index can be explained by the fact that the application was geared to.
7. This level of participation is higher than the observed actual participation but may be explained by the fact that our sample is of younger and educated people
8. This hypothesis would be supported by the fact that positive TP protein expression correlates with tumour response

Investigating the pragmatic nuances of these individual stance markers in the accompanying co-text of inferential bundles can be a relevant issue for future contrastive research on L1–L2 English academic writing.

Why do SERAC L2 expert writers manifest less control of formulaic language expressing stance? In line with previous studies (Ellis, 2008; Granger, 1998; Gilquin et al., 2007; Meunier & Granger, 2008; Neff, 2008), the most likely reason is L1 syntactic and lexical transfer. In SERAC, almost half of the bundles distinctive to L1 Spanish perform text-organising identification/focus and framing functions. These were also the prevailing discourse functions of the bundles shared by L2 English–L1 Spanish (see Section 3.2). Whether L1 Spanish is a likely determinant for the scarce use of stance bundles in L2 English or whether, as suggested previously (Chen & Baker, 2010; Granger, 1998; Howarth, 1998), the L2 English writers use fewer hedges because they have not yet acquired full pragmatic competence, or maybe both, needs further empirical investigation. Such research is germane since deviation from L1 English pragmatic norms, which Kourilová (1998, p. 112) attributes to unawareness of “subtle degrees of truth commitment and of potentially face threatening acts”, hinders L2 writers’ success in journal publication (see also Flowerdew, 2001; Pérez-Llantada, 2012).
4. Conclusion

The corpus-driven approach taken in the present study has shown that formulaicity is a key feature of the academic written register across language variables and that genre determines writers’ choice of formulaic sequences in terms of frequency, structural constituency, semantic non-idiomaticity, syntax and overall discourse style. If formulaicity is understood as an indication of the “speaker’s group identity” in social interaction (Wray & Perkins, 2000, p. 18), the lists of bundles retrieved from SERAC confirm that writers, as ‘insiders’ of the community, rely on “recurrent responses to recurrent communicative situations” (Durrant & Mathews-Aydınlı, 2011, p. 58). Hyland (2008a, p. 20) claims that analyses of bundles offer insights into a crucial dimension of language use which cannot be overlooked when examining languages for academic communication.

The present study has offered an empirical characterisation of the structures and discourse functions of bundles in both L1 and L2 English and in L1 Spanish. Across language variables, convergent usage of formulaic sequences has shown that phrasal modification prevails, which creates a grammatically compressed discourse style. Further, the stock of core bundles in the three variables and the bundles shared by L1–L2 English and L2 English–L1 Spanish have been shown to convey, though with slight syntactic variations, both semantic and interpersonal meanings in discourse. On the other hand, divergent bundle usage across language variables has brought to the fore the distinctive pragmatics of L1 English and L1 Spanish academic writing styles. While formulaic language in L1 English has proved to perform pragmatically, at times mitigating claims, at other times making statements less open to negotiation through overt evaluation, the formulaicity of L1 academic Spanish has been characterised by a detached style devoid of any pragmatic qualification.
The present findings might be taken as unprecedented since they have shown, based on corpus evidence, that the L2 English variable reflects a ‘hybrid’ formulaic language. It exhibits a small stock of register-determined bundles, also shared by L1 English and L1 Spanish (representing 16% of all L2 English bundles) as well as a considerable percentage of formulaic sequences used by the L1 English writers (31% of all its lexical bundles). However, it retains a small stock of bundles transferred from L1 Spanish (amounting to 17% of all the L2 English bundles) and exhibits a not unremarkable percentage of idiosyncratic bundles (the remaining 36% of its bundles). In short, L2 English formulaicity is partly, but not yet fully, native-like.

From the findings, it also seems clear that the acquisition of formulaicity in L2 English takes place incrementally and that native-like mastery of it is, as Warren (2005) argues, difficult to attain. As discussed above, knowledge of formulaicity in the writers’ native L1 might have fostered the acquisition of formulaicity in the L2, offering evidence that “transfer affects L2 phraseology” (Ellis, 2008, p. 8). The acquisition of L2 bundles that have no equivalent structural constituency or similar discoursal functionality in the writers’ native L1 appears to take longer because of their ‘newness’ or because there is no priming of any sort. Also, the present findings have shown that, although the L2 writers have “a sense of ‘salience’” (Cowie, 1998, p. 13) regarding usage of certain bundles, this salience may involve pragmatic misuse. Bundles conveying stance meanings have been shown to be a case in point, confirming that L2 pragmatics aligns fully neither with the pragmatics of the target language nor with the pragmatics of the L2 writers’ native L1.

In an attempt to place the focus on interlinguistic variation in lexical bundle usage, this study has overlooked disciplinarity. Research with monolingual corpora has empirically attested the existence of discipline-sensitive repertoires of bundles in the context of research article writing (e.g., Cortes, 2004; Hyland, 2008a).
Future investigations should engage in interlinguistic comparison of bundles across the disciplinary spectrum to determine what bundles are specific to particular disciplines and what discourse functions these discipline-specific bundles perform in L1 and L2 writing. It would also be of theoretical interest to further investigate the hybrid formulaic nature of L2 English research articles and assess the impact of the ‘discipline’ variable across different knowledge fields.

The present findings enable us to recommend genre-based pedagogical instruction, one that exposes learners to lists and contextualised examples of high-frequency bundles in research article writing. A genre-based approach can raise awareness of the phraseological specificity of this sub-register and elicit corpus-informed discussion about relevant features of bundles, namely structural constituency, semantic non-idiomaticity, syntax and overall discourse style. As suggested by Cortes (2013), it also seems sensible to expose learners to corpus-based evidence and authentic textual samples of the specific phraseological sequences associated with the different rhetorical moves of the genre. Contrastive exposure to bundles, that is, one focusing on core bundles and on convergent and divergent bundle usage across language variables, together with a detailed explanation of the functions that those bundles help express in research article writing, would also be pedagogically beneficial. Such exposure can elicit, in both novice writers and professional writers who have not yet attained native-like formulaic competence in the genre, critical views of the ways the pragmatic meanings of bundles shape distinct discoursal developments and academic styles. It would also be germane to sensitise these writers to the fact that pragmatic failure in the use of formulaic sequences in research article writing may hamper communication.
Although formulaic expressions can be explicitly learnt by implementing, e.g., the approach described above, we should be perceptive of previous empirically-based claims stating that formulaic knowledge is acquired from usage (Ellis, 2008, p. 7). Pedagogical instruction should be facilitative in this respect and provide learners with opportunities to practice bundle usage so as to trigger acquisition of formulaic language. The study of formulaicity in L1 and L2 writing using large-scale comparable corpora for automatic extraction of data and measurements based on highly restricted frequency cut-off points allows researchers to predict and explain the L2 professional writers’ difficulties in research article writing. As shown in the present study, contrasting the structural properties and the discoursal functionality of phraseological units in L1 and L2 academic English serves to identify the L2 English writers’ degree of native-like bundle usage and describe the different interlanguage development stages towards full native-like formulaicity. Longitudinal studies are desirable to track the way the repertoire of lexical bundles increases with advancing writing proficiency and document the reasons why some bundles are primed over others in the acquisition and learning processes. Last but not least, variations of interpersonal pragmatics across language variables deserve further examination to better understand the influence of the writers’ L1 pragmatics in their L2 English production and ascertain reasons why published writers do not seem to attain full pragmatic proficiency in L2 English. It would also be of interest to investigate semantic prosody and the semantic preference of lexical bundles contrastively, along the lines proposed by Cortes and Hardy (2013), to empirically characterise the particular ways in which bundles build the semantics and pragmatics of the research article genre across language variables. 
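The extraction and comparison procedure summarised above can be illustrated with a minimal sketch. It assumes a simple regular-expression tokeniser, raw (not per-million-word) frequency counts and toy cut-off values; the function names extract_bundles and hybrid_profile, the thresholds and the idea of supplying L1 Spanish bundles as hand-paired English equivalents are hypothetical choices made for illustration, not the settings or tools used to build the SERAC bundle lists.

```python
import re
from collections import Counter, defaultdict


def extract_bundles(texts, n=4, min_freq=2, min_texts=2):
    """Return the n-word sequences that occur at least `min_freq` times
    overall and in at least `min_texts` different texts.

    The thresholds are illustrative raw counts (an assumption); a real
    study would normally use normalised frequency cut-offs."""
    freq = Counter()
    spread = defaultdict(set)  # bundle -> ids of the texts containing it
    for text_id, text in enumerate(texts):
        tokens = re.findall(r"\w+", text.lower())
        for i in range(len(tokens) - n + 1):
            bundle = " ".join(tokens[i:i + n])
            freq[bundle] += 1
            spread[bundle].add(text_id)
    return {
        bundle
        for bundle, count in freq.items()
        if count >= min_freq and len(spread[bundle]) >= min_texts
    }


def hybrid_profile(l2_english, l1_english, l1_spanish_equivalents):
    """Partition the L2 English bundle list into four groups.

    `l1_spanish_equivalents` is assumed to hold the English counterparts
    of the L1 Spanish bundles, paired by hand beforehand."""
    core = l2_english & l1_english & l1_spanish_equivalents
    return {
        "core": core,
        "shared_with_l1_english": (l2_english & l1_english) - core,
        "shared_with_l1_spanish": (l2_english & l1_spanish_equivalents) - core,
        "distinctive": l2_english - l1_english - l1_spanish_equivalents,
    }


if __name__ == "__main__":
    sample = [
        "On the other hand, the results vary.",
        "On the other hand we see this.",
    ]
    print(extract_bundles(sample))

    profile = hybrid_profile(
        {"in the case of", "can be used to", "by the fact that", "of this paper is"},
        {"in the case of", "can be used to"},
        {"in the case of", "by the fact that"},
    )
    for group, bundles in profile.items():
        print(group, sorted(bundles))
```

Because the four groups returned by hybrid_profile are mutually exclusive and together cover the whole L2 English list, their shares necessarily add up to 100%, which is the logic behind reporting the L2 English profile as core, L1 English-like, L1 Spanish-like and idiosyncratic percentages.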
Acknowledgements

I am grateful to the English Language Institute at the University of Michigan (USA) for sponsoring the research stay during which the present study was conducted. I acknowledge the financial support for the research stay from the Spanish Ministry of Economy and Competitiveness under the project “English as a lingua franca across specialised discourses: A critical genre analysis of alternative spaces of linguistic and cultural production” (Plan Nacional I+D+i, FFI2012-37346). I also thank the anonymous reviewers and JEAP editor Paul Thompson for their suggestions.

Appendix A. Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.jeap.2014.01.002.
References Ädel, A., & Erman, A. (2012). Recurrent word combinations in academic writing by native and non-native speakers of English: a lexical bundles approach. English for Specific Purposes, 31, 81–92. Biber, D. (2009). A corpus-driven approach to formulaic language in English. Multi-word patterns in speech and writing. International Journal of Corpus Linguistics, 14(3), 275–311. Biber, D., & Barbieri, F. (2007). Lexical bundles in university spoken and written registers. English for Specific Purposes, 26(3), 263–286. Biber, D., Conrad, S., & Cortes, V. (2004). If you look at: lexical bundles in university teaching and textbooks. Applied Linguistics, 25(3), 371–403. Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics. Investigating language structure and use. Cambridge: Cambridge University Press. Biber, D., & Gray, B. (2010). Challenging stereotypes about academic writing: complexity, elaboration, explicitness. Journal of English for Academic Purposes, 9, 2–20. Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). The Longman grammar of spoken and written English. London: Longman. Chen, Y.-H., & Baker, P. (2010). Lexical bundles in L1 and L2 academic writing. Language Learning and Technology, 14(2), 30–49. Cortes, V. (2004). Lexical bundles in published and student disciplinary writing: examples from history and biology. English for Specific Purposes, 23, 397–423. Cortes, V. (2008). A comparative analysis of lexical bundles in academic history writing in English and Spanish. Corpora, 3, 43–58. Cortes, V. (2013). The purpose of this study is to: connecting lexical bundles to moves in research article introductions. Journal of English for Academic Purposes, 12, 33–43. Cortes, V., & Hardy, D. (2013). Analyzing the semantic prosody and semantic preference of lexical bundles. In D. Belcher, & G. Nelson (Eds.), Critical and corpus-based approaches to IR (pp. 180–201). Ann Arbor, MI: University of Michigan Press. Cowie, A. P. (Ed.). (1998). 
Phraseology: Theory, analysis and applications. Oxford: Clarendon Press. Durrant, P., & Mathews-Aydınlı, J. (2011). A function-first approach to identifying formulaic language in academic writing. English for Specific Purposes, 30(1), 58–72. Ellis, N. C. (2008). Phraseology. The periphery and the heart of language. In F. Meunier, & S. Granger (Eds.), Phraseology in foreign language learning and teaching (pp. 1–13). Amsterdam: John Benjamins. Ellis, N. C., Simpson-Vlach, R., & Maynard, C. (2008). Formulaic language in native and second-language speakers: psycholinguistics, corpus linguistics and TESOL. TESOL Quarterly, 42(3), 375–396. Flowerdew, J. (2001). Attitudes of journal editors to non-native-speaker contributions: an interview study. TESOL Quarterly, 35, 121–150. Ghadessy, M. (Ed.). (1995). Thematic development in English texts. London: Pinter. Gilquin, G., Granger, S., & Paquot, M. (2007). Learner corpora: the missing link in EAP pedagogy. Journal of English for Academic Purposes, 6(4), 319–335. Granger, S. (1998). Prefabricated patterns in advanced EFL writing. Collocations and formulae. In A. P. Cowie (Ed.), Phraseology. Theory analysis and applications (pp. 145–160). Oxford: Oxford University Press. Granger, S., & Meunier, F. (Eds.). (2008). Phraseology. An interdisciplinary perspective. Amsterdam: John Benjamins. Granger, S., & Rayson, P. (1998). Automatic lexical profiling of learner texts. In S. Granger (Ed.), Learner English on computer (pp. 119–131). London: Addison Wesley Longman. Hewings, M., & Hewings, A. (2002). It is interesting to note that: a comparative study of anticipatory it in student and published writing. English for Specific Purposes, 21, 367–383. Howarth, P. (1998). Phraseology and second language proficiency. Applied Linguistics, 19(1), 24–44. Hyland, K. (1998). Persuasion and context: the pragmatics of academic metadiscourse. Journal of Pragmatics, 30, 437–455. Hyland, K. (2008a). 
As can be seen: lexical bundles and disciplinary variation. English for Specific Purposes, 27, 4–21. Hyland, K. (2008b). Academic clusters: text patterning in published and postgraduate writing. International Journal of Applied Linguistics, 18(1), 41–62. Kachru, Y. (2009). Academic writing in World Englishes. The Asian context. In K. Murata, & J. Jenkins (Eds.), Global Englishes in Asian contexts (pp. 111–130). Basingstoke: Palgrave-Macmillan. Kim, Y., & McDonough, K. (2008). Learners’ production of passives during syntactic priming activities. Applied Linguistics, 29(1), 149–154. Kourilová, M. (1998). Communicative characteristics of reviews of scientific papers written by non-native users of English. Endocrine Regulations, 32, 107–114. Li, J., & Schmitt, N. (2009). The acquisition of lexical phrases in academic writing: a longitudinal case study. Journal of Second Language Writing, 18(2), 85–102. Liu, D. (2012). The most frequently-used multi-word constructions in academic written English: a multi-corpus study. English for Specific Purposes, 31, 25–35. Meunier, F., & Granger, S. (Eds.). (2008). Phraseology in foreign language learning and teaching. Amsterdam: John Benjamins. Navarro, F. A., Hernández, F., & Rodríguez-Villanueva, L. (1994). Uso y abuso de la voz pasiva en el lenguaje médico escrito. Medicina Clinica, 103, 461–464. Neff van Aertselaer, J. (2008). Contrasting English -Spanish interpersonal discourse phrases. A corpus study. In F. Meunier, & S. Granger (Eds.), Phraseology in foreign language learning and teaching (pp. 85–100). Amsterdam: John Benjamins. Pérez-Llantada, C. (2012). Scientific discourse and the rhetoric of globalization. The impact of culture and language. London: Continuum. Pawley, A., & Syder, F. (1983). Two puzzles for linguistic theory: native-like selection and native-like fluency. In J. Richards, & R. Schmidt (Eds.), Language and communication (pp. 191–226). London: Longman. Römer, U. (2009). 
The inseparability of lexis and grammar. Annual Review of Cognitive Linguistics, 7, 141–163. Römer, U. (2010). Establishing the phraseological profile of a text type: the construction of meaning in academic book reviews. English Text Construction, 3(1), 95–119. Salazar, D. (2010). Lexical bundles in Philippine and British scientific English. Philippine Journal of Linguistics, 41, 94–109. Schmitt, N. (2004). Formulaic sequences. Acquisition, processing and use. Amsterdam: John Benjamins. Scott, M. (2008). WordSmith Tools version 5. Liverpool: Lexical Analysis Software. Staples, S., Egbert, J., Biber, D., & McClair, A. (2013). Formulaic sequences and EAP writing development: lexical bundles in the TOEFL iBT writing section. Journal of English for Academic Purposes, 12, 214–225. Vázquez, I. (2010). A contrastive analysis of the use of modal verbs in the expression of epistemic stance in business management research articles in English and Spanish. Ibérica, 19, 77–96. Warren, B. (2005). A model of idiomaticity. Nordic Journal of English Studies, 4(1), 35–54. Wray, A. (2002). Formulaic language and the lexicon. Cambridge: Cambridge University Press. Wray, A. (2008). Formulaic language: Pushing the boundaries. Oxford: Oxford University Press. Wray, A., & Perkins, M. R. (2000). The functions of formulaic language: an integrated model. Language and Communication, 20, 1–28. Carmen Pérez-Llantada is Professor of English Linguistics at the University of Zaragoza (Spain). She is a member of InterLAE, a national-based research group funded by the Spanish Ministry of Economy and Competitiveness. Her research interests are genre- and register-related features of academic discourse and English as a Lingua Franca. She is also involved in the “English in Europe: Opportunity or Threat?” Leverhulme International Network, and in “The Worldwide Challenge of English”, a Worldwide Universities Network project led by Prof. A. Linn, University of Sheffield (UK).