Journal of English for Academic Purposes 33 (2018) 12–23
As Hill seems to suggest: Variability in formulaic sequences with interpersonal functions in L1 novice and expert academic writing

Ying Wang
Uppsala University, Sweden
Article info

Article history: Received 27 June 2017; received in revised form 26 December 2017; accepted 9 January 2018

Keywords: Formulaic sequences; Manual identification; L1 novice writing; Expert writing; Disciplinary discourses; Genre

Abstract

Formulaic sequences (FSs) are pervasive in natural language use and play an important role in differentiating socially-situated practices. The predominant trend in this research area is to take a frequency-based approach, relying on the computer to identify frequent recurrent forms in a given corpus, at the expense of disregarding their structural and semantic unity and multifunctionality, as well as overlooking discontinuous and infrequent sequences. Through careful manual identification and annotation of FSs in context, the present study provides additional insights into the use of interpersonal FSs that distinguish L1 novice and expert academic writing. The results show that the novice writers actually produced a wider range of FSs with interpersonal functions than did the expert writers. It is argued that less frequent FSs cannot be dismissed in FL research simply because of their low frequencies. Taken together, these seemingly idiosyncratic choices may reveal important functional and formulaic features that characterise a particular community. The main differences between the two groups of writers pinpoint areas (e.g., genre- and discipline-specific conventions, register awareness) that need further investigation and specific attention in the training of novice writers. © 2018 Elsevier Ltd. All rights reserved.
https://doi.org/10.1016/j.jeap.2018.01.003

1. Introduction

Research into formulaic sequences (FSs), i.e., words that have ‘an especially strong relationship with each other in creating their meaning’ (Wray, 2008: 9), such as by and large, of course, on the other hand, has been one of the rapidly growing areas in applied linguistics over the past decade (Hyland, 2008, pp. 41–62). Corpus studies, in particular, have revealed that they are pervasive in natural language use (e.g., Biber, Johansson, Leech, Conrad, & Finegan, 1999; Erman & Warren, 2000; Martinez & Schmitt, 2012), and play an important role in differentiating socially-situated practices (Biber, Conrad, & Cortes, 2004; Hyland, 2012). Research in English for academic purposes (EAP) demonstrates that professional academics and student writers alike draw on formulaic resources to ‘develop their argument, establish their credibility and persuade their readers’ (Hyland, 2008, p. 59).

The predominant trend in this research area is to take a frequency-based approach (e.g., lexical bundle, n-gram), relying on the computer to automatically identify frequently recurring word sequences in a given corpus, at the expense of disregarding their structural and semantic unity and multifunctionality, as well as overlooking discontinuous and infrequent multi-word units. In the present study, FSs and their functions are identified and annotated manually in context and the subsequent analysis starts from the functions and moves towards outlining and understanding the range of forms that are associated with those functions, with a view to providing insights into formulaicity in academic discourse that may have been missed in the prevalent corpus-based studies. As part of an on-going project that examines formulaic language (FL) using the system of metafunction (ideational, textual, interpersonal) in Systemic Functional Linguistics (SFL), the present study focuses on the interpersonal metafunction (i.e., the use of language to evaluate and take a stance on the proposition projected, and/or to build up a relation between the text-producer and the text-receiver).

Much attention in previous research on formulaicity in academic writing has been given to novice writers, in particular non-native novice writers (e.g., Chen & Baker, 2010; Ebeling & Hasselgård, 2015; Hyland, 2008; Salazar, 2014; Ädel & Erman, 2012). One of the main findings in this regard is that non-native novice writers tend to restrict themselves to a small range of FSs that are overused (see also Wray, 1999; Wang, 2016, pp. 4–6 for an overview). Native (or L1) novice writers, if involved in these studies, appear to be sidelined, serving mainly as the comparative basis against which non-native data are evaluated. Part of the reason for this lack of focus may be that native speakers' use and processing of FL in general is considered to be fairly well understood. To put it in a nutshell, FSs are said to constitute the bulk of native speakers’ mental lexicon; that is, they are stored and retrieved whole from memory rather than generated anew on each occasion when they are needed, and therefore function as processing short-cuts (Wray & Perkins, 2000).
While this may well be true of everyday communication and general language use, Pérez-Llantada (2014) argues that formulaicity in academic writing may not be an inherent (i.e., language universal) skill; rather it is likely to be associated with expert (native or not) academic writing production through formal instruction as well as extensive academic reading and writing practices. Therefore, more needs to be known about the development of formulaicity in academic discourse from the perspective of native novice writers. This paper is an attempt to fill this gap. More specifically, it attempts to answer the following research questions:

i) What interpersonal functions are more frequently used with FSs in L1 novice or expert writing?
ii) What are the main similarities and differences between L1 novice and expert writing in terms of the range of forms associated with the functions?

The remainder of this paper is organised as follows. Section 2 introduces the concept of FS and justifies the need for more manual analyses in the field. Section 3 presents the data used for the study and an annotation scheme for the analysis of interpersonal functions based on SFL as well as the criteria used to identify FSs. Section 4 reports on and discusses the results, and finally the paper ends with a summary of the main findings and their implications in Section 5.
2. Formulaic sequences

The term ‘formulaic sequence’ is formally defined by Wray (2002, p. 9) as ‘a sequence, continuous or discontinuous, of words or other elements, which is, or appears to be, prefabricated: that is, stored and retrieved whole from memory at the time of use, rather than being subject to generation or analysis by the language grammar’. Although not explicitly specified in the definition, there are a few points that are essential for the understanding of formulaicity.

First of all, FL is a complex construct and there are many types of FSs (Biber, 2009). Generally speaking, what makes a word sequence ‘appear to be prefabricated’ can be either its high frequency of occurrence in a given situation, or the internal fixedness of the form, or sometimes both. This is why FS is used in the literature as an umbrella term to mean anything from idioms (e.g., in a nutshell), phrases (e.g., by and large), collocations (e.g., deeply committed, highly recommended), to clusters or multi-word units/expressions (e.g., at the end of, as can be seen), which may vary enormously in their idiomaticity, invariability, and structural completeness.

Secondly, FSs are ‘retrieved whole from memory at the time of use’ to meet different needs, which in turn are closely related to communities of practice. In other words, FSs develop to serve important communicative needs of a given discourse community. The appropriate choice of an expression among a variety of alternatives marks the speaker/writer as a member of that community. The distribution of FSs can therefore help characterise different discourses. Many studies have attested to considerable variation across genres and registers according to the extent to which formulaicity is applied (e.g., Biber, 2006; Biber et al., 1999; Wang, 2017a, 2017b).
Most FL studies take a frequency-based approach; that is, FSs are identified entirely on the basis of the recurrence of uninterrupted linguistic forms, whether or not they make up a complete structural unit or have a cohesive meaning or function (e.g., as a result of, due to the fact that, is always, that can be, in a slightly, although it is). While the results of such an approach are valuable in revealing, for instance, variation across discourses and interactions, they are of limited use to language learners and novice writers, for whom the key information about FSs is rarely which sequences are the most frequent per se. According to Durrant and Mathews-Aydınlı (2011), what novice writers need to know is rather what functions they are likely to employ in a given situation and what forms most appropriately fulfil these functions, as well as what restrictions are placed on the forms in specific contexts. This is an aspect to which the present study aims to contribute.

From a methodological point of view, while the frequency-based approach has the advantage of being straightforward and consistent, and can be scaled up to very large datasets, its inherent limitations have also been increasingly recognised. First of
all, simple frequency is not always a good guide to formulaicity (Biber, 2009; Schneider et al., 2014; Wray, 2009). Among others, Schneider et al. (2014) find that 82% of the multi-word units that they identify occur only once in their corpus. This means a great proportion of FSs could have been overlooked should frequency be the only criterion for the identification. Secondly, different operational criteria regarding the rate of occurrence, distribution across different texts, length of word sequences, and the way of dealing with overlapping sequences can lead to different conclusions (see Ädel & Erman, 2012 for a more detailed discussion). A further issue concerns the multi-functionality of many FSs and how this is dealt with in frequency-based studies. The common practice is to align a given type with its most probable or common function, regardless of the number of actual instances and the context in which they occur. Such practice, as cautioned by Swales (1990, p. 17), can be ‘a dangerous simplification, especially in professional settings’ (see also Ädel & Erman, 2012). Through careful manual identification and annotation of FSs in context, the present study seeks to overcome these limitations and provide additional insights into FSs and their functions in written academic discourse.

To conclude, FL is such a complex phenomenon that there is still a need to embrace new and complementary methodological approaches (Biber, 2009). While the potential contribution that manual analyses can make to the understanding of FL has been increasingly realised (e.g., Durrant & Mathews-Aydınlı, 2011; Ädel & Erman, 2012; Ädel, 2014), little has actually been done on that front until now. The present study is thus a step forward in exploring the potential and feasibility of this approach.

3. Data and procedure

3.1. Data

The present study involves 26 texts of about 100,000 words (see Table 1), which form two corpora, representing L1 novice and expert writing, respectively.
Direct quotations and reference lists were excluded from the word count and the subsequent analysis. The novice texts were drawn from the BAWE corpus (Nesi & Gardner, 2012), covering six disciplines: Philosophy, History, Maths, Physics, Linguistics, and Engineering. The texts are of the same type, namely ‘Essay’, written by L1-English university students in years 3 and 4, and awarded the same grade (‘distinction’). The expert corpus is made up of published articles on the same topics as those covered in the novice corpus.

The Essay genre in BAWE is said to be argumentative in nature, where the students ‘are expected to develop ideas, make connections between arguments and evidence, and develop an individualised thesis’ (Nesi & Gardner, 2012, p. 38). A quick inspection of the student essays included in the present study revealed that the genre can differ across disciplines and from that of the corresponding expert essays. While in Philosophy, History, and Maths the genre seems to be basically the same for the novice and expert texts (argumentative for Philosophy and History and problem-solving for Maths), it is very different in Physics and Engineering, where the expert writers presented either original research or an extensive review of the recent development of a certain area, while the students typically produced argumentative essays. The Linguistics essays present a mixed picture: while both expert essays are clearly research papers, one student essay provides an overview of the field in question and the other two resemble the genre of research articles.

The data were manually examined to identify FSs associated with interpersonal functions. The UAM Corpus Tool (O'Donnell, 2013), which allows manual and automatic annotation of a corpus at multiple annotation layers, was used for the annotation.

3.2. Classification of interpersonal functions

The classification of interpersonal functions for the present study is based on SFL, which focuses on the underlying communicative functions of language and the systemic choices that are made available by the language system. Key to the use of SFL is the notion of metafunction, which refers to three separate strands of meaning (ideational, textual, interpersonal) that are deployed simultaneously in a text. In previous studies of lexical bundles, most notably Cortes (2004), Biber et al. (2004), and Hyland (2008, pp. 41–62), the functional categorisation is almost all based on the SFL framework. However, as Ädel and Erman (2012) point out, the taxonomy used so far is still not fully established, dealing mainly with three broad functional categories (corresponding to the three metafunctions) and a few subcategories within each. Meanwhile, while some subcategories seem to be well defined, others are not agreed upon in the literature, making it difficult to compare the results from different studies. Another framework that has been used in previous research such as Durrant and Mathews-Aydınlı (2011) comes from Swales' (1990, 2004) notion of ‘generic moves’ and steps. This framework is useful for analysing patterns of
Table 1
Data used in the study.

Corpus          No. of texts   No. of words
Novice Corpus   15             46,722
Expert Corpus   11             52,626
Total           26             99,348
rhetorical, informational, and conceptual organisation that distinguish a particular genre. It is, however, not adopted in the present study for the reason that a ‘move’, being a functional unit, is often realised by a much larger linguistic construct (a clause or several sentences) than a word sequence (Swales, 2004, pp. 228–229).

Given the above considerations, the SFL framework was used to derive a functional taxonomy that is suitable for a comprehensive analysis of functions. By adhering to the original SFL framework (rather than the adaptations in some previous studies), it is also hoped that the taxonomy can be applied to additional types of discourse with as little modification as possible in future research to enable comparison across corpora and domains of use.

The present study focuses on one of the metafunctions, namely the interpersonal metafunction, which involves expressions that reveal speaker attitude towards or assessment of the status of a message (e.g., probability, desirability, significance). It is associated mainly with the systems of mood and modality (Halliday, 2014, p. 190), of which the semantic system is very complex, realised by a range of forms which can involve verbs, adjectives, and adverbs. In this study, four broad categories of interpersonal functions (Modality, Evaluation, Commitment, Engagement) were distinguished. They are introduced below with illustrative examples.
Modality refers to lexical resources that enable the writer to fine tune their propositions in terms of the following properties:

Probability/possibility: it is possible that, is likely to
(Un)certainty: of course, it is clear that, seem to
Intensity: above all, at all, yet again
Degree: to a certain degree, the smallest possible, vastly different
Validation (how valid the proposition is): broadly speaking, in general, on the whole, in principle
Necessity/obligation: it is necessary to, need to
Desire: wish to, aim to
Verification: as a matter of fact, in fact
Capability: be able to, be capable of

Note that a number of functions in Halliday's framework are not present here because of their rare/non-occurrence in the data examined. It could be that those functions are not normally employed in academic writing, or they are realised by single-word expressions, e.g., morality: rightly, wrongly, justifiably (see Halliday, 2014, p. 191).

Evaluation, also called comment adjuncts (Halliday, 2014, p. 190), refers to lexical resources used to assign a quality that someone or something possesses from the writer's point of view. They differ from classifying qualities in that they are generally gradable (i.e., they may be modified by very or rather, or can be used in superlative or comparative structures).
This category was further divided into the following subcategories that occur most frequently in the data:

Desirability (signalling a subjective cline according to the writer's evaluation of how desirable the proposition/phenomenon under discussion is): it is helpful/useful to, superior performance, highly reliable, the problem with, severely limited, suffer from, it is difficult to
Significance: of essential importance, it is important to, play a major role in
Presumption: it is reasonable to, lend support to
Prediction (how expected): to one's surprise, as expected
Scope: suffice it to say, it is sufficient to

Commitment, also called interpersonal projection (Halliday, 2014, pp. 698–700), is preceded by a personal or impersonal ‘projector’, and constructs positional value/assessment according to how explicit the writer wants to be about where her/his assessments come from, or how subjective or objective s/he wants them to appear, mainly involving reporting or mental verbs, e.g., admit that, is said to, it is believed that, is seen as.¹

Engagement involves mostly imperatives used to instruct or build a relationship with the reader (Hyland, 2004, p. 139), e.g., see X for (proof), notice that.

3.3. Criteria for identification of FSs

As mentioned in Section 2, there are many types of FS. The present study aims to be as inclusive as possible, and therefore mixed criteria were adopted to identify FSs in the corpora, based on the rationale that ‘most examples will be captured one way or another’ (Wray, 2008, p. 110). A word sequence has to meet at least one of the following criteria to be regarded as a FS, but many may satisfy more than one.
Grammatical irregularity and/or semantic opacity (Schneider et al., 2014; Wray, 2008): this criterion means that as long as some aspect of the form or meaning of a word sequence is not strictly predictable from its component parts or from regular grammar, the expression is a FS, e.g., of course, suffice it to say, on average, above and beyond, play a vital role in, shed light on. Note that there is a continuum of fixedness, ranging from those resulting from a grammaticalisation or lexicalisation process and thus entirely invariable (e.g., above and beyond, of course) to those that allow a certain degree of compositional freedom and semantic transparency; for instance, some variable sequences allow a specified set of alternative content words or
1 Note that these verbs on the surface may seem to be similar to reporting and mental/cognitive processes, but they perform a function similar to modal verbs concerned with a speaker's assessment of the proposition in these cases (Stillar, 1998, p. 37).
modifiers in one position (e.g., play a vital/important/major role), and others permit a range of morphological possibilities (e.g., it has been/it is/it was suggested that) (Wray, 2002, p. 50). Since the distinction between the idiomatic and non-idiomatic is by no means clear-cut (Martinez & Schmitt, 2012, pp. 308–309), this criterion has been shown to be difficult to apply, as it normally involves the researcher's subjective evaluation (Buerki, 2016). In order to ensure that this criterion is applied consistently in the present study and that replicable results can be produced by other researchers, dictionaries (primarily Oxford Advanced Learner's Dictionary) and the list of phrasal expressions provided by Martinez and Schmitt (2012) were constantly consulted. If a word sequence is highlighted in the dictionaries (either as a separate entry or emphasised in boldface) or occurs on the list, it was considered to contain some kind of irregularity and therefore to be a FS.

Underlying frame (Wray, 2008): a formulaic frame involves some open slots to be filled, often by items of similar characteristics, e.g., as X as possible, too X to.

Situation/register/genre-specific formula (Buerki, 2016; Wray, 2008): what is idiomatic about this type of FS is not its internal semantics or syntax, but the fact that such sequences are the normal ways (judged by frequency of occurrence) of saying things in a particular situation, e.g., maintain that, is found to. In this regard, I relied on IDIOM Search (Colson, 2016a), an online tool for the extraction of multi-word phrases, ranging from bigrams to seven-grams (see Colson, 2016b for the algorithm of and improvements made by this tool in corpus-based computational phraseology). Apart from genre-specific formulas, it can also capture what would be regarded as collocations (e.g., very often, equally important) on the basis of frequency of co-occurrence.
Incomplete grammatical structures identified by IDIOM Search without a cohesive meaning or function (e.g., to be the, that we can, although it is) were not considered in the present study.
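The 'at least one criterion' rule in this section can be illustrated with a minimal sketch. It is not the procedure used in the study, where identification was carried out manually in context with the help of dictionaries, the Martinez and Schmitt (2012) list, and IDIOM Search. The three small resources below (DICTIONARY_PHRASES, FRAMES, GENRE_FORMULAS) and both function names are hypothetical stand-ins, populated only with examples quoted in the text.

```python
import re

# Hypothetical stand-ins for the resources named in the text; the entries
# are examples quoted in this section, not real dictionary or tool output.
DICTIONARY_PHRASES = {"of course", "suffice it to say", "shed light on", "above and beyond"}
FRAMES = [re.compile(r"^as \w+ as possible$"), re.compile(r"^too \w+ to$")]
GENRE_FORMULAS = {"maintain that", "is found to"}


def matched_criteria(sequence):
    """Return the names of the criteria that a word sequence satisfies."""
    seq = " ".join(sequence.lower().split())  # normalise case and spacing
    hits = []
    if seq in DICTIONARY_PHRASES:
        hits.append("irregularity/opacity")
    if any(frame.match(seq) for frame in FRAMES):
        hits.append("underlying frame")
    if seq in GENRE_FORMULAS:
        hits.append("genre-specific formula")
    return hits


def is_fs(sequence):
    """A sequence counts as a FS if it meets at least one criterion."""
    return bool(matched_criteria(sequence))


for candidate in ["of course", "as soon as possible", "maintain that", "to be the"]:
    print(candidate, "->", matched_criteria(candidate) or "not a FS")
```

A lookup of this kind cannot replace the contextual judgement described above; it only makes explicit that the criteria are alternatives rather than joint requirements, so a sequence such as to be the, which satisfies none of them, is excluded.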
4. Results and discussion

4.1. An overall picture

Altogether, 2,432 FSs were identified with an interpersonal function in the two corpora. Table 2 presents both raw and normalised frequencies (per 10,000 words) of the four major categories of interpersonal function across the two corpora. The log-likelihood test was used throughout the study to calculate whether a difference between two raw frequency counts is due to chance or to a statistically significant difference between the two corpora. In terms of overall token frequency (all four categories included), FSs with interpersonal functions are slightly more frequent in the novice corpus than in the expert corpus, and the G2 value is very close to the significance threshold (3.8) at the level of p < .05. In terms of type frequency, 714 different FSs were identified from the novice corpus, in comparison with 688 from the expert corpus; the difference between the two corpora is significant at the level of p < .01. Among them, 80% (566 out of 714) in the novice corpus and 73% (494 out of 688) in the expert corpus occur only once. The percentages corroborate that of Schneider et al. (2014) (82%) to a great extent. The most frequent FSs (see Table 3; 14 types in the novice corpus vs. 13 in the expert corpus) account for 18% of the total number of FSs in the novice corpus and 15% in the expert corpus.

Table 3 shows that four FSs are shared by the two corpora (those in boldface). Among the FSs that occur exclusively on one of the lists, some belong to those identified by Martinez and Schmitt (2012) as commonly used phrases in three categories of genre (spoken general, written general, and written academic). The novice corpus contains more FSs that are commonly used in the spoken general genre (e.g., have to, in fact), while the expert corpus is marked by those that occur more often in the written genres (e.g., according to, seek to).
In other words, one can still detect traces of orality, which is viewed as a feature of novice writing (see Wang, 2016, pp. 37–38 for an overview), in this type of writing (disciplinary essay) at this stage (during the last year of undergraduate studies or in a Master's programme) with this type of student (native speakers of English).

The following three subsections (4.2–4.4) look at the functions of Commitment, Evaluation and Modality and their FS realisations in more detail. The difference in Engagement between the two corpora was mainly caused by the three Maths texts involved (12 out of 17 in the novice corpus, and 26 out of 42 in the expert corpus), and therefore will not be considered in the following analysis.
Table 2
Frequencies (token/type) of interpersonal functions across corpora (normalised frequencies per 10,000 words in brackets).

Corpus          Commitment     Evaluation     Modality       Engagement   Total
Novice Corpus   420/213 (90)   327/287 (70)   425/223 (91)   17/9 (4)     1189/714 (254)
Expert Corpus   427/174 (81)   365/284 (69)   410/229 (78)   42/13 (8)    1243/688 (236)

In terms of token frequency: Total: G2 = 3.31, p > .05; Commitment: G2 = 2.22, p > .05; Evaluation: G2 = 0.01, p > .05; Modality: G2 = 5.01, p < .05; Engagement: G2 = 8.17, p < .01
In terms of type frequency: Total: G2 = 8.54, p < .01; Commitment: G2 = 9.95, p < .01; Evaluation: G2 = 2.39, p > .05; Modality: G2 = 0.96, p > .05; Engagement: G2 = 0.33, p > .05
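The paper does not spell out how G2 was computed. As a rough check, the sketch below assumes the two-corpus log-likelihood formulation that is common in corpus comparison, in which expected frequencies are derived from the two corpus sizes; the function name log_likelihood and the constant names are illustrative only. With the Modality token counts and the word counts from Table 1, it comes out at about 5.01, in line with the value reported above.

```python
import math

# Critical values of chi-square with 1 degree of freedom.
CRITICAL_05 = 3.84  # p < .05
CRITICAL_01 = 6.63  # p < .01


def log_likelihood(freq_a, freq_b, size_a, size_b):
    """Two-corpus log-likelihood (G2) for one item.

    freq_a, freq_b: raw frequencies of the item in corpus A and corpus B
    size_a, size_b: total word counts of corpus A and corpus B
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected frequencies if the item were spread evenly over both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


NOVICE_WORDS, EXPERT_WORDS = 46_722, 52_626  # word counts from Table 1

# Modality tokens: 425 in the novice corpus, 410 in the expert corpus.
modality = log_likelihood(425, 410, NOVICE_WORDS, EXPERT_WORDS)
print(f"Modality G2 = {modality:.2f}, significant at .05: {modality > CRITICAL_05}")
```

Because the formula depends only on raw counts and corpus sizes, the same function can be applied to the type frequencies in Table 2, on the assumption that they were compared in the same way.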
Table 3
Most frequent FSs in the two corpora (raw frequency >= 9 in the novice corpus and >= 10 in the expert corpus, corresponding to the same normalised frequency of 19/100,000 words).

Novice Corpus            Expert Corpus
show that (42)           show that (28)
seem to (23)             treat X as (25)
be able to (17)          according to (18)
have to (16)             need to (15)
see that (16)            observe that (14)
suggest that (16)        maintain that (13)
in fact (15)             note that (12)
believe that (14)        suggest that (12)
need to (14)             conclude that (11)
argue that (12)          of course (11)
try to (12)              seek to (11)
it be possible to (11)   be able to (10)
wish to (11)             demonstrate that (10)
be found to (9)

FSs shared by the two corpora: show that, be able to, need to, suggest that.
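Since the two corpora differ in size, the raw cut-offs in Table 3 differ as well. The short sketch below shows the normalisation arithmetic, using the word counts from Table 1; the helper name per_n_words is hypothetical. A raw frequency of 9 in the novice corpus and of 10 in the expert corpus both work out at roughly 19 occurrences per 100,000 words, that is, about 1.9 per 10,000.

```python
def per_n_words(raw_frequency, corpus_size, base=10_000):
    """Normalise a raw frequency to occurrences per `base` words."""
    return raw_frequency / corpus_size * base


NOVICE_WORDS, EXPERT_WORDS = 46_722, 52_626  # word counts from Table 1

# Cut-offs used for Table 3, expressed per 100,000 words.
print(round(per_n_words(9, NOVICE_WORDS, base=100_000), 1))
print(round(per_n_words(10, EXPERT_WORDS, base=100_000), 1))

# Commitment tokens in the novice corpus (Table 2), per 10,000 words.
print(round(per_n_words(420, NOVICE_WORDS)))
```

The same function covers the bracketed figures in Table 2, which are normalised per 10,000 words.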
4.2. Commitment

As shown in Table 2, there are more FS types associated with Commitment in the novice corpus than in the expert corpus (213 vs. 174; G2 = 9.95, p < .01). Table 4 lists the most frequent FSs of this function in each corpus, with a cut-off point at the frequency of 13/100,000 words (raw frequency >= 6 in the novice corpus and >= 7 in the expert corpus). For some FSs, if the key word occurs in some other patterns, which were identified as FSs, the alternative patterns are also given in the table (after the ‘+’ symbol). A small number of items in Table 4 represent idiosyncratic usage, and will be disregarded in the following discussion: those involving objection, treat X as, and be treated as in the expert corpus, and be found to and see that in the novice corpus.

On the surface, the fact that the novice writers produced a wider range of FSs for this function seems to be contradictory to what has been found in most frequency-based studies comparing expert or native-speaker and student (mostly L2) writing, namely that experts or native speakers have a vast repertoire of FSs in terms of both frequency and variety, while students tend to overuse a limited selection. This is because the methodology of frequency-based research determines that word sequences below a certain frequency threshold are discarded from the analysis. As can be seen in Table 4, if a frequency threshold is set, the list of relatively frequent FSs is indeed longer in the expert corpus, which is consistent with the main finding in the field.

One of the advantages of a manual approach is that the human analyst is better than the computer at identifying language irregularities as well as underlying similarities underneath surface variations in form. Through a careful manual analysis, Ädel (2014), for instance, finds that advanced L2 learners display a broader rhetorical repertoire for the anticipatory it pattern than native speakers.
However, the ‘greater range’ owes much to infelicitous uses, indicating semantic/pragmatic broadening (or overgeneralisation), which is ‘a basic mechanism observed in language learning’ (Ädel, 2014, p. 78). The novice writers involved in the present study are native speakers, for whom FSs are supposed to be acquired whole as unanalysed
Table 4
Most frequent FSs of Commitment (raw frequency in brackets).

Expert Corpus
show that (28) + be shown to (3)
treat X as (25) + be treated as (3)
according to (18)
maintain that (13)
suggest that (12) + suggestion that (1) + as X suggest (1)
note that (12)
conclude that (11) + conclusion that (4)
demonstrate that (10)
believe that (9) + believe X to be (2)
say that (9) + be said to (2)
see X as (9) + be seen as (4)
see that (6) + be seen to (1)
objection to (9) + object to/against (2) + raise objection against/to (4)
indicate that (8)
argue that (7) + as X argue (7)

Novice Corpus
show that (42)
suggest that (16) + suggestion that (3) + be suggested to (3) + as X suggest (1)
see that (16) + be seen to (4)
see X as (6) + be seen as (8)
believe that (14) + belief that (3) + belief in (6) + believe in (2)
argue that (12)
be found to (9)
assume that (7) + assume X to be (1)
be thought to (6) + think of X as (6) + think that (2)
entities from an early age (Wray, 2002, 2008). We will find out below what may have contributed to their apparently larger FS repertoire associated with the function of expressing commitment to a proposition, as compared to the expert writers.

As shown in Table 4, some expressions with the same key verb seem to have preferred syntactic structure(s). For instance, show tends to occur with a that-clause instead of the to-infinitive be shown to, whereas see X as and its passive form be seen as seem to be competitive with each other. The two corpora appear to converge with regard to these two examples, but a clear divergence can be seen in some other cases. Take argue, for instance: the expert corpus yielded some variation between argue that and as X argue; the students, in contrast, showed a strong preference for the former. Taken as a whole, the experts used fewer alternative structures. This may be interpreted as indicating that experts have a set of more or less established expressions for this function, which in turn supports the view that academic writing is marked by formulaicity and fixedness of usage patterns (Pérez-Llantada, 2014, p. 87). Meanwhile, although FL in general is seen as a kind of innate skill of native speakers, formulaicity in disciplinary practices is more likely to be a skill that is acquired incrementally through formal instruction and extensive reading and writing practices, and is thus associated with expert, rather than novice, writing production (Pérez-Llantada, 2014, p. 85). The seemingly great variability in the novice corpus can be taken as a sign that the novice writers were yet to be socialised into the specific community of discourse.
Another noticeable divergence between the two corpora lies in the fact that the verbs in the novice corpus are predominantly mental/cognitive (aka speculative) verbs such as see, believe, assume, and think, whereas the expert counterpart is dominated by reporting verbs (e.g., maintain, say) and what Hyland (2017) calls ‘data-supported’ verbs (e.g., conclude, demonstrate, indicate). Staples, Egbert, Biber, and Gray (2016), who base their investigation of L1 academic writing development on the BAWE corpus, argue forcefully for the importance of the variables of genre and register in accounting for the observed differences in student writing across university levels. The same can be said of the two sets of verbal phrases mentioned above. The novices' choices seem to suggest a more interpretative and less empiricist genre as well as the influence of spoken register, while the experts' selection of verbal phrases flags a more objective, impersonal genre of scientific writing. This point will be returned to in the following discussion with more examples. As mentioned, when pooling all the identified FSs together regardless of their frequencies, some usage patterns can be distinguished. Table 5 presents the most common patterns found in the data. As can be seen in Table 5, apart from the anticipatory it pattern, where the two corpora show a similar tendency, there is a significant divergence in the other two patterns. The usage pattern as X verb as exemplified by as X argue is more common in the expert corpus, while the novice writers seemed to be particularly fond of the passive structure in general. Such observations may serve as starting points for further investigation with more data in future research. The novice writers’ overuse of FSs to project a proposition is particularly evident in History (next only to Philosophy), a discipline in which the expert writers seemed to be most reluctant to employ FSs with this function. 
What follows is a close look at some examples taken from two corresponding History texts in the corpora in an attempt to shed light on possible explanations for the observed divergence between novice and expert writing.
(1) It seems to me that if Winstanley believes in a Creator God, it would necessarily imply that God as Creator of the world is able to transcend His creation; though this does not preclude His immanence, nor the equation of Reason with God, as Hill seems to suggest when he said “only if we forget that the Father is Reason and that […] can we slip into thinking of an external God.” (BAWE_0019e)
(2) It was therefore thought that Winstanley, following the death of his first wife and the presumed transference of the Ham property to her surviving sister Sarah King under the terms of William King's 1664 will, left Cobham around 1665 to reestablish himself in commerce near London.4 (0019e_expert)
(3) It has been suggested that Winstanley was not a Quaker: …. 17 (0019e_expert)
Although both texts are on the topic of Winstanley and his beliefs, the student essay centres on the writer's own interpretation of a few scholars' take on Winstanley's beliefs, from which a coherent argument is constructed. The need to display their knowledge about the literature in this task may have prompted the frequent use of FSs such as it seems to X that and as X suggests in Example (1) to explicitly state to whom the proposition in question is credited. The expert essay focuses more on Winstanley's life experience that accounts for his change of beliefs. The work also draws heavily on previous research, but often in a way where the source is not immediately visible, as in Example (4).
(4) Winstanley's association with James Sutton, in particular, is perhaps surprising since a member of this family, Thomas Sutton, impropriator of Cobham, had in 1650 been singled out by Winstanley as one of Parson John Platt's leading accomplices in the harassing of the Digger community. 16 (0019e_expert)
Example (4) represents a common citation pattern in the two expert essays, where the projector (i.e., the author of the source) is absent, with only a number given at the end of the sentence indicating the source. Examples (1) to (3) illustrate another two citation styles that are distinctive of the novice and published texts, respectively. The novice writer in Example (1) was very clear to whom the expressed opinion is credited (as Hill seems to suggest). The expert writer in Examples (2) and (3) chose the impersonal, anticipatory it structure (it was therefore thought that, it has been suggested that) instead, together with a number given at the end of the sentence indicating the source, as in Example (4). The authorial presence of external voices is more visible in the pattern as X suggest than in it is suggested that, but according to Muguiro (2017), often less credit is given to the source in the former as the purpose of the citation may be to enter into a debate with the previous author. This is indeed the case with Example (1), where two opposing views (whether or not to preclude the equation of Reason with God/Father) are presented, one belonging to the writer and the other to the previous author, and the credit given to the previous author is further reduced by the uncertainty expression seem to. The use of the latter pattern, in contrast, gives more credit to the source as the intention is to build on previous researchers' findings rather than to challenge them. Muguiro’s (2017) study also shows that citation patterns differ across disciplines. Therefore the expert writer's preference for the anticipatory it pattern here may well be a standard way of referring to previous research typical of the History discipline, which the novice writer was yet to acquire. Given the small size of data involved, however, any point made about disciplinary variation should be seen as no more than a tentative and preliminary suggestion that requires validation with more data in future research.

Table 5 The most common FS patterns associated with commitment.

| Usage pattern | Examples | Expert Corpus: Type | Expert Corpus: Token | Novice Corpus: Type | Novice Corpus: Token |
| --- | --- | --- | --- | --- | --- |
| ‘as X verb’ | as X argue/explain/suggest | 6 | 14 | 4 | 4 |
| ‘be -ed as/to/by/in’ | be interpreted/seen/considered as; be shown/believed to; be confirmed/judged by; be revealed in | 21 | 41 | 29 | 62 |
| ‘it be -ed that’ | it be argued/shown/suggested that | 7 | 12 | 7 | 13 |

‘as X verb’: G2 = 4.76, p < .05. ‘be -ed as/to/by/in’: G2 = 7.18, p < .01. ‘it be -ed that’: G2 = 0.25, p > .05.
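The G2 values reported with Table 5 can be illustrated with a short computation. The following is only a minimal sketch, assuming that G2 denotes the two-corpus log-likelihood statistic commonly used in corpus linguistics and that significance is read off a chi-square distribution with one degree of freedom. The function names and the corpus sizes in the usage note are hypothetical; the actual corpus sizes are not given in this section, so the sketch does not attempt to reproduce the published figures.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-corpus log-likelihood (G2) for one item.

    freq_a, freq_b: raw token counts of the pattern in each corpus.
    size_a, size_b: total number of words in each corpus (assumed known).
    """
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return max(2 * g2, 0.0)  # guard against tiny negative rounding errors


def p_value(g2):
    """Upper-tail probability of chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(g2 / 2))


# Hypothetical usage: token counts for 'as X verb' (14 vs. 4) with
# made-up, equal corpus sizes of 100,000 words each.
g2 = log_likelihood(14, 100_000, 4, 100_000)
significant = p_value(g2) < 0.05
```

Because the statistic depends on the corpus sizes, the value obtained with the made-up sizes above will differ from the one reported under Table 5.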
4.3. Evaluation
The Evaluative FSs were further classified into a few subcategories (see Section 3.2). Table 6 presents the distribution of the main subcategories occurring in the data. Note that the remaining subcategories are subsumed under the ‘Other’ category not only because of their relatively low frequencies, but also because they are more or less evenly distributed across the two corpora.
Table 6 Distribution of the main types of evaluative FSs in the corpora.

| Type | Examples | Expert Corpus: Freq./10,000 words | Expert Corpus: Prop. | Novice Corpus: Freq./10,000 words | Novice Corpus: Prop. |
| --- | --- | --- | --- | --- | --- |
| Significance | it is important to, contribute to, bring to the foreground, ground breaking, shed light on | 24 | 35% | 21 | 30% |
| Desirability (positive) | it is useful to, carefully balanced, the validity of, perfectly suited | 12 | 18% | 14 | 20% |
| Desirability (negative) | is problematic for, a fundamental misunderstanding of, is limited by, suffer from, the trouble with | 14 | 21% | 18 | 27% |
| Justification | it is reasonable to, it is not self-evident that, no indication of, give clear support to | 6 | 8% | 1 | 2% |
| Other | | 13 | 18% | 15 | 21% |

Significance: G2 = 0.95, p > .05. Desirability (positive): G2 = 0.88, p > .05. Desirability (negative): G2 = 2.38, p > .05. Justification: G2 = 16.57, p < .0001. Other: G2 = 0.91, p > .05.
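The two kinds of figures in Table 6, frequency per 10,000 words and proportion within a corpus, can be derived in two small steps. The following is a minimal sketch, assuming the usual normalisation (raw count divided by corpus size, multiplied by 10,000) and simple within-corpus percentages. The function names are hypothetical and are not taken from the study.

```python
def per_10k(raw_count, corpus_size):
    """Normalised frequency per 10,000 words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count / corpus_size * 10_000


def proportions(counts):
    """Percentage share of each category within one corpus."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: 100 * value / total for name, value in counts.items()}


# Rounded expert-corpus rates as printed in Table 6 (not raw counts).
expert_rates = {
    "Significance": 24,
    "Desirability (positive)": 12,
    "Desirability (negative)": 14,
    "Justification": 6,
    "Other": 13,
}
expert_shares = proportions(expert_rates)
```

Shares recomputed from the rounded rates in this way can differ by a percentage point from the published proportions, presumably because the latter were calculated from unrounded counts.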
Table 6 reveals that most FSs of Evaluation in both corpora were employed to express attitude towards the importance of the message or event in question. The only significant difference between the two corpora was found in the subcategory of FSs expressing the writer's logical judgment or justification of the proposition that follows (e.g., it is irrational to, it is not self-evident that, no indication of, give support to), which the experts were more likely to employ. Table 7 lists the most frequent (raw frequency ≥ 6 in the expert corpus and ≥ 5 in the novice corpus) key words involved in the FSs of Evaluation in the two corpora. Table 7 shows that FSs of this category were indeed mostly used in both corpora to express the importance of the issue in question, with a range of different forms. Of the same subcategory are those involving the key word role, but this time there seems to be a rather fixed usage pattern, namely play a (X) role (in). In addition, we can see that FSs containing some keywords (e.g., difficulty, significant) are more varied in terms of syntactic and semantic patterns than those with the others (e.g., problem, implication, sufficient, contribute). Many FSs occur only once or twice, but together they form a substantial set of forms associated with a particular function. Another similarity between the two corpora rests on comments on the scope of something under discussion, although with different wording: the experts tended to opt for the FS (it) be sufficient to, while the novices seemed to prefer the less formal alternative be (not) enough to, another example indicative of the ‘oral’ style of novice writing. Apart from these few sets of FSs in which the two corpora converge to a certain extent, some FSs that stand out in one or the other corpus, again, point to genre differences (cf. Section 4.2): those involving the key words difficulty and problem in the novice corpus were most probably prompted when presenting counterarguments, while FSs containing significant, contribution, and implication in the expert corpus were typically used to comment on scientific findings. This time, divergence between the two corpora was mainly seen in the Physics essays, with novice writers being more likely to employ Evaluative FSs. Examples (5) to (8) were taken from two Physics texts on Quantum Mechanics in the corpora.
(5) Newtonian Mechanics has proved to be a very effective tool for approximating the non-relativistic motion of a large object through gravitational interaction with its surroundings. […] In many situations however, a Newtonian approximation is simply not good enough. (BAWE_6097j)
(6) The attempt to fuse General Relativity and Quantum Mechanics has posed great difficulties in that each theory is confined to its own framework needed to explain certain phenomena, but contradicts the basis of the other as it does so. (BAWE_6097j)
(7) The above formalism also begs the question as to why L2 functions are needed. (6097j_expert)
(8) In practice, both equations can be seen as useful depending on the circumstances. When the dynamical system corresponds to motion on geodesics then [equation] and has no role to play. (6097j_expert)
The effect of genre is shown more clearly in these examples in accounting for the difference between novice and expert writing. As mentioned in Section 3.1, the genre of the Physics essays is very different between novice and expert writing. While the former is argumentative in nature, the latter presents original research, where a considerable proportion of the text is devoted to objective elucidation of equations. Indeed, as Examples (5) and (6) demonstrate, the student writers in Physics employed quite a number of evaluative FSs, both positive and negative ones.
In Example (5), such FSs serve to bring out the two sides of the ‘tool’ by first acknowledging the contribution it has made, followed by a transition to the ‘not good enough’ aspect, which would naturally lead to an explanation of why and eventually a solution, representing a typical argumentative structure. While the same strategy can also be discerned in the expert texts, it is presented in a more matter-of-fact and less ‘emotional’ manner, as can be seen in Examples (7) and (8). More examples of such FSs occurring in the expert texts include a new insight into, bring to the foreground, is limited by, it is sufficient to, the importance/question/significance of, which occur without intensifiers or intensifier-like items, unlike those occurring frequently in the student texts (e.g., very effective, very elegant, fascinating results, pose great difficulties).

Table 7 Most frequent FSs of Evaluation in the corpora (raw frequency in brackets).

Expert Corpus

| Key words | Examples |
| --- | --- |
| important (14) + importance (11) + importantly (2) | (it) be (X) important to, most important(ly), of X importance, give X importance to |
| role (13) | play a X role (in) |
| support (11) | give X support to, lend (X) support to, (provide) X support for, support + conclusion |
| significant (8) + significance (3) | highly significant, the significance of |
| contribute (8) + contribution (2) | contribute (X) to, contribute (X) in |
| implication (7) | (have) X implications (for) |
| sufficient (6) | (it) be sufficient to |

Novice Corpus

| Key words | Examples |
| --- | --- |
| important (15) + importance (10) + importantly (1) | (it) be (X) important (not) to, importance of, an important area |
| difficult (11) + difficulty (6) | (it) (X) be difficult to, make it (X) difficult for/to, (the) difficulty(ies) in/of/with, pose + difficulty |
| interesting (9) | interesting discussion/results, very interesting, it is interesting to |
| role (9) | (play) a (X) role in |
| useful (9) + usefulness (1) | a useful tool for, it is very useful to, very useful in |
| enough (7) | be (not) enough to |
| problem (5) | pose + problem for/to |
4.4. Modality
Table 8 presents the main types of Modality and the distribution of associated FSs in the corpora. Those categories that are not frequent in both corpora (Capability, Intensity, Usuality, Verification, and Degree) are grouped together under the ‘Other’ category. The two corpora yielded similar proportions of FSs expressing (Un)certainty, Necessity/obligation, and Validation, while the main divergence between the two corpora lies in Probability and Desire. Table 9 lists the most frequent FSs associated with these two functions in each corpus. As can be seen, the Probability expressions prevalent in the expert corpus, much like the subcategory of FSs expressing the writer's logical judgment or justification (see Section 4.3), were likely to be driven by a need to evaluate new knowledge claims, whereas the Desire expressions in the novice corpus were used metadiscursively to comment on their own texts (e.g., informing the reader of the aim of a research procedure or part of the text). Such seemingly different preferences, once again, were most likely to be governed by two distinct genres, one of which requires scientific objectivity while a personal voice is more common in the other. Many of the Desire expressions (e.g., try to, wish to, want to, hope to) are also typical of everyday language, as in Example (9), which illustrates well the colloquial style of novice writing with the use of simple, informal words and expressions.
(9) We will be forced to introduce some new methods as well, but will try to approach the theorem without any major headaches. (BAWE_0049b)
We see also in Table 9 some FSs with their distinct usage patterns, particularly in the expert corpus. For instance, both likely and possible in expert writing seem to prefer the infinitive structure: be (un)likely to (rather than the anticipatory it structure it is (un)likely that) and it is (im)possible to (rather than it is (im)possible that). Moreover, the former almost always occurs with a modifier such as more and less (9 out of 11 instances) whereas the latter occurs with no modifier and almost always in the simple present tense in the expert corpus:
(10) …, these features are far more likely to be the result of a Designer who intended to bring about this result. (3019g_expert)
(11) By locating the recurrent patterns of interaction through which the discourse is realised, it is possible to observe where and how the discourse is challenged. (6009b_expert)
Such preferences are less evident in the novice corpus. Take, for instance, it is possible to. The verb forms used by the novices include was, would be, and will become, apart from the dominant present tense is, providing further support for the view that formulaicity in disciplinary writing is not an innate skill for native speakers, but something to be learned.
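The contrast between the infinitive structure be (X) (un)likely to and the anticipatory it structure it is (un)likely that can be approximated with a simple pattern search. The following is a rough sketch only, offered as a possible first-pass aid to manual annotation and not as the procedure used in the present study. The regular expressions, the limit of two modifier words, and the function name are all assumptions; the first test sentence is adapted from Example (10) and the other two are invented.

```python
import re
from collections import Counter

BE = r"(?:is|are|was|were|be|been|being)"
MOD = r"(?:\w+\s+){0,2}"  # up to two optional modifiers, e.g. "far more"

# Hypothetical surface patterns for the two competing structures.
PATTERNS = {
    "be (X) (un)likely to": re.compile(
        rf"\b{BE}\s+{MOD}(?:un)?likely\s+to\b", re.IGNORECASE
    ),
    "it be (X) (un)likely that": re.compile(
        rf"\bit\s+{BE}\s+{MOD}(?:un)?likely\s+that\b", re.IGNORECASE
    ),
}


def count_variants(sentences):
    """Count surface matches of each structure across the sentences."""
    counts = Counter({label: 0 for label in PATTERNS})
    for sentence in sentences:
        for label, pattern in PATTERNS.items():
            counts[label] += len(pattern.findall(sentence))
    return counts


sample = [
    "These features are far more likely to be the result of a Designer.",
    "It is likely that the pattern is conventional.",  # invented
    "This is very unlikely to hold.",  # invented
]
variant_counts = count_variants(sample)
```

A search of this kind only finds surface strings; whether a hit is a genuine FS with the intended function would still have to be decided in context.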
Table 8 Distribution of the main subcategories of Modality realised by FSs in the corpora.

| Subcategory | Expert Corpus: Freq./10,000 words | Expert Corpus: % | Novice Corpus: Freq./10,000 words | Novice Corpus: % |
| --- | --- | --- | --- | --- |
| (Un)certainty | 12 | 15% | 12 | 13% |
| Necessity/obligation | 12 | 15% | 16 | 17% |
| Probability | 17 | 22% | 11 | 12% |
| Desire | 11 | 14% | 21 | 24% |
| Validation | 8 | 11% | 5 | 6% |
| Other | 18 | 23% | 25 | 28% |

(Un)certainty: G2 = 0.00, p > .05. Necessity/obligation: G2 = 3.28, p > .05. Probability: G2 = 6.79, p < .01. Desire: G2 = 16.13, p < .0001. Validation: G2 = 3.28, p > .05. Other: G2 = 7.04, p < .01.
Table 9 Frequent FSs of Probability and Desire in the corpora (raw frequency in brackets).

| Function | Expert Corpus | Novice Corpus |
| --- | --- | --- |
| Probability | be (X) (un)likely to (11) + it is likely that (1), it is (im)possible to (5) + it is possible that (2) + make (it) (im)possible (to) (3), the possibility of (5) + the possibility that (3), the (X) (im)probability of (5) + it is X (im)probable that (4), most likely (2), tend to (5) + tendency to (2), not necessarily (4), potential of/for (4) | it be/become (im)possible (for sb) to (15), (the) possibility of (5) + the possibility that (1), sb's/the chances of (7), be likely to (2) + more likely (2) + very unlikely (1), not necessarily (3) |
| Desire | seek to (11), intend to (4) + be intended to (3), try to (6), attempts at (4) + (in) (an) attempt to (4) + attempt to (2), aim to (2) | try to (12), wish to (11), want to (8), (in) (X) attempts to (6) + (in) (a) (X) attempt to (5) + attempt to (3), decide to (6), aim to (4) + the aim to/of (3), hope to (2) + hope of (1) + hope that (1) |
5. Conclusion
Through careful manual identification and annotation of FSs in context, the present study was able to include what would normally be discarded in frequency-based studies of FL (e.g., less frequent and discontinuous FSs and those that allow formal variation), and thus provided additional insights into the use of FSs that distinguishes L1 novice and expert writing. To start with, contrary to what has been attested in most frequency-based studies, the novice writers actually produced a wider range of FSs, in particular those involving interpersonal projection (Commitment), than did the expert writers. A large number of the identified FSs occurred only sporadically in the data. Taken together, however, these seemingly idiosyncratic choices revealed important formal, semantic, and functional features characteristic of a given community. One of the features concerned formal variability of FSs, which was in general of a lower degree in the expert corpus, indicating that formulaicity is a feature of successful professional writing, and the frequent use of a set of forms by the expert writers identified them as members of the discourse community. The novice writers, in contrast, more readily employed alternative forms, having not yet been socialised into the community. The present study thus provided further empirical support for the idea that formulaicity in disciplinary writing is no more an innate skill for the L1 writer than it is for the L2 writer, but must be learned by all disciplinary novices. Another main difference between L1 novice and expert writing found in the present study was related to the writers’ preference for FSs with different functions (e.g., Probability and Justification in expert writing vs. Commitment and Desire in novice writing), possibly in response to different communicative needs, which in turn reflected genre differences, sometimes in interaction with other variables such as register and disciplinary conventions.
While the analysis based on the examples taken from the data provided a highly informative account of their potential effects on novice and expert writing, these variables can benefit from more rigorous testing in future research into the nature of formulaicity in academic discourse. The research findings will then have even more significant implications for EAP pedagogy in the sense of detecting and subsequently attending to genre- and/or discipline-specific conventions, of which the novice writers in the present study clearly needed more training.

Acknowledgements
I would like to thank the journal's anonymous reviewers for their detailed and constructive comments on an earlier draft of the article. This project is funded by the Swedish Research Council (437-2014-6696).

References
Ädel, A. (2014). Selecting quantitative data for qualitative analysis: A case study connecting a lexicogrammatical pattern to rhetorical moves. Journal of English for Academic Purposes, 16, 68–80.
Ädel, A., & Erman, B. (2012). Recurrent word combinations in academic writing by native and non-native speakers of English: A lexical bundles approach. English for Specific Purposes, 31(2), 81–92.
Biber, D. (2006). University language: A corpus-based study of spoken and written registers. Amsterdam; Philadelphia: John Benjamins.
Biber, D. (2009). A corpus-driven approach to formulaic language in English. Multi-word patterns in speech and writing. International Journal of Corpus Linguistics, 14(3), 275–311.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at…: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25(3), 371–405.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and written English. Harlow: Pearson.
Buerki, A. (2016). Formulaic sequences: A drop in the ocean of constructions or something more significant? European Journal of English Studies, 20(1), 15–34.
Chen, Y., & Baker, P. (2010). Lexical bundles in L1 and L2 academic writing. Language, Learning and Technology, 14(2), 30–49.
Colson, J. (2016a). IDIOM Search. http://idiomsearch.lsti.ucl.ac.be/index.html.
Colson, J. (2016b). Set Phrases around GLOBALIZATION: An experiment in corpus-based computational phraseology. In F. A. Almeida, I. O. Barrera, E. Q. Toledo, & M. E. S. Cuervo (Eds.), Input a word, analyze the world. Selected approaches to corpus linguistics (pp. 141–152). Newcastle: Cambridge Scholars Publishing.
Cortes, V. (2004). Lexical bundles in published and student disciplinary writing: Examples from history and biology. English for Specific Purposes, 23, 397–423.
Durrant, P., & Mathews-Aydınlı, J. (2011). A function-first approach to identifying formulaic language in academic writing. English for Specific Purposes, 30, 58–72.
Ebeling, S. O., & Hasselgård, H. (2015). Learners' and native speakers' use of recurrent word-combinations across disciplines. Bergen Language and Linguistics Studies (BeLLS), 6, 87–106.
Erman, B., & Warren, B. (2000). The idiom principle and the open choice principle. Text, 20(1), 29–62.
Halliday, M. A. K., & revised by C.M.I.M. Matthiessen. (2014). Halliday's introduction to functional grammar (4th ed.). Oxon: Routledge.
Hyland, K. (2004). Disciplinary interactions: Metadiscourse in L2 postgraduate writing. Journal of Second Language Writing, 13, 133–151.
Hyland, K. (2008). Academic clusters: Text patterning in published and postgraduate writing. International Journal of Applied Linguistics, 18(1), 41–62.
Hyland, K. (2012). Disciplinary identities: Individuality and community in academic discourse. Cambridge: Cambridge University Press.
Hyland, K. (2017). Academic interaction: Where's it all going?, Plenary speech presented at Faces of English 2 Conference (pp. 1–3). University of Hong Kong. June 2017.
Martinez, R., & Schmitt, N. (2012). A phrasal expressions list. Applied Linguistics, 33(3), 299–320.
Muguiro, N. (2017). Citing external sources in educational neuroscience articles: In search of an interdisciplinary stance and voice, Paper presented at the 9th International Corpus Linguistics Conference (pp. 24–28). Birmingham University. July 2017.
Nesi, H., & Gardner, S. (2012). Genres across the disciplines: Student writing in higher education. Cambridge: Cambridge University Press.
O'Donnell, M. (2013). UAM corpus tool. Version 3.0.
Pérez-Llantada, C. (2014). Formulaic language in L1 and L2 expert academic writing: Convergent and divergent usage. Journal of English for Academic Purposes, 14, 84–94.
Salazar, D. (2014). Lexical bundles in native and non-native scientific writing: Applying a corpus-based study to language teaching. Amsterdam; Philadelphia: John Benjamins.
Schneider, N., Onuffer, S., Kazour, N., Danchik, E., Mordowanec, M. T., Conrad, H., et al. (2014). Comprehensive annotation of multiword expressions in a social web corpus, Proceedings of the 9th language resources and evaluation conference (pp. 455–461). Reykjavík, Iceland: ELRA.
Staples, S., Egbert, J., Biber, D., & Gray, B. (2016). Development at the university level: Phrasal and clausal complexity across level of study, discipline, and genre. Written Communication, 33(2), 149–183.
Stillar, A. (1998). Analyzing everyday texts: Discourse, rhetoric, and social perspectives. California; London; New Delhi: SAGE.
Swales, J. (1990). Genre analysis: English in academic and research settings. Cambridge: Cambridge University Press.
Swales, J. (2004). Research genres: Exploration and applications. Cambridge: Cambridge University Press.
Wang, Y. (2016). The idiom principle and L1 influence: A contrastive learner-corpus study of delexical verb + noun collocations. Amsterdam; Philadelphia: John Benjamins.
Wang, Y. (2017a). Lexical bundles in spoken academic ELF: Genre and disciplinary variation. International Journal of Corpus Linguistics, 22(2), 187–211.
Wang, Y. (2017b). Lexical bundles in news discourse 1784–1983. In M. Palander-Collin, M. Ratia, & I. Taavitsainen (Eds.), Diachronic developments in English news discourse (pp. 97–116). Amsterdam; Philadelphia: John Benjamins.
Wray, A. (1999). Formulaic language in learners and native speakers. Language Teaching, 32(4), 213–231.
Wray, A. (2002). Formulaic language and the lexicon. Cambridge: Cambridge University Press.
Wray, A. (2008). Formulaic language: Pushing the boundaries. Oxford: Oxford University Press.
Wray, A. (2009). Identifying formulaic language: Persistent challenges and new opportunities. In R. Corrigan, E. A. Moravcsik, H. Ouali, & K. M. Wheatley (Eds.), Formulaic language. Volume 1. Distribution and historical change (pp. 27–51). Amsterdam; Philadelphia: John Benjamins.
Wray, A., & Perkins, M. R. (2000). The functions of formulaic language: An integrated model. Language & Communication, 20(1), 1–28.

The author received her Ph.D. in English Linguistics from Uppsala University in 2013, with a thesis on delexical verb + noun collocations in learner English. She is currently a postdoctoral fellow working at both Uppsala University and Cardiff University on the use of formulaic language in academic discourse, a three-year project funded by the Swedish Research Council. Her main research interests include corpus linguistics, formulaic language, EAP, and SLA.