The rhetorical functions of syntactically complex sentences in social science research article introductions


Journal of English for Academic Purposes 44 (2020) 100832


Journal of English for Academic Purposes journal homepage: www.elsevier.com/locate/jeap

Xiaofei Lu*, J. Elliott Casal, Yingying Liu

Department of Applied Linguistics, The Pennsylvania State University, 234 Sparks Building, University Park, PA, 16802, USA

Article info

Article history: Received 27 September 2019; received in revised form 30 November 2019; accepted 20 December 2019; available online xxx.

Abstract

There have been increasing calls for research attention to the linguistic realizations of rhetorical functions in academic writing. Research in this area has so far focused primarily on lexical and phraseological features. While numerous studies have investigated the relationship of syntactic complexity to language proficiency, development, and writing quality, research examining the rhetorical functions of complex syntactic structures is scant. This study analyzes the rhetorical functions of syntactically complex sentences in the Corpus of Social Science Research Article Introductions, which contains the introduction sections of 600 published research articles in six social science disciplines. All samples were annotated for rhetorical moves and steps using an adapted version of Swales’ (2004) revised Create a Research Space model, and all sentences were assessed for syntactic complexity using multiple measures of global complexity, finite subordination, clausal elaboration, and phrasal complexity. Results revealed significant variation in syntactic complexity among sentences realizing different rhetorical functions and expert writers’ employment of complex structures to realize different rhetorical goals. The implications of our findings for academic writing research, pedagogy and assessment are discussed. © 2019 Elsevier Ltd. All rights reserved.

Keywords: Academic writing; Genre analysis; Syntactic complexity

* Corresponding author. E-mail addresses: [email protected] (X. Lu), [email protected] (J.E. Casal), [email protected] (Y. Liu). https://doi.org/10.1016/j.jeap.2019.100832 1475-1585/© 2019 Elsevier Ltd. All rights reserved.

1. Introduction

In recent decades, English for Academic Purposes (EAP) researchers have expanded explorations of the rhetorical structures of academic texts to include various corpus methodologies (Flowerdew, 2005). An emerging trend in this research is an emphasis on identifying the linguistic forms that expert writers use to accomplish diverse rhetorical goals in published research articles (RAs) (Cortes, 2013; Durrant & Mathews-Aydınlı, 2011; Le & Harrington, 2015; Omidian, Shahriari, & Siyanova-Chanturia, 2018). This trend is much welcomed, as systematic characterizations of the linguistic features associated with particular communicative functions may potentially deepen our understanding of disciplinary genre practices and provide insights and resources for EAP writing pedagogy, addressing what has recently been labeled the “function-form gap” (Moreno & Swales, 2018, p. 3). However, little of such research exists, and the primary emphasis of extant corpus-based EAP scholarship has been at lexical and phraseological levels, rather than at the syntactic level, despite the important role complex syntactic structures play in writing quality overall (Lu, 2017) and in the writing of academic specialists in particular (Biber & Gray, 2010). With the exception of Ryshina-Pankova (2015), who examined the communicative functions afforded by nominalizations, function-oriented approaches to the examination of syntactic complexity in academic texts are extremely limited. Nevertheless, an understanding of the functional affordances of complex syntactic structures common in academic RA writing has substantial pedagogical value.

The present study aims to address this gap by exploring how disciplinary writers across social science disciplines employ complex syntactic structures to achieve their rhetorical goals. Specifically, it examines how five key measures of syntactic complexity map to the rhetorical moves and steps (Swales, 1990, 2004) of introduction sections of published RAs across six major social science disciplines.

1.1. The need for a functional turn in syntactic complexity research

An important component of the larger construct of linguistic complexity (Bulté & Housen, 2014), syntactic complexity is construed as the degree of variation, sophistication, and elaboration of the syntactic structures used in language production (Housen & Kuiken, 2009; Lu, 2017; Ortega, 2003). While earlier syntactic complexity studies often employed a small number of indices for syntactic complexity measurement (Lu, 2011; Ortega, 2003; Wolfe-Quintero, Inagaki, & Kim, 1998), the consensus is now that it should be conceptualized as a multi-dimensional construct and measured with a set of indices that gauge its different dimensions (Bulté & Housen, 2014; Norris & Ortega, 2009). The multidimensional conceptualization proposed by Norris and Ortega (2009), for example, postulates that syntactic complexity should be measured using indices of global complexity, coordination, subordination, as well as clausal or phrasal elaboration.

The conceptual development in syntactic complexity research has been accompanied by instrumental development, i.e., the emergence of several computational tools designed to automate syntactic complexity measurement with a large repertoire of indices that capture different syntactic complexity dimensions. The Biber Tagger (Biber, Johansson, Leech, Conrad, & Finegan, 1999), for example, computes the normed frequencies of various types of dependent clauses (e.g., finite relative clauses), phrasal structures (e.g., noun + of-phrase), and grammatical classes (e.g., adverbs), many of which have been shown to be useful measures of grammatical complexity for assessing speaking and writing production (Biber, Gray, & Staples, 2016).
The L2 Syntactic Complexity Analyzer (L2SCA; Lu, 2010) integrates 14 syntactic complexity indices of five different types: overall sentence complexity (i.e., number of clauses per sentence), length of production unit (e.g., mean length of sentence), coordination (e.g., number of T-units per sentence), subordination (e.g., number of dependent clauses per T-unit), and phrasal sophistication (e.g., number of complex nominals per T-unit). Coh-Metrix (McNamara, Graesser, McCarthy, & Cai, 2014) contains several embeddedness (e.g., number of words before the main verb), syntax similarity (i.e., the proportion of intersecting nodes between the parse trees of two sentences), and edit distance (e.g., minimum number of changes necessary to make two sentences identical) measures as indices of syntactic complexity. Finally, the Tool for the Automatic Analysis of Syntactic Sophistication and Complexity (TAASSC; Kyle, 2016) offers a large set of fine-grained indices of clausal complexity (e.g., number of adjective complements per clause) and phrasal complexity (e.g., number of dependents per object of the preposition), along with indices based on the frequency profiles of verb argument constructions (e.g., the average frequency of all verb argument constructions in a reference corpus).

With the conceptual and instrumental development in syntactic complexity research, the past decade witnessed a surge of L2 writing studies examining the quantitative relationship of syntactic complexity to language development (e.g., Chan, Verspoor, & Vahtrick, 2015; Crossley & McNamara, 2014; Yoon & Polio, 2017), language proficiency (e.g., Bulté & Housen, 2014; Lu, 2011; Lu & Ai, 2015), and language production quality (e.g., Biber et al., 2016; Kyle & Crossley, 2018; Yang, Lu, & Weigle, 2015).
These studies have yielded valuable insights into the patterns of syntactic complexity development, syntactic complexity measures that differentiate levels of language proficiency, and the extent to which syntactic complexity measures or their co-occurrence patterns correlate with human ratings of writing quality.

An ostensible gap in current syntactic complexity research, however, lies in the lack of attention to the meaning and function dimensions of complexity. In a rare exception, Ryshina-Pankova (2015) examined the communicative functions afforded by nominalizations and showed that prioritization of the functional aspects of linguistic complexity is important to reveal the communicative demands behind the placement of complexity constructs. Ortega (2015) supported Ryshina-Pankova’s perspective and highlighted that this approach “requires that any language instantiation of lexicogrammatical resources be analyzed for genre and register demands” (p. 86). Indeed, while formal measures of syntactic complexity may mark progression toward higher writing proficiency, quality writing is characterized by genre-appropriate and functionally effective usage of complex syntactic structures, rather than the frequency of such structures alone. In fact, over-emphasis on formal syntactic complexity alone could lead to negative pedagogical consequences, as learners may be tempted to insert functionally inappropriate complex structures to increase the syntactic complexity of their writing. Corpus-based studies that systematically examine the rhetorical or communicative functions of complex syntactic structures in specific genres do not yet exist. Such studies, however, will not only contribute to deeper understandings of the functional affordances of syntactic complexity in specific academic genres, but also generate useful pedagogical resources in the form of repertoires of instantiations of complex structures aligned with their rhetorical or communicative functions.
The current study sets out to fill this gap in syntactic complexity research and to advance our understanding of the function dimension of syntactic complexity.


1.2. The need for attention to form-function mappings in corpus-based genre analysis

EAP scholars working within the English for Specific Purposes (ESP) school of Genre Analysis have profiled the rhetorical move structures of a variety of academic genres, such as RA introductions (e.g., Hirano, 2009; Samraj, 2002), conference proposals (e.g., Halleck & Connor, 2006), and professional documents (e.g., Bhatia, 2008). Such studies typically adopt a function-first discourse analytical framework, in which rhetorical moves and steps are coded manually so that the rhetorical move structures of a genre can be analyzed, and the analysis of linguistic forms is subordinate. Meanwhile, a growing number of EAP scholars have used corpus methodologies to profile linguistic features associated with genre practices of disciplinary writers (Flowerdew, 2005), such as phrase-frames in mathematics RAs (Cunningham, 2017) and social science RA introductions (Lu, Yoon, & Kisselev, 2018), and lexical bundles in telecommunication RAs (Pan, Reppen, & Biber, 2016), psychology RAs (Esfandiari & Barbary, 2017), and doctoral dissertation abstracts (Lu & Deng, 2019). Most such corpus-based studies take a form-first approach, in which prominent forms are identified automatically and then interpreted in a largely decontextualized manner.

Overall, EAP studies taking function-first or form-first approaches to the analysis of genre practices have yielded useful insights into the rhetorical and linguistic conventions of academic communities that have proven valuable for genre-based writing pedagogy. However, the primary focus on either rhetorical features or formal linguistic features in isolation falls short of capturing current conceptions of genre knowledge and development (Swales, 2019), which have moved beyond an emphasis on formal knowledge alone (Polio, 2017), with genre “expertise” now seen as the “confluence” of formal and rhetorical domains of knowledge (Tardy, 2009, p. 20).
This research gap has led to a few integrated corpus and genre analytic investigations of the rhetorical and linguistic dimensions of text in tandem (e.g., Cortes, 2013; Durrant & Mathews-Aydınlı, 2011; Le & Harrington, 2015; Omidian et al., 2018). Cortes (2013) extracted a list of lexical bundles from a corpus of RA introductions and aligned them with different rhetorical moves and steps. Similarly, Omidian et al. (2018) identified a list of multi-word expressions from a corpus of RA abstracts and classified them according to their communicative functions in different moves. However, in both studies the researchers assigned rhetorical move tags to chunks containing a lexical bundle or multi-word expression with limited context. This practice may be problematic, as the determination of a rhetorical move or step of a chunk of a text may sometimes entail the examination of a much larger context. Durrant and Mathews-Aydınlı (2011) avoided this problem by first manually annotating a corpus of graduate student writing for rhetorical moves and steps and then identifying the formulaic forms associated with each rhetorical function. While Le and Harrington (2015) also first identified a list of word clusters from a corpus of the Discussion sections of applied linguistics RAs, they analyzed the discourse functions of each word cluster in much greater context. Although this emerging body of research has contributed to filling the “function-form gap” in the research of genre practices, it is clear that the focus has so far been on features at the lexical or phraseological levels. Given the importance of complex syntactic structures in writing quality in general (Lu, 2017) and in the writing of academic specialists in particular (Biber & Gray, 2010), systematic investigations of the rhetorical functions of complex syntactic structures in academic writing constitute a much-needed avenue for further research in this area.

1.3. Research questions

The goal of the current study is to examine the rhetorical functions of complex syntactic structures in social science RA introductions. Specifically, we seek to answer the following two research questions.

1. Are there differences in the syntactic complexity of sentences that realize different rhetorical functions in social science research article introductions? If yes, what are the differences?
2. What rhetorical functions are realized through the most syntactically complex sentences in social science research article introductions?

The first of these questions analyzes the extent to which the language which furthers the rhetorical functions identified in our move analysis differs in terms of five diverse measures of syntactic complexity. Such analysis may usefully extend research addressing the “function-form gap” (Moreno & Swales, 2018, p. 3) beyond lexical and phraseological perspectives into syntactic domains. The second of these questions analyzes the rhetorical functions of the most complex sentences in each measure in terms of internally derived thresholds and the amount of text dedicated to each rhetorical move. In addition to providing insights into what saliently complex sentences are constructed to achieve rhetorically, such analysis may also shed light on the potential affordances of such complex structures for social science research article writers by highlighting patterns of salient complexity obscured by means. This also has the pedagogical aims of both highlighting the rhetorical chunks that may be most productively utilized in teaching rhetorical and formal conventions of academic writing and providing a resource of complex sentences with rhetorical annotations for pedagogical activity.


Table 1 Descriptive statistics for COSSRAI.

| Discipline | Texts | Total Words | Mean Words | Words SD | Word Range |
|---|---|---|---|---|---|
| Anthropology | 100 | 76,455 | 765 | 394 | 132–1866 |
| Applied Linguistics | 100 | 63,333 | 633 | 413 | 97–2565 |
| Economics | 100 | 144,071 | 1441 | 427 | 414–2570 |
| Political Science | 100 | 78,523 | 785 | 309 | 262–1771 |
| Psychology | 100 | 76,799 | 768 | 503 | 104–2680 |
| Sociology | 100 | 74,840 | 748 | 306 | 324–1931 |
| Total | 600 | 513,688 | 856 | 476 | 97–2680 |

2. Method

2.1. Corpus design

Our data consisted of the Corpus of Social Sciences Research Article Introductions (COSSRAI), a collection of 600 published RA introductions from six social science disciplines, with 100 each from Anthropology, Applied Linguistics, Economics, Political Science, Psychology, and Sociology. Within each discipline, five non-niche journals were selected based on impact factor and expert member checking, and four articles were selected from each journal for each year from 2012 to 2016, inclusive. Introduction boundaries were based primarily on heading labels and also on content when necessary. Introductions were saved in plain text files and cleaned of parenthetical components of citation, textual oddities that surfaced during conversion to plain text format, journal tags, and page numbers. Formulas, figures, and tables were also removed. As shown in Table 1, COSSRAI contains a total of 513,688 words (n = 600; mean = 856; SD = 476).

2.2. Rhetorical move annotation

The COSSRAI corpus was analyzed for rhetorical moves and steps (Swales, 1990, 2004) and annotated at sentence boundaries. Following the recommendation of Moreno and Swales (2018), we adopted the rhetorical chunk as the unit of rhetorical move analysis, proceeding from the step level up to the move level. This entailed a close analysis of linguistic, structural, and content-oriented cues to shifts in the rhetorical aims of authors. To facilitate the evaluation of how complex syntactic structures are deployed in the realization of writers’ rhetorical goals, however, we adopted the sentence as the unit of annotation and attached a rhetorical step tag to each sentence. When rhetorical boundaries did not align with sentence boundaries, a primary rhetorical function was established if possible. Otherwise, sentences which appeared to be advancing multiple rhetorical aims received multiple codes, which were later manually resolved in the mapping if possible.

Rhetorical move annotation was conducted by a research team of seven writing specialists. The researchers began with Swales’ (2004) revised Create a Research Space (CARS) model and collaboratively adapted the framework through iterative coding of 60 texts, drawn equally from the six included disciplines. The revised CARS model was expanded to account for observed interdisciplinary variation. For example, the first step of Swales’ Move 1 ‘Establishing a research territory’ was split into Step 1a ‘Claiming centrality or value of a research area’ and Step 1b ‘Real-world contextualization,’ to account for notable differences in how context was developed across disciplines. The resulting framework is included in Table 2 along with the number of sentences tagged with each rhetorical step and the number of texts containing each step.
After the framework was developed, texts were split evenly between the seven researchers, with each text annotated by one researcher and checked by another researcher. Differences in annotation were resolved collaboratively by the entire research group, and the overall interrater reliability for rhetorical moves was high (Cohen’s Kappa .81).
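The agreement statistic reported above can be illustrated with a minimal Python sketch of Cohen's kappa for two annotators. The function name `cohens_kappa` and the six move tags below are hypothetical and serve only as an illustration; they are not data or code from the study, and the paper does not describe how its kappa value was computed.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal label proportions, summed over labels.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical move tags for six sentences (illustration only).
annotator = ["M1", "M1", "M2", "M3", "M3", "M3"]
checker = ["M1", "M2", "M2", "M3", "M3", "M1"]
kappa = cohens_kappa(annotator, checker)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when a few step tags dominate the corpus.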

2.3. Syntactic complexity indices

Table 3 summarizes the five syntactic complexity measures considered in this study, with reference to their operationalizations in previous literature. They are not proposed as a final set of indices for future research, but were selected as a diverse range of indices suitable for our analytical goals based on several criteria. In addition to their widespread usage in previous syntactic complexity research, these measures cover most of the key dimensions proposed by Norris and Ortega (2009): sentence length and left-embeddedness capture global complexity, number of finite clauses captures finite subordination, number of nonfinite dependent clauses captures clausal elaboration, and nominalizations capture phrasal complexity (Biber et al., 2016). Additionally, the structures involved in them can be retrieved automatically with a high degree of reliability (Biber & Gray, 2010; Lu, 2010; McNamara et al., 2014), can be expected to occur adequately frequently in RAs, and can be easily interpreted by EAP writing teachers and learners.

The samples in the corpus were part-of-speech (POS) tagged and syntactically parsed with Stanford CoreNLP (Manning et al., 2014). A Python script was then written to analyze the tagged or parsed samples and calculate the five measures for each sentence in the corpus.
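Two of the five measures, sentence length and nominalization count, can be roughly approximated without a parser. The sketch below is an illustration under stated assumptions and is not the authors' script: it uses a naive regular-expression tokenizer instead of Stanford CoreNLP, does not check that a token is a noun, ignores Nomlex, and uses a small hypothetical exclusion set (`FALSE_POSITIVES`) in place of the manually checked candidate list. The suffixes are the five listed for nominalization candidates in this section.

```python
import re

# The five suffixes used to collect candidate nominalizations.
SUFFIXES = ("tion", "sion", "ity", "ment", "ness")
# Hypothetical stand-in for the manually checked exclusion list (e.g., "nation").
FALSE_POSITIVES = {"nation", "city", "moment"}


def tokenize(sentence):
    """Naive tokenizer: runs of letters, optionally joined by hyphens or apostrophes."""
    return re.findall(r"[A-Za-z]+(?:[-'][A-Za-z]+)*", sentence)


def sentence_length(sentence):
    """Number of word tokens in the sentence."""
    return len(tokenize(sentence))


def count_nominalizations(sentence):
    """Count tokens ending in one of the suffixes, minus known false positives."""
    count = 0
    for token in tokenize(sentence):
        word = token.lower()
        # Strip a simple plural -s ("developments" -> "development");
        # plurals in -ies (e.g., "abilities") are not handled in this sketch.
        if word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]
        if word.endswith(SUFFIXES) and word not in FALSE_POSITIVES:
            count += 1
    return count


example = "The implementation of the assessment raised concerns about reliability."
print(sentence_length(example), count_nominalizations(example))
```

The remaining three measures depend on a syntactic parse and are therefore not approximated here.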


Table 2 Move-step framework for COSSRAI.

| Move/Step | Description | Sentences | Texts |
|---|---|---|---|
| Move 1 | Establishing a research territory | | |
| M1_S1a | Claiming centrality or value of research area | 631 (3.02%) | 309 (51.5%) |
| M1_S1b | Real-world contextualization | 800 (3.83%) | 128 (21.33%) |
| M1_S2 | Making generalizations about research area | 5393 (25.83%) | 568 (94.67%) |
| M1_S3 | Reviewing items of previous research | 1614 (7.73%) | 338 (56.33%) |
| Move 2 | Establishing a niche | | |
| M2_S1a | Counter-claiming | 218 (1.04%) | 80 (13.33%) |
| M2_S1b | Indicating a gap | 670 (3.21%) | 321 (53.5%) |
| M2_S1c | Question raising | 328 (1.57%) | 170 (28.33%) |
| M2_S1d | Continuing a tradition | 121 (.58%) | 83 (13.83%) |
| M2_S1e | Pointing out limitations of previous research | 367 (1.76%) | 117 (19.5%) |
| M2_S2 | Providing justification | 434 (2.08%) | 196 (32.67%) |
| Move 3 | Presenting the present work via | | |
| M3_S1 | Announcing present research | 1183 (5.67%) | 562 (93.67%) |
| M3_S2a | Presenting research questions or hypotheses | 507 (2.43%) | 206 (34.33%) |
| M3_S2b | Advancing new theoretical claims | 403 (1.93%) | 95 (15.83%) |
| M3_S3 | Definitional clarification | 266 (1.27%) | 118 (19.67%) |
| M3_S4a | Summarizing methods | 981 (4.7%) | 118 (19.67%) |
| M3_S4b | Explaining a mathematical model | 571 (2.73%) | 319 (53.17%) |
| M3_S4c | Describing analyzed scenario | 398 (1.91%) | 72 (12%) |
| M3_S5 | Announcing and discussing results | 2369 (11.35%) | 273 (45.5%) |
| M3_S6 | Stating the value of present research | 950 (4.55%) | 233 (38.83%) |
| M3_S7 | Outlining the structure of the paper | 1401 (6.71%) | 326 (54.33%) |
| M3_S8 | Rationalizing research focus and design | 1228 (5.88%) | 295 (49.17%) |
| M3_S9 | Presenting limitations of current study | 48 (.23%) | 21 (3.5%) |

Note: The sentences column indicates the number and percentage of sentences coded with the move-step tag in COSSRAI.

Sentence length and number of finite dependent clauses were calculated by adapting the code for the same purposes in L2SCA, which had reported F-scores of 1.0 and 0.925 for sentence and finite dependent clause identification (Lu, 2010).¹ A list of candidate nominalizations was compiled by retrieving all nouns suffixed with -tion, -sion, -ity, -ment, or -ness, following Biber et al. (1999), from the POS-tagged corpus. Additionally, nouns appearing in Nomlex (Macleod et al., 2001), a 1025-word list of nominalizations occurring in the Brown and Wall Street Journal corpora, were extracted to improve coverage. The candidate list was manually examined by two researchers to rule out false positives (e.g., nation). All nouns on the finalized list were then identified in the corpus as nominalizations by the Python code. Biber and Gray (2010) reported a precision of .99 for nominalization identification using the set of suffixes coupled with manual checking. As we followed the same procedure but integrated Nomlex, our method can be expected to achieve a comparable precision with a higher coverage.

The nonfinite dependent clause has been shown to be a useful measure of syntactic complexity in written academic English (Biber et al., 2011). In the present study, nonfinite dependent clauses marked with gerunds, infinitives, or past participles in each sentence were identified by adapting the code for verb phrase (VP) identification in L2SCA (Lu, 2010), which had a reported F-score of 0.926 for VP identification. Finally, for left-embeddedness, we followed McNamara et al.’s (2014) definition and adapted the relevant L2SCA code to identify the main verb of the main clause and count the number of words before that verb, complemented by manual checking of a small proportion (less than 1%) of uncaptured sentences. L2SCA’s F-scores for finite clause (0.961) and VP (0.926) identification can serve as good estimates of the upper and lower bounds of the F-score for the code’s main verb identification.

¹ F-score is the harmonic mean of precision and recall and is calculated as (2 × precision × recall)/(precision + recall). Precision refers to the proportion of units identified that are accurate (e.g., the proportion of finite dependent clauses identified by L2SCA that are actually finite dependent clauses). Recall refers to the proportion of all relevant units that are accurately identified (e.g., the proportion of finite dependent clauses in the corpus that are accurately identified by L2SCA). An F-score value is between 0 and 1, with higher values corresponding to better performance. While there is no pre-defined acceptable threshold for F-scores, a logical reference point in this context is the average F-score of the Stanford Parser, the parser used in L2SCA, which is 0.867 (Manning et al., 2014).

Table 3 Syntactic complexity indices used in this study.

| Index name | Index description |
|---|---|
| Sentence length | Number of words in the sentence (Lu, 2010) |
| Nominalization | Number of words with one of the five suffixes in Biber et al. (1999) or included in Nomlex (Macleod, Grishman, Meyers, Barrett, & Reeves, 2001) |
| Finite dependent clause | Number of finite dependent clauses, either nominal, adjective, or adverbial (Lu, 2010) |
| Nonfinite dependent clause | Number of nonfinite dependent clauses with gerund, infinitive, or past participles (Biber, Gray, & Poonpon, 2011) |
| Left-embeddedness | Number of words before the main verb of the sentence (McNamara et al., 2014) |

2.4. Analytical procedures

To address the first research question, we combined the 20 texts from the same journal into one sample and calculated the mean for each measure across all sentences annotated with each step in each sample.² A one-way multivariate analysis of variance (MANOVA) and one-way analyses of variance (ANOVAs) were then used to determine whether significant differences existed in the syntactic complexity of sentences realizing different steps. Rather than reporting pairwise comparison results for the 22 steps, we employed the graphical analysis of means (ANOM) to identify and display steps that differ significantly from the group mean for each measure.

To address the second research question, we first established complexity thresholds for all five measures with corpus-internal criteria and then extracted syntactically complex sentences from the corpus using those thresholds. We adopted the third quartile as the complexity threshold (see Table 4) as it resulted in a reasonable set of syntactically complex sentences for further analysis for each measure. For example, the third quartile of the sentence length measure was 33; therefore, all sentences with 33 or more words were considered complex for this measure. For each measure, we conducted a simple linear regression analysis on the number of complex sentences (dependent variable) and the total number of sentences (independent variable) across rhetorical steps. If the proportion of complex sentences remains consistent across steps, the data points representing the steps should fit the regression line well. Any data point lying beyond the 95% confidence interval (CI) of the regression line was identified as a step with a significantly high or low proportion of complex sentences.

² We combined the texts from the same journal into a single sample primarily to alleviate the issues of data sparseness and skewness, as not all rhetorical steps occurred in all individual texts and the distribution of rhetorical steps is uneven across individual texts.

3. Results

3.1. Differences in syntactic complexity across rhetorical steps

A one-way MANOVA revealed significant between-step differences in the syntactic complexity measures considered (Wilks’s Λ = .42, F(105, 2811) = 5.16, p < .01, ηp² = .158).
A series of follow-up one-way ANOVAs further revealed significant between-step differences for all five measures (sentence length, F(21, 578) = 5.24, p < .01, ηp² = .16; nominalization, F(21, 578) = 3.45, p < .01, ηp² = .11; finite dependent clause, F(21, 578) = 8.67, p < .01, ηp² = .24; nonfinite dependent clause, F(21, 578) = 4.3, p < .01, ηp² = .14; left-embeddedness, F(21, 578) = 9.35, p < .01, ηp² = .25).

Fig. 1 and Table 5 present the ANOM results for the five measures. Sentences realizing ‘M2_S1c, Question raising’ (mean = 20.6, SD = 4.98) and ‘M3_S7, Outlining the structure of the paper’ (mean = 22.1, SD = 6.21) were significantly shorter (p < .05) compared to the overall mean (mean = 27.05, SD = 6.33). Sentences realizing ‘M1_S1b, Real-world contextualization’ (mean = .75, SD = .41) and ‘M2_S1c, Question raising’ (mean = .81, SD = .51) contained significantly fewer (p < .05) nominalizations, while those realizing ‘M3_S6, Stating the value of present research’ (mean = 1.50, SD = .41) contained significantly more (p < .05) nominalizations compared to the overall mean (mean = 1.21, SD = .54). Sentences realizing ‘M1_S3, Reviewing items of previous research’ (mean = 1.15, SD = .29), ‘M2_S1a, Counter-claiming’ (mean = 1.17, SD = .66), ‘M3_S2a, Presenting research questions or hypotheses’ (mean = 1.25, SD = .46), ‘M3_S2b, Advancing new theoretical claims’ (mean = 1.15, SD = .39), and ‘M3_S5, Announcing and discussing results’ (mean = 1.12, SD = .41) contained significantly more (p < .05) finite dependent clauses compared to the group mean (mean = .87, SD = .43). Those realizing ‘M1_S1a, Claiming centrality or value of research area’ (mean = .56, SD = .21), ‘M1_S1b, Real-world contextualization’ (mean = .61, SD = .31), ‘M2_S1d, Continuing a tradition’ (mean = .72, SD = .73), and ‘M3_S7, Outlining the structure of the paper’ (mean = .55, SD = .24) contained significantly fewer (p < .05) finite dependent clauses compared to the group mean (mean = .87, SD = .43).
Sentences realizing ‘M3_S1, Announcing present research’ (mean = .88, SD = .26) and ‘M2_S2, Providing justification’ (mean = .86, SD = .35) contained significantly more (p < .05) nonfinite dependent clauses compared to the group mean (mean = .65, SD = .35).

Table 4 Descriptive statistics of each measure.

| Index | Mean | SD | 3rd Quartile | Sentences meeting threshold |
|---|---|---|---|---|
| Sentence length | 25.91 | 12.24 | 33 | 4961 |
| Nominalization | 1.14 | 1.23 | 2 | 6121 |
| Finite dependent clause | .83 | .97 | 1 | 10846 |
| Nonfinite dependent clause | .63 | .83 | 1 | 8886 |
| Left-embeddedness | 6.69 | 6.88 | 9 | 5241 |


Fig. 1. Comparison of syntactic complexity across rhetorical steps. Note: The green lines denote overall group means, and the brown lines denote the 95% detection limits. A data point outside the brown lines denotes that the mean for the rhetorical step is significantly higher than the overall group mean. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article).

Finally, sentences realizing ‘M2_S1b, Indicating a gap’ (mean = 9.60, SD = 2.21), ‘M2_S1e, Pointing out limitations of previous research’ (mean = 9.92, SD = 6.29), and ‘M2_S2, Providing justification’ (mean = 10.02, SD = 5.02) were significantly more (p < .05) left-embedded compared to the group mean (mean = 6.91, SD = 3.19), while those realizing ‘M2_S1c, Question raising’ (mean = 4.27, SD = 2.10), ‘M3_S1, Announcing present research’ (mean = 5.33, SD = 1.23), ‘M3_S3, Definitional clarification’ (mean = 4.82, SD = 2.27), and ‘M3_S7, Outlining the structure of the paper’ (mean = 4.72, SD = 1.82) were significantly less (p < .05) left-embedded.
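As a consistency check on the effect sizes reported for the follow-up ANOVAs in this section, partial eta squared in a one-way design can be recovered from the F statistic and its degrees of freedom as (F × df_effect)/(F × df_effect + df_error). The short Python sketch below applies this standard identity to the reported F values; the function name is hypothetical, and the check is not part of the authors' analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F statistics as reported for the follow-up ANOVAs, each with df = (21, 578).
reported_f = {
    "sentence length": 5.24,
    "nominalization": 3.45,
    "finite dependent clause": 8.67,
    "nonfinite dependent clause": 4.3,
    "left-embeddedness": 9.35,
}

effect_sizes = {
    name: round(partial_eta_squared(f_value, 21, 578), 2)
    for name, f_value in reported_f.items()
}
for name, value in effect_sizes.items():
    print(f"{name}: {value}")
```

The recovered values agree with the reported effect sizes to two decimal places.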

3.2. Rhetorical functions of syntactically complex sentences

The second part of our analysis examined the rhetorical functions of the most complex sentences produced along the five syntactic complexity measures in COSSRAI. Below, the bar charts illustrate the overall number of sentences tagged for each step in COSSRAI alongside the number of sentences determined to be ‘complex’ for each measure. The figures display the results of the linear regression analysis for each measure. Complex sentences associated with steps falling outside the 95% CI of the regression line were manually analyzed for patterns that could reveal pedagogically useful insights into the nature of the form-function relationship. A few examples are briefly discussed for each measure.

In terms of sentence length (Figs. 2 and 3), four steps displayed significantly higher proportions of sentences meeting the threshold than would be expected by the overall number of sentences tagged with the step code, namely ‘M1_S3, Reviewing items of previous research,’ ‘M3_S1, Announcing present research,’ ‘M3_S2a, Presenting research questions or hypotheses,’ and ‘M3_S4a, Summarizing methods.’ Five steps displayed significantly lower proportions of sentences meeting the threshold than expected, namely ‘M1_S1a, Claiming centrality or value of research area,’ ‘M1_S1b, Real-world contextualization,’ ‘M2_S1c, Question raising,’ ‘M3_S4b, Explaining a mathematical model,’ and ‘M3_S7, Outlining the structure of the paper.’ A manual review of long sentences associated with steps that contained significantly larger or smaller proportions of long sentences revealed insights into the relationship between length of sentence and these rhetorical aims. As illustrated in Example 1, social science writers often synthesized or established connections between specific items of previous research, thus resulting in many longer sentences.
Example 2 demonstrates a distinct tendency in ‘M3_S1, Announcing present research’ that also resulted in many long sentences. Authors often included descriptions of the object of study or theoretical context in their purpose statements, such as the ‘given our view that’ clause below, and sometimes separated their purpose statements into stages reflecting research questions or hypotheses, thus lengthening the sentence. The writers later returned to develop these ideas in text that was oriented rhetorically towards their explication. In contrast, ‘M3_S7, Outlining the structure of the paper’ was often realized through a series of short and direct sentences that connected a label with its contents, as demonstrated in Example 3.

Ex. 1. More recently [Researchers] show that fMRI measures of the extent to which subjects activate brain areas associated with concrete cognitive skills, such as the ability to predict another person’s state of mind, might be useful in identifying which subjects would be successful traders, while [Researchers] look at how the brain tracks correlation during an attempt to optimally hedge two sources of risk.
Economics, Reviewing items of previous research

Ex. 2. Given our view that two different models of accelerated aging may reflect a single evolved process of accelerated development - the developmental-origins-of-health and disease framework linking early adversity with increased morbidity and early mortality later in life and a reproductive-strategy one linking similar early experiences with earlier sexual maturation in females - we seek to test the following propositions: (1) that greater prenatal stress exposure will predict greater maternal depression and negative parenting in infancy - both known to forecast more problematic child functioning and to be interrelated, and (2) that such early experiences will themselves predict elevated basal cortisol at age 4.5 years (3) which itself will predict accelerated adrenarcheal development in first grade, (4) which itself will predict poorer physical and mental health at age 18.
Psychology, primarily Announcing present research

Ex. 3. Section II describes the program. Section III provides estimates of the impact of the TVA on the region’s economy. Section IV develops our spatial equilibrium model. Section V estimates the model’s parameters and the program effects on the national economy. Section VI concludes.
Economics, Outlining the structure of the paper

Table 5
Rhetorical steps that differ significantly from the group (p < .05).

Measure                      Significantly high                      Significantly low
Sentence length              None                                    M2_S1c, M3_S7
Nominalization               M3_S6                                   M1_S1b, M2_S1c
Finite dependent clause      M1_S3, M2_S1a, M3_S2a, M3_S2b, M3_S5    M1_S1a, M1_S1b, M2_S1d, M3_S7
Nonfinite dependent clause   M2_S2, M3_S1                            None
Left-embeddedness            M2_S1b, M2_S1e, M2_S2                   M2_S1c, M3_S1, M3_S3, M3_S7

For nominalizations (Figs. 4 and 5), six steps showed significantly higher proportions of sentences meeting the nominalization threshold than expected, namely ‘M1_S1a, Claiming centrality or value of research area,’ ‘M2_S1b, Indicating a gap,’ ‘M2_S1e, Pointing out limitations of previous research,’ ‘M3_S1, Announcing present research,’ ‘M3_S2b, Advancing new theoretical claims,’ and ‘M3_S6, Stating the value of present research.’ Five steps had significantly lower proportions of sentences meeting the threshold than expected, namely ‘M1_S1b, Real-world contextualization,’ ‘M2_S1c, Question raising,’ ‘M3_S4b, Explaining a mathematical model,’ ‘M3_S4c, Describing analyzed scenario,’ and ‘M3_S7, Outlining the structure of the paper.’ A commonality among the text of some steps (i.e., M2_S1e, M3_S1, and M3_S6) associated with sentences containing two or more nominalizations is the recurring use of research-oriented concepts in nominalized form, such as ‘examination’, ‘estimation’, ‘expectation’, and ‘association’, as seen in Examples 4, 5, and 6. In addition, value-oriented statements include frequent use of the words ‘contribution’ and ‘implication,’ which are nominalizations themselves.

Fig. 2. Distribution of complex sentences by sentence length across steps with number of sentences on the y axis and step on the x axis. Note: Orange bars represent total sentences and blue bars represent sentences meeting the threshold of 33 words. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)


Fig. 3. Steps with significantly higher or lower proportions of complex sentences in terms of sentence length. Note: Green dotted lines denote the 95% CI; purple dotted lines denote the 95% prediction interval. R² = 96.9%; R² (adjusted) = 96.7%. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
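The regression-based screening summarized in Fig. 3 can be sketched in Python as follows: regress the number of complex sentences per step on the total number of sentences per step, then flag steps whose observed count falls outside the 95% confidence band of the fitted line. This is a rough illustration under stated assumptions, not the authors' procedure. The function name, the labels, and the sample counts are hypothetical, and the critical value `t_crit` must be supplied for the relevant degrees of freedom (2.086 corresponds to 20 degrees of freedom, which would fit 22 steps).

```python
import math


def screen_steps(totals, complex_counts, t_crit=2.086):
    """Fit complex_counts ~ totals by least squares and label each step."""
    n = len(totals)
    mean_x = sum(totals) / n
    mean_y = sum(complex_counts) / n
    sxx = sum((x - mean_x) ** 2 for x in totals)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(totals, complex_counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(totals, complex_counts))
    s = math.sqrt(sse / (n - 2))

    labels = []
    for x, y in zip(totals, complex_counts):
        fitted = intercept + slope * x
        # Half-width of the confidence band for the mean response at x.
        half_width = t_crit * s * math.sqrt(1 / n + (x - mean_x) ** 2 / sxx)
        if y > fitted + half_width:
            labels.append("high")
        elif y < fitted - half_width:
            labels.append("low")
        else:
            labels.append("expected")
    return slope, intercept, labels


# Hypothetical per-step counts; six steps give 4 degrees of freedom (t = 2.776).
totals = [100, 200, 300, 400, 500, 600]
complex_counts = [25, 50, 75, 160, 125, 150]
slope, intercept, labels = screen_steps(totals, complex_counts, t_crit=2.776)
print(labels)
```

In this toy data only the fourth step is labeled "high", because its count sits well above what the overall trend predicts for a step of that size.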

Ex. 4. Therefore, another contribution of this paper is its examination of the proposition that telecommuting’s implications for employee effectiveness are intricately linked to and contingent on two key aspects of the social context: leader-member exchange (LMX) and normativeness of telecommuting in the workgroup.
Psychology, Stating the value of present research

Ex. 5. By reframing the estimation problem in this way, existing diagnostic measures and bias reduction techniques for regression to improve inferences that are unavailable with classification-based approaches can be used.
Political Science, Stating the value of present research

Ex. 6. Addressing the theoretical expectation of intergenerational upward mobility among affiliates of evangelical denominations, I analyze period and birth cohort changes in the association between evangelical Protestant affiliation and education, family income, and occupational prestige.
Sociology, Announcing present research

Fig. 4. Distribution of complex sentences by number of nominalizations across steps with number of sentences on the y axis and step on the x axis. Note: Orange bars represent total sentences and blue bars represent sentences meeting the threshold of two nominalizations. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

Fig. 5. Steps with significantly higher or lower proportions of complex sentences in terms of nominalizations. Note: Green dotted lines denote the 95% CI; purple dotted lines denote the 95% prediction interval. R² = 98.8%; R² (adjusted) = 98.8%. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

With regard to the finite dependent clause measure (Figs. 6 and 7), four steps had significantly higher proportions of sentences meeting the threshold than expected, namely ‘M1_S3, Reviewing items of previous research,’ ‘M2_S1b, Indicating a gap,’ ‘M3_S2a, Presenting research questions or hypotheses,’ and ‘M3_S2b, Advancing new theoretical claims.’ Four other steps demonstrated significantly lower proportions of sentences meeting the threshold than expected, namely ‘M1_S1a, Claiming centrality or value of research area,’ ‘M1_S1b, Real-world contextualization,’ ‘M3_S4a, Summarizing methods,’ and ‘M3_S7, Outlining the structure of the paper.’ The use of finite dependent clauses in both the articulation of research questions/hypotheses and the explanation of results appears to be related in many cases to the explanation of relationships and the discussion of conditions under which these relationships can be expected to occur. In Example 7 below, a finite dependent clause allows the writer to motivate their hypothesis as an expectation, and the condition of ‘receiving a significant language input’ is established for their hypothesis through another. Similarly, Example 8 contains a number of ‘if’ clauses that establish the parameters of the relationship the authors announce.

Ex. 7. Since speakers and listeners who are exposed to particular dialects adapt not only their own speech production but also their underlying mental representations, it is to be expected that both natives and L2 speakers who receive a significant amount of clearly non-native input may develop or retain a contact variety of that language that is not necessarily restricted by constraints to ultimate attainment but assimilates to the variety that they are exposed to every day.
Applied Linguistics, Presenting research questions or hypotheses

Ex. 8. A main result of the article is that if the countries’ ex ante expectation about the capabilities of the terrorists (i.e.
their common expectation before they each gather new intelligence) is relatively high and if counterterrorism investments substantially decrease the success rate of attempted attacks, then national intelligence gathering increases the inefficiencies in counterterrorism provision relative to the common intelligence benchmark.
Political Science, Announcing and discussing results

For the nonfinite dependent clause measure (Figs. 8 and 9), five steps showed significantly higher proportions of sentences meeting the threshold, namely, ‘M2_S2, Providing justification,’ ‘M3_S1, Announcing present research,’ ‘M3_S4a, Summarizing methods,’ ‘M3_S6, Stating the value of present research,’ and ‘M3_S8, Rationalizing research focus and design.’ Four steps showed significantly lower proportions of sentences meeting the threshold than expected, namely, ‘M1_S1a, Claiming centrality or value of research area,’ ‘M2_S1b, Indicating a gap,’ ‘M3_S5, Announcing and discussing results,’ and ‘M3_S7, Outlining the structure of the paper.’ The use of nonfinite clauses in statements that announce the present study is observably associated with ‘[research verb] + to + verb’ constructions, as evidenced by Examples 9 and 10. In addition, these two examples illustrate the importance of nonfinite gerunds, connected either with by-phrases (e.g., by drawing) or with other prepositions (e.g., for enriching), to include methodological, theoretical, or implication-based information in propositionally dense purpose statements.

Ex. 9. This article seeks to fill this gap by drawing on recent developments in political psychology to examine how people respond to informational cues when evaluating the ethical conduct of those who represent them and the role of attentiveness to politics in shaping such responses.
Political Science, Announcing present study

Ex. 10.
The first is to outline and try to account for the ongoing failure to achieve any deepening engagement between these two sub-disciplines; the second is both to note the relative neglect of Erik Erikson’s model of adult development and to draw out its scope for enriching a sociology of the life course (and vice versa for Eriksonian scholarship); and the last is the potential for sociology more generally to engage with Erikson’s ideas and develop an alternative point of reference that goes beyond those employed in conventional life course sociology.
Sociology, Announcing present study

Fig. 6. Distribution of complex sentences by number of finite dependent clauses across steps with number of sentences on the y axis and step on the x axis. Note: Orange bars represent total sentences and blue bars represent sentences meeting the threshold of one finite dependent clause. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

For left-embeddedness (Figs. 10 and 11), five steps showed significantly larger proportions of sentences meeting the threshold, namely, ‘M1_S1a, Claiming centrality or value of research area,’ ‘M1_S1b, Real-world contextualization,’ ‘M2_S1b, Indicating a gap,’ ‘M2_S2, Providing justification,’ and ‘M3_S4c, Describing analyzed scenario.’ In contrast, four steps showed significantly lower proportions of sentences meeting the threshold, namely, ‘M3_S1, Announcing present research,’ ‘M3_S4a, Summarizing methods,’ ‘M3_S6, Stating the value of present research,’ and ‘M3_S7, Outlining the structure of the paper.’ Manual analysis of heavily left-embedded sentences reveals important connections between the distance of the main verb from the start of the sentence and the rhetorical aims of the authors. For example, writers’ efforts to ‘Indicate a gap’ often included a subordinate clause with a general positive statement regarding the discipline community’s efforts in a particular domain before introducing what may be construed as a negative claim in the gap, as demonstrated in Example 11. This tendency increased the left-embeddedness in a large number of gap-building sentences. In contrast, and as illustrated by Example 12, sentences containing announcements of the present research often began with research-oriented main verbs close to the start of the sentence, perhaps to ensure that the purpose statements had strong rhetorical cues for readers.

Ex. 11.
Although a large economics literature addresses the information aggregation issue at the heart of group decision making - social choice theory, mechanism design, political economy, and other fields have offered both theoretical and empirical insights - less attention has been paid to the information contribution stage of group decision making.
Economics, Indicating a gap

Ex. 12. This article analyzes this uncertainty over what historicity might mean and examines how it has been and can be adapted and what theoretical service it might perform for anthropology.
Anthropology, Announcing present research

Fig. 7. Steps with significantly higher or lower proportions of complex sentences in terms of finite dependent clauses. Note: Green dotted lines denote the 95% CI; purple dotted lines denote the 95% prediction interval. R² = 97.7%; R² (adjusted) = 97.6%. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

4. Discussion

Our results show that disciplinary writers vary their choices in the use of complex syntactic structures depending on their rhetorical goals. Results pertaining to our first research question revealed significant variation across the rhetorical steps in the degree of syntactic complexity assessed using all five indices considered, indicating that different rhetorical functions may entail greater or lesser use of different complex structures. Three interesting patterns are worth noting. First, the means for sentence length and nonfinite dependent clauses across the rhetorical steps appeared to be more homogeneous than those for the other measures, with only two steps showing significantly lower means for sentence length and another two showing significantly higher means for nonfinite dependent clauses than the group. Taken on its own, this suggests either that writers do not vary their production of long/short sentences or their use of nonfinite dependent clauses in the furtherance of rhetorical goals overall, or that the use of such structures may be stylistic rather than rhetorical in nature. This point will be revisited below, however, as a consideration of what the longest sentences accomplish rhetorically, for example, highlights an interesting and important tendency. Second, there was generally no overlap between the two sets of steps that displayed significantly higher and lower means for one or more measures. The only exception was ‘M3_S1, Announcing present research,’ which exhibited a significantly lower mean for left-embeddedness and a significantly higher mean for nonfinite dependent clauses than the group, suggesting that expert writers tend to be simultaneously direct and elaborate in announcing present research. This finding reinforces that language may be syntactically complex through the use of a variety of structures on phrasal, clausal, and global scales, and that such structures afford writers greater communicative potential. That is to say that particular rhetorical claims do not appear to be ‘more complex’ than others in an absolute manner, but rather writers advance rhetorically similar claims through particularized forms of sophisticated constructions. Third, while most rhetorical steps in those two sets differed significantly from the group in only one measure, a few steps differed in two or more measures, indicating expert writers’ tendency to employ especially more or less complex sentences to realize those functions.
Specifically, ‘M2_S2, Providing justification’ displayed significantly higher means for nonfinite dependent clauses and left-embeddedness; ‘M1_S1b, Real-world contextualization’ displayed significantly lower means for nominalizations and finite dependent clauses; ‘M2_S1c, Question raising’ displayed significantly lower means for sentence length, nominalizations, and left-embeddedness; and ‘M3_S7, Outlining the structure of the paper’ displayed significantly lower means for sentence length, finite dependent clauses, and left-embeddedness.

Similarly, results pertaining to our second research question indicated that writers varied their production of complex sentences according to their rhetorical goals. Manual examination of extracted complex sentences for each step identified as significant for a given feature often provides insights into the affordances that complex structures offer academic writers, such as the potential of nominalizations to convey the activities of research processes within discussions of value statements. Returning to the finding that mean sentence length did not distinguish the language of distinct rhetorical aims overall, the analysis of what the most complex sentences for this measure accomplish rhetorically (i.e., what rhetorical aims were associated with the longest sentences) highlighted an association with ‘M3_S1, Announcing present research.’ Thus, while the findings of RQ1 alone suggest that variation in sentence length is a stylistic rather than rhetorical concern for COSSRAI writers, the results of RQ2 suggest that writers are at times willing to diverge from their sentence length tendencies for such rhetorical purposes. When considered together, even the small set of syntactic complexity measures adopted here highlights that some steps displayed rich complexity profiles across features.
For example, ‘M3_S1, Announcing present research’ was associated with a significantly greater than expected number of complex sentences for sentence length, nominalization density, and nonfinite dependent clause measures and was associated with significantly fewer than expected complex sentences for the left-embeddedness measure. When taken together, these features provide insights into the rhetorical activity of announcing the aim of the study by highlighting that such announcements are often propositionally dense, as they may contain methodological, theoretical, and analytical components, and are likely to contain reference to nominalized research processes and concepts, but often have strong rhetorical cues through purpose verbs (e.g., aim) that occur early in a sentence. At the same time, when these syntactic measures are considered individually across the rhetorical moves and steps they appear to be associated with, they often highlight broader rhetorical activities. For example, sentences with two or more finite dependent clauses often specified the conditions or parameters under which findings were hypothesized to occur, were made, or could be confidently generalized.

These findings have important implications for syntactic complexity research. Our findings provide robust evidence for the existence of a form-function connection between complex syntactic structures and rhetorical functions, highlighting the value of function-oriented approaches to syntactic complexity research. Syntactic complexity development research should begin considering learners’ developing abilities to use different types of complex syntactic structures in genre-appropriate and functionally effective ways, in addition to the existing focus on the developmental patterns of the different dimensions of syntactic complexity in quantitative terms. Automatic writing assessment research should also start exploring more robust ways to consider the relationship of the genre appropriateness and functional effectiveness of form-function mappings to writing quality, rather than the frequency of complex syntactic structures alone.

Our findings also contribute to the emerging line of research that aims to systematically discover linguistic features at different levels associated with different rhetorical functions. Specifically, our findings demonstrate the usefulness of integrating the syntactic dimension in profiling the linguistic features of academic writing and their relationship to rhetorical functions, in addition to the lexical and phraseological dimensions that have been considered so far (Cortes, 2013; Durrant & Mathews-Aydınlı, 2011; Le & Harrington, 2015; Omidian et al., 2018). This body of research is already starting to inform integrated formal and functional approaches to EAP pedagogy aimed at promoting comprehensive genre knowledge and competence (e.g., Charles, 2011; Chen & Flowerdew, 2018; Cotos, Huffman, & Link, 2017). This research highlights that writers’ formal and rhetorical choices are intertwined across phrasal, clausal, and global dimensions of syntactic complexity as well. In addition, the current study helps enrich the pedagogical resources generated by previous studies on lexical and phraseological features by contributing a repertoire of syntactically complex sentences aligned with different rhetorical functions. Similarly, when syntactic, phraseological, and lexical conventions of genre practices are explored across writers’ rhetorical aims, it is likely that the resulting linguistic profiles of rhetoric will provide insights into the rhetorical activity itself.

Fig. 8. Distribution of complex sentences by number of nonfinite dependent clauses across steps with number of sentences on the y axis and step on the x axis. Note: Orange bars represent total sentences and blue bars represent sentences meeting the threshold of one nonfinite dependent clause. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

Fig. 9. Steps with significantly higher or lower proportions of complex sentences in terms of nonfinite dependent clauses. Note: Green dotted lines denote the 95% CI; purple dotted lines denote the 95% prediction interval. R² = 97.9%; R² (adjusted) = 97.8%. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

Fig. 10. Distribution of complex sentences by left-embeddedness across steps with number of sentences on the y axis and step on the x axis. Note: Orange bars represent total sentences and blue bars represent sentences meeting the threshold of nine words to the left of the main verb. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

Fig. 11. Steps with significantly higher or lower proportions of complex sentences in terms of left-embeddedness. Note: Green dotted lines denote the 95% CI; purple dotted lines denote the 95% prediction interval. R² = 95.7%; R² (adjusted) = 95.5%. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
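Left-embeddedness is operationalized here as the number of words to the left of the main verb, with nine words as the threshold for a complex sentence. A minimal Python sketch of that count is given below. It assumes the sentence is already tokenized and that the position of the main verb has been identified by some parser; the function names, the rule of skipping punctuation-only tokens, and the choice of which token counts as the main verb are illustrative assumptions, not details confirmed by the study.

```python
def left_embeddedness(tokens, main_verb_index):
    """Count the words that precede the main verb, ignoring punctuation."""
    if not 0 <= main_verb_index < len(tokens):
        raise ValueError("main_verb_index must point at a token")
    return sum(
        1
        for token in tokens[:main_verb_index]
        if any(ch.isalnum() for ch in token)
    )


def is_heavily_left_embedded(tokens, main_verb_index, threshold=9):
    """Apply the nine-word threshold; a score at or above it is flagged."""
    return left_embeddedness(tokens, main_verb_index) >= threshold


# Opening of Example 12; the main verb "analyzes" is the third token.
announce = "This article analyzes this uncertainty".split()

# An abridged, illustrative variant of Example 11 (not the original wording);
# the main verb phrase is assumed to start at "has".
gap = ("Although a large economics literature addresses this issue , "
       "less attention has been paid to one stage").split()

print(left_embeddedness(announce, announce.index("analyzes")))
print(left_embeddedness(gap, gap.index("has")))
```

The purpose statement scores 2, while the gap statement scores 10 and would meet the threshold, mirroring the contrast the text draws between Examples 11 and 12.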


Our results on the distribution of sentences realizing different rhetorical functions and the syntactic complexity features of those sentences may prove useful for automatic rhetorical function annotation research. Cotos and Pendar (2016) evaluated the usefulness of n-gram features in developing an automatic rhetorical move and step annotator and reported overall F-scores of 0.654 for move classification and 0.61 for step classification. The integration of a richer set of linguistic features connected to rhetorical functions, including the syntactic features considered in the current study, may help improve the performance of such automatic systems.

5. Conclusion

This study examined the rhetorical functions of syntactically complex sentences in a corpus of 600 social science RA introductions. Our analysis revealed significant variation in syntactic complexity among sentences realizing different rhetorical functions, as well as in the proportions of syntactically complex sentences that expert writers employed to realize different rhetorical functions. These results point to a clear form-function connection between complex syntactic structures and rhetorical functions in academic writing and make a strong case for function-oriented approaches to syntactic complexity research. Our analysis also expands the repertoire of linguistic features considered in the emerging line of research that examines the linguistic realizations of rhetorical functions in academic writing.

As an early step towards discovering the relationship between complex syntactic structures and rhetorical functions in academic writing, this study has several limitations. First, we considered a small set of measures that represent different dimensions of syntactic complexity and that are intuitively useful to academic writing teachers and learners.
Future research could consider other measures that have been found to characterize academic writing or correlate with writing quality (e.g., Biber et al., 2016; Kyle & Crossley, 2018; Lu, 2017) as well as the co-occurrence patterns of different measures (e.g., Biber et al., 2016). Second, the current study did not systematically examine interdisciplinary variation in the relationship between syntactic complexity and rhetorical functions, which we intend to pursue in our future research. Third, the current study focused on a highly specific part-genre, i.e., research article introductions, of expert writing in social science disciplines. Future research can expand the analytical scope to other genres and academic disciplines as well as to novice writers. Finally, our future research will investigate the usefulness of the function-annotated corpus and the repertoire of syntactically complex sentences extracted for different rhetorical functions in corpus- and genre-based academic writing pedagogy.

Author contribution

Xiaofei Lu: Conceptualization, Methodology, Data Curation, Investigation, Formal Analysis, Software, Writing - Original Draft, Supervision, Project Administration. J. Elliott Casal: Methodology, Data Curation, Investigation, Formal Analysis, Writing - Original Draft. Yingying Liu: Methodology, Data Curation, Investigation, Formal Analysis, Visualization, Writing - Original Draft.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.jeap.2019.100832.

References

Bhatia, V. K. (2008). Genre analysis, ESP and professional practice. English for Specific Purposes, 27, 161–174.
Biber, D., & Gray, B. (2010). Challenging stereotypes about academic writing: Complexity, elaboration, explicitness. Journal of English for Academic Purposes, 9, 2–20.
Biber, D., Gray, B., & Poonpon, K. (2011). Should we use characteristics of conversation to measure grammatical complexity in L2 writing development? TESOL Quarterly, 45, 5–35.
Biber, D., Gray, B., & Staples, S. (2016). Predicting patterns of grammatical complexity across language exam task types and proficiency levels. Applied Linguistics, 37, 639–668.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and written English. New York: Longman.
Bulté, B., & Housen, A. (2014). Conceptualizing and measuring short-term changes in L2 writing complexity. Journal of Second Language Writing, 26, 42–65.
Chan, H., Verspoor, M., & Vahtrick, L. (2015). Dynamic development in speaking versus writing in identical twins. Language Learning, 65, 298–325.
Charles, M. (2011). Using hands-on concordancing to teach rhetorical functions: Evaluation and implications for EAP writing classes. In A. Frankenberg-Garcia, L. Flowerdew, & G. Aston (Eds.), New trends in corpora and language learning (pp. 26–43). London: Continuum.
Chen, M., & Flowerdew, J. (2018). Introducing data-driven learning to PhD students for research writing purposes: A territory-wide project in Hong Kong. English for Specific Purposes, 50, 97–112.
Cortes, V. (2013). Lexical bundles in published and student disciplinary writing: Examples from history and biology. English for Specific Purposes, 12, 33–43.
Cotos, E., Huffman, S., & Link, S. (2017). A move/step model for methods sections: Demonstrating rigour and credibility. English for Specific Purposes, 46, 90–106.
Cotos, E., & Pendar, N. (2016). Discourse classification into rhetorical functions for AWE feedback. CALICO Journal, 33, 92–116.
Crossley, S. A., & McNamara, D. S. (2014). Does writing development equal writing quality? A computational investigation of syntactic complexity in L2 learners. Journal of Second Language Writing, 26, 66–79.
Cunningham, K. J. (2017). A phraseological exploration of recent mathematics research articles through key phrase frames. Journal of English for Academic Purposes, 25, 71–83.
Durrant, P., & Mathews-Aydınlı, J. (2011). A function-first approach to identifying formulaic language in academic writing. English for Specific Purposes, 30, 58–72.
Esfandiari, R., & Barbary, F. (2017). A contrastive corpus-driven study of lexical bundles between English writers and Persian writers in psychology research articles. Journal of English for Academic Purposes, 29, 21–42.

Flowerdew, L. (2005). An integration of corpus-based and genre-based approaches to text analysis in EAP/ESP: Countering criticisms against corpus-based methodologies. English for Specific Purposes, 24, 321–332.
Halleck, G. B., & Connor, U. M. (2006). Rhetorical moves in TESOL conference proposals. Journal of English for Academic Purposes, 5, 70–86.
Hirano, E. (2009). Research article introductions in English for specific purposes: A comparison between Brazilian Portuguese and English. English for Specific Purposes, 28, 240–250.
Housen, A., & Kuiken, F. (2009). Complexity, accuracy, and fluency in second language acquisition. Applied Linguistics, 30, 461–473.
Kyle, K. (2016). Measuring syntactic development in L2 writing: Fine grained indices of syntactic complexity and usage-based indices of syntactic sophistication (Unpublished doctoral dissertation). Atlanta, Georgia: Georgia State University.
Kyle, K., & Crossley, S. A. (2018). Measuring syntactic complexity in L2 writing using fine-grained clausal and phrasal indices. The Modern Language Journal, 102, 333–349.
Le, T. N. P., & Harrington, M. (2015). Phraseology used to comment on results in the discussion section of applied linguistics quantitative research articles. English for Specific Purposes, 39, 45–61.
Lu, X. (2010). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics, 15, 474–496.
Lu, X. (2011). A corpus-based evaluation of syntactic complexity measures as indices of college-level ESL writers’ language development. TESOL Quarterly, 45, 36–62.
Lu, X. (2014). Computational methods for corpus annotation and analysis. Dordrecht: Springer.
Lu, X. (2017). Automated measurement of syntactic complexity in corpus-based L2 writing research and implications for writing assessment. Language Testing, 34, 493–511.
Lu, X., & Ai, H. (2015). Syntactic complexity in college-level English writing: Differences among writers with diverse L1 backgrounds. Journal of Second Language Writing, 29, 16–27.
Lu, X., & Deng, J. (2019). With the rapid development: A contrastive analysis of lexical bundles in dissertation abstracts by Chinese and L1 English doctoral students. Journal of English for Academic Purposes, 39, 21–36.
Lu, X., Yoon, J., & Kisselev, O. (2018). A phrase-frame list for social science research article introductions. Journal of English for Academic Purposes, 36, 76–85.
Macleod, C., Grishman, R., Meyers, A., Barrett, L., & Reeves, R. (2001). NOMLEX. New York: New York University. Retrieved from https://nlp.cs.nyu.edu/nomlex/.
Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S. J., & McClosky, D. (2014). The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd annual meeting of the association for computational linguistics: System demonstrations (pp. 55–60). Baltimore, MD: Association for Computational Linguistics.
McNamara, D. S., Graesser, A. C., McCarthy, P. M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge: Cambridge University Press.
Moreno, A. I., & Swales, J. M. (2018). Strengthening move analysis methodology towards bridging the function-form gap. English for Specific Purposes, 50, 40–63.
Norris, J. M., & Ortega, L. (2009). Towards an organic approach to investigating CAF in instructed SLA: The case of complexity. Applied Linguistics, 30, 555–578.
Omidian, T., Shahriari, H., & Siyanova-Chanturia, A. (2018). A cross-disciplinary investigation of multi-word expressions in the moves of research article abstracts. Journal of English for Academic Purposes, 36, 1–14.
Ortega, L. (2003). Syntactic complexity measures and their relationship to L2 proficiency: A research synthesis of college-level L2 writing. Applied Linguistics, 24, 492–518.
Ortega, L. (2015). Syntactic complexity in L2 writing: Progress and expansion. Journal of Second Language Writing, 29, 82–94.
Pan, F., Reppen, R., & Biber, D. (2016). Comparing patterns of L1 versus L2 English academic professionals: Lexical bundles in telecommunications research journals. Journal of English for Academic Purposes, 21, 60–71.
Polio, C. (2017). Second language writing development: A research agenda. Language Teaching, 50, 261–275.
Ryshina-Pankova, M. (2015). A meaning-based approach to the study of complexity in L2 writing: The case of grammatical metaphor. Journal of Second Language Writing, 29, 51–63.
Samraj, B. (2002). Introductions in research articles: Variations across disciplines. English for Specific Purposes, 21, 1–17.
Swales, J. M. (1990). Genre analysis: English in academic and research settings. New York: Cambridge University Press.
Swales, J. M. (2004). Research genres: Explorations and applications. New York: Cambridge University Press.
Swales, J. M. (2019). The futures of EAP genre studies: A personal viewpoint. Journal of English for Academic Purposes, 38, 75–82.
Tardy, C. M. (2009). Building genre knowledge. West Lafayette: Parlor Press.
Wolfe-Quintero, K., Inagaki, S., & Kim, H.-Y. (1998). Second language development in writing: Measures of fluency, accuracy and complexity. Honolulu, HI: University of Hawaii, Second Language Teaching and Curriculum Center.
Yang, W., Lu, X., & Weigle, S. C. (2015). Different topics, different discourse: Relationships among writing topic, measures of syntactic complexity, and judgments of writing quality. Journal of Second Language Writing, 28, 53–67.
Yoon, H.-J., & Polio, C. (2017). The linguistic development of students of English as a second language in two written genres. TESOL Quarterly, 51, 275–301.

Xiaofei Lu is Professor of Applied Linguistics and Asian Studies at The Pennsylvania State University. His research interests are primarily in corpus linguistics, English for Academic Purposes, second language writing, and intelligent computer-assisted language learning. He is the author of Computational Methods for Corpus Annotation and Analysis (Lu, 2014).

J. Elliott Casal is a Ph.D. candidate in Applied Linguistics and a University Graduate Fellow at The Pennsylvania State University. His research interests include corpus linguistics, English for Academic/Specific Purposes, second language writing, and corpus-based writing pedagogies. His work appears in the Journal of Second Language Writing, Journal of English for Academic Purposes, Language Learning and Technology, and System.

Yingying Liu is a Ph.D. candidate in the Department of Applied Linguistics at The Pennsylvania State University. Her research interests include corpus linguistics, English for Academic Purposes, English phraseology, and lexicography.