Chunks are components: A dependency grammar approach to the syntactic structure of Mandarin

+ Models

LINGUA-2680; No. of Pages 24

Available online at www.sciencedirect.com

ScienceDirect Lingua xxx (2018) xxx--xxx www.elsevier.com/locate/lingua

Chunks are components: A dependency grammar approach to the syntactic structure of Mandarin
Ruochen Niu¹, Timothy Osborne*
Department of Linguistics, Zhejiang University, Hangzhou, 310058, China
Received 20 December 2018; received in revised form 6 March 2019; accepted 6 March 2019

Abstract
Speakers and listeners produce and process language in chunks. These chunks are not arbitrary combinations of words, but rather they are motivated in part by the syntactic structure of the sentences at hand. This article employs chunking data collected from informants to examine and reach conclusions about the syntactic structure of Mandarin sentences. Five prominent constructions are explored: 1) pivotal sentences, 2) serial verb constructions (or co-verbs), 3) V-de constructions, 4) bǎ constructions, and 5) bèi constructions. The account is couched in a dependency grammar (DG) approach to syntax. A relatively new unit of syntax, the component, plays a central role in the analysis. The message is in part that the component is a more suitable unit of syntax for predicting how informants choose to chunk sentences than the phrase structure grammar (PSG) constituent.
© 2019 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords: Chunking; Dependency grammar; Component; Mandarin; Phrase structure grammar; Constituent

1. Introduction
When given the sentence Wǒ yòng kuàizi chī fàn ‘I use chopsticks to eat’ and asked to divide it into two chunks, educated non-linguist native speakers of Mandarin are likely to produce the chunks shown in (1a) or (1b):

(1)
    I use chopsticks eat meal
    a. Wǒ yòng kuàizi | chī fàn. -- 4 syllables | 2 syllables
    b. Wǒ | yòng kuàizi chī fàn. -- 1 syllable | 5 syllables
    c. Wǒ yòng | kuàizi chī fàn. -- 2 syllables | 4 syllables
    ‘I use chopsticks to eat’ or ‘I eat using chopsticks’.

* Corresponding author at: School of International Studies, Zhejiang University, Zijingang campus, No. 866 Yuhangtang Road, Hangzhou, CN310058, China. E-mail address: [email protected] (T. Osborne).
1 Address: Department of International Studies, Zhejiang University, Zijingang Campus, No. 866 Yuhangtang Road, Hangzhou, CN-310058, China.

https://doi.org/10.1016/j.lingua.2019.03.003 0024-3841/© 2019 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Please cite this article in press as: Niu, R., Osborne, T., Chunks are components: A dependency grammar approach to the syntactic structure of Mandarin. Lingua (2019), https://doi.org/10.1016/j.lingua.2019.03.003

R. Niu, T. Osborne / Lingua xxx (2018) xxx--xxx

The speakers are unlikely to produce the chunks shown in (1c), however.² This fact is somewhat surprising in view of the number of words/syllables appearing in each chunk. In particular, the strong preference for the chunks in (1b) over those in (1c) is unexpected given the unequal prosodic weights of the two chunks in (1b): 1 syllable vs. 5 syllables (cf. Grosjean et al., 1979). One might instead expect the chunks in (1c) to be chosen as often as the chunks in (1a), since (1a) and (1c) are equally unbalanced in syllable weight (4 | 2 vs. 2 | 4). The core idea developed in this article is that chunking data collected from informants, of the sort illustrated with examples (1a–c), deliver clues about the syntactic structure of the tested sentences. In other words, informants chunk sentences in a way that is sensitive to the sentence structures of the language at hand, in this case, Mandarin Chinese. Words are grouped into phrases and clauses, and when informants are invited to chunk sentences, they do so in a manner that respects these groupings. Hence the reason that Chinese speakers prefer the chunks in (1a) or (1b) over the chunks in (1c) lies in the syntactic structure of the sentence. In particular, kuàizi ‘chopsticks’ is grouped together with yòng ‘use’ in both (1a) and (1b), but not in (1c). The actual syntactic structure of the sentence is such that the object kuàizi is indeed grouped with the preceding verb yòng rather than with the following verb chī ‘eat’. The approach to chunking pursued in this article differs from previous analyses in that it is couched in a dependency grammar (DG) framework of syntax.³
Previous accounts (e.g. Grosjean et al., 1979; Gee and Grosjean, 1983; Abney, 1992) have been couched in phrase structure grammar (PSG).⁴ Furthermore, the key unit of syntax that helps predict how informants chunk sentences is the component, a word or a string of words that are linked together by dependencies. Every PSG constituent is a DG component, but some DG components are not PSG constituents. The DG component is hence a more flexible and inclusive unit of syntax than the PSG constituent. Osborne and Groß (2016: 117) define the component unit as “a word or a combination of words that are continuous with respect to precedence and dominance”.⁵ For example, there are 12 distinct components in the following dependency tree:

Using capital letters to abbreviate the words, the 12 distinct components in (2), including the whole, are: A, B, C, D, E, AB, BC, DE, ABC, CDE, BCDE, and ABCDE. In a corresponding phrase structure analysis of the same sentence, there are just 9 distinct constituents:

The 9 constituents in this phrase structure tree, including the whole, are: A, B, C, D, E, AB, DE, CDE, and ABCDE. These numbers demonstrate that the DG component is somewhat more inclusive than the PSG constituent. Part of the message delivered below is that, because it is more inclusive than the PSG constituent, the DG component may be better suited to serve as a basis for predicting how informants will chunk sentences. This is supported by the fact that many of the chunks chosen by informants are DG components but not PSG constituents. This article is about the chunking of Mandarin sentences; English examples are used only when establishing core DG notions. Reasoning based on the component unit and informant data is employed to resolve some areas of debate concerning the basic structural analyses of Mandarin sentences. The specific constructions investigated are: 1) pivotal sentences, 2) serial verb constructions (or co-verbs), 3) V-de constructions, 4) bǎ constructions, and 5) bèi constructions. In the end, clear decisions about the best hierarchical analysis of these constructions are given. The article contributes in three areas: it introduces DG to the analysis of Mandarin sentences, establishes the value of a relatively new unit of syntax, the component, and sheds light on the structural analysis of key constructions in Mandarin Chinese. The article is organized as follows. Section 2 presents background information on chunking in language. Section 3 introduces the tenets of the DG assumed in the study. Section 4 presents the materials, participants, and procedures of the data collection. Section 5 discusses the debates surrounding the five constructions, presents the results of the data collected, and interprets these results. Section 6 gives a general analysis of the constituent and component units and their value in predicting how sentences are chunked. Section 7 discusses an open issue we encountered in the calculation, and Section 8 concludes the article.

2 We tested the sentence on 44 informants. 24 of them chunked the sentence as in (1a), 18 as in (1b), and 2 as in (1c).
3 Dependency grammar has a venerable tradition that reaches back more than a century (e.g. Kern, 1883; Tesnière, 1959, 2015; Hays, 1964; Robinson, 1970; Matthews, 1981; Mel’čuk and Pertsov, 1987; Mel’čuk, 1988; Schubert, 1987; Starosta, 1988; Engel, 1994; Heringer, 1996; Bröker, 1999; Groß, 1999; Eroms, 2000; Ágel et al., 2003, 2006; Hudson, 1984, 1990, 2007, 2010).
4 We use the term phrase structure grammar (PSG) in a broad sense, that is, to denote all grammars that assume phrase structure rather than dependency as the basis of sentence structure, e.g. Transformational Grammar (TG), Government and Binding Theory (GB), the Minimalist Program (MP), etc.
5 Note that while Osborne and Groß (2016) define the component unit, they do not investigate its potential to serve as a basis for chunking or for any other phenomena of syntax. They introduce it mainly as a point of comparison to another unit of syntax, the catena (see Section 3.2).

2. Background on chunking
The mechanism of chunking may be a universal of human languages, as proposed by Lu et al. (2016). The idea of chunking as a general cognitive mechanism, however, dates back more than half a century. Based on studies of perception and memory, Miller (1956) first claimed that the span of immediate memory (or working memory) is limited to a small number of chunks, around seven. Under this constraint, Miller added, humans process information by building larger and larger chunks, i.e. by increasing the amount of information in each chunk rather than by creating more chunks (Miller, 1956: 149).
The mechanism of chunking thus refers to the process by which smaller units of information are grouped together into bigger ones. Since Miller, this dynamic grouping process has been documented in many cognitive domains, such as learning, perception, and the development of expertise (e.g. Chase and Simon, 1973; Gobet et al., 2001; Chekaf et al., 2016). While studies on the creation, storage, and influence of chunks are numerous, one crucial question is often neglected: what exactly is a chunk? Miller (1956) did not give a clear account of what constitutes a chunk, and from the earlier descriptions, we can only infer that a chunk is “a collection of elements having strong associations with one another, but weak associations with elements within other chunks” (Gobet et al., 2001: 236). Given the limited capacity of working memory, chunking may also serve to reduce the burden of language comprehension and production. But, again, what exactly is the nature of a language chunk? In the field of linguistics, this issue is of particular significance. Is a chunk an arbitrary grouping of linguistic elements, or a group of hierarchically related elements regulated by specific rules? Most existing investigations addressing the question have been conducted from the perspective of phrase structure grammar (PSG). In experiments where participants were asked to read sentences (producing pauses) or to parse them, Grosjean et al. (1979) identified chunks, or what they called performance structures. They compared these chunks with the surface structures of the sentences, i.e. PSG constituents, and reported a high correlation between chunks and the constituent unit (0.76 for pausing chunks and 0.82 for parsing chunks; a correlation above 0.7 is generally considered high). This finding, together with other similar evidence, indicates that chunks in language processing are not random, but rather can be predicted in terms of syntactic structure.
At the same time, Grosjean et al. observed that participants tend to produce chunks of equal length, which points to the influence of prosodic weight. Building on these findings, Gee and Grosjean (1983) proposed an algorithm based on a prosodic unit. This unit, called the φ-phrase (phi-phrase), partly matches the phonological and syntactic representations of a sentence and can be used to better analyze and predict chunking. It is defined as “all the material up to and including the head (of a syntactic phrase)” (Gee and Grosjean, 1983: 433). Gee & Grosjean then stipulate, however, that only content words can be heads, whereby they take auxiliary verbs and prepositions to be function words. One of the examples they employ (p. 435, Fig. 7) to illustrate φ-phrases is given next:

(4) has been reading | about the latest rumors | in Argentina

This verb phrase contains three distinct φ-phrases, as indicated by the two dividers. The auxiliary verbs has and been and the prepositions about and in are deemed function words, which means they do not qualify as heads. Hence, the heads of these three φ-phrases are reading, rumors, and Argentina, respectively. Observe that latest in the second φ-phrase is also a content word, but since it is not the syntactic head of the noun phrase the latest rumors, it does not establish its own separate φ-phrase. Gee & Grosjean's pioneering work influenced subsequent studies, particularly research that aims to build chunking treebanks (e.g. Abney, 1992; Sang et al., 2000). Our view, however, is that Gee & Grosjean's φ-phrase may have a significant weakness. The notion relies on the distinction between function words and content words, a distinction that is not black and white. Gee & Grosjean view prepositions as function words, yet many prepositions, namely those expressing location, are semantically loaded; further, modal auxiliary verbs expressing deontic or dynamic modality are semantically loaded in a way that other auxiliary verbs are not. A greater difficulty in the current context concerns the challenge of distinguishing between function words and content words in Mandarin. In view of these difficulties with the φ-phrase notion, we are motivated to pursue an alternative account, one based on the DG component. The component is defined entirely in terms of syntactic structure and is hence largely independent of the distinction between function word and content word (see Section 3.2). Worth noting is that there has been some limited work on chunking in DG (e.g. Osborne and Niu, 2017; Niu, 2017). Some of the efforts in this area have been quantitative in nature, focusing on the role that chunking plays in language processing (e.g. Lu and Cai, 2009; Lu et al., 2016). Chunking as a mechanism reduces the syntactic difficulty, or processing difficulty, of sentences; it is a mechanism of language driven by the limited capacity of working memory. These efforts within DG have not focused on the nature of chunks from a structural point of view. This contribution seeks to fill this gap by characterizing chunks from the perspective of syntactic structure.

3. Dependency grammar
Dependency grammar (DG) is a group of syntactic theories that view dependencies, as opposed to phrase structure constituencies, as the basis of sentence structure. Compared to PSG, dependency syntax enjoys the advantage of parsimony, which contributes to its widespread application in computational linguistics and natural language processing (NLP). The following subsection introduces the basic properties of the DG assumed in this article for the analysis of chunking.

3.1. Properties of dependency syntax
A distinctive property of DG is the one-to-one mapping between words and nodes, i.e. for each word in the sentence, there is exactly one node in the corresponding tree diagram. In contrast, phrase structure posits purely phrasal nodes that mediate between the actual words. Because of these purely phrasal nodes, the number of nodes always exceeds the number of words by at least one. Phrase structure is hence a one-to-one-or-more mapping of words to nodes. The following trees illustrate the distinction:

The word-to-node ratio in the DG analysis (5a) is five-to-five, whereas in the PSG analysis (5b), the ratio is five-to-nine. A second central property associated with DG structures is strict endocentricity/headedness. An endocentric structure is headed/rooted, whereas an exocentric structure has no head/root. The phrase structures (6a) and (6b) are, for example, endocentric, while (6c) is exocentric:

Dependency structure cannot acknowledge exocentric structures like the one in (6c) because dependencies are necessarily directed, that is, dependency is a strict parent-child relation. Phrase structures, on the other hand, can and do at times acknowledge constituents without heads, such as in the traditional division of the sentence into a noun phrase and a verb phrase (S → NP VP). The current DG is distinct from some other DGs in that it is monostratal. Unlike multistratal approaches, e.g. Meaning-Text Theory (MTT) (Mel’čuk, 1988), the current DG takes both hierarchical and linear order as primitive. The dependency structures in this article therefore always encode actual word order, as in (2) and (5a).
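One way to make these properties concrete is a head array: position i holds the linear index of word i's parent, or None for the root. This encoding is our own illustrative choice, not a format the article prescribes, and the five-word tree A–E below is a toy example. Because every word is exactly one array slot, the word-to-node ratio is one-to-one by construction; because every non-root word must name a head, a headless (exocentric) grouping simply cannot be written down; and array order doubles as word order, as the monostratal view requires. The sketch checks the remaining conditions: a single root and no cycles.

```python
from typing import Optional, Sequence


def check_dependency_tree(words: Sequence[str], heads: Sequence[Optional[int]]) -> int:
    """Validate a dependency tree stored as a head array and return its root index.

    heads[i] is the linear position of word i's parent, or None for the root.
    Positions double as word order, and each word is exactly one node.
    """
    n = len(words)
    if len(heads) != n:
        raise ValueError("need exactly one head entry per word")
    roots = [i for i, h in enumerate(heads) if h is None]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    for i in range(n):
        # Follow parent links upward; every directed path must end at the root.
        seen = set()
        node = i
        while heads[node] is not None:
            parent = heads[node]
            if not 0 <= parent < n:
                raise ValueError(f"word {node} has an out-of-range head")
            if node in seen:
                raise ValueError("cycle detected: not a strict parent-child tree")
            seen.add(node)
            node = parent
    return roots[0]


# Hypothetical toy tree over five words A-E: C is the root.
words = ["A", "B", "C", "D", "E"]
heads = [1, 2, None, 4, 2]
root = check_dependency_tree(words, heads)
print(words[root], "is the root;", len(words), "words ->", len(heads), "nodes")
```

A phrase structure encoding of the same sentence would need extra entries for the purely phrasal nodes, which is exactly the one-to-one-or-more mapping described above.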

3.2. Units of structure
Given the descriptions above of the nature of dependency syntax, key units of structure can be defined. The current DG acknowledges four generic units of syntax: string, catena, constituent, and component. These units are defined as follows:
String: A word or a combination of words that are continuous with respect to precedence/linearity
Catena: A word or a combination of words that are continuous with respect to dominance/hierarchy
Component: A word or a combination of words that are continuous with respect to both precedence and dominance
Constituent: A word/node plus all the words/nodes that that word/node dominates, that is, a complete subtree⁶
These units are illustrated using the following tree. Note that the words themselves are now used as the nodes in the tree. This practice results in transparent tree diagrams and is employed henceforth for all dependency trees:

The capital letters again abbreviate the words. All of the distinct strings, catenae, constituents, and components in (8) are listed next:
15 distinct strings in (8): A, B, C, D, E, AB, BC, CD, DE, ABC, BCD, CDE, ABCD, BCDE, ABCDE
15 distinct catenae in (8): A, B, C, D, E, AB, BC, CE, DE, ABC, BCE, CDE, ABCE, BCDE, ABCDE
12 distinct components in (8): A, B, C, D, E, AB, BC, DE, ABC, CDE, BCDE, ABCDE
5 distinct constituents in (8): A, D, AB, DE, ABCDE
Among these four, the focus in this article is on the component unit. The other three units are presented here to clarify the component through comparison. Most theories of syntax acknowledge strings in a manner consistent with the definition above. The constituent is a unit usually associated with PSGs, although some DGs also acknowledge constituents (e.g. Hudson, 1984: 92; Starosta, 1988: 105; Hellwig, 2003: 603; Anderson, 2011: 92). The catena is a unit whose validity and utility have been established in a series of articles (e.g. O’Grady, 1998; Osborne, 2005; Osborne et al., 2012; Osborne and Groß, 2016). As stated above, research concerning the component in DG is sparse (e.g. Osborne and Niu, 2017). This study therefore endeavors to further establish the validity and utility of the component unit, demonstrating its value as a basis for analyzing the chunking mechanism as well as for establishing the best hierarchical analysis of basic sentence constructions in Mandarin Chinese.

6 Due to the one-to-one mapping of dependency syntax -- see Section 3.1 -- a word is a node and vice versa. The definition of the constituent unit as formulated here is neutral across DG and PSG. A constituent is a complete subtree in both views of sentence structure.

3.3. Components in PSG
The message about chunking currently being developed concerns the component unit as just defined over dependency structures. Two anonymous reviewers asked in this regard whether the component unit could be defined over phrase structures as well, and would hence be of interest to the broad field of syntax more generally. The answer is affirmative: it is possible to define the component over phrase structures so that the definition identifies the same strings as components as the dependency definition given just above. However, doing so raises difficulties, and these difficulties can be disqualifying. Any dependency structure can be mechanically translated to a corresponding phrase structure. Hence, any of the dependency trees produced in this article can be translated to corresponding phrase structure trees. The next examples illustrate how this is done:

Translation of the dependency analysis (9a) to the phrase structure analysis (9b) involves adding a P (=phrase) marker to each head word in the dependency tree and a lower projection of that head word below it (e.g. N below NP, V1 below VP1, etc.). This translation process is mechanical and can easily be automated. Given phrase structures like (9b), one can define the component as follows:

Component (in PSG): A word or a combination of words the projections of which are continuous with respect to both precedence and dominance
This definition identifies the same word combinations as components in (9b) that the dependency grammar definition of the component does in (9a). For instance, the string structures can is a component in (9b) because the NP projection of structures immediately precedes and is directly connected to the VP1 projection of can. There are difficulties associated with the phrase structure tree (9b), however. For one, the tree is relatively flat, flatter than many modern PSGs would accept. The relative flatness is due to two instances of ternary branching (at the VP1 and VP3 nodes). Many modern PSGs assume strict binarity of branching and would therefore reject the analysis given as (9b). Another problem associated with (9b) is that the entirety is a big VP (VP1). Traditionally, phrase structure syntax takes the sentence to be an exocentric construction, this exocentricity being present in the manner that the sentence S is divided
into a subject NP and a predicate VP (S → NP VP); see example (3) above. The phrase structure analyses resulting from translation from dependency cannot acknowledge exocentricity.⁷ Perhaps the most serious problem associated with the (in)ability of phrase structure syntax to acknowledge components is the frequent presence of phonologically null categories. Many modern PSGs posit null categories. A brief look at any of the many syntax textbooks written in the last 25 years confirms this point (e.g. Adger, 2003; Radford, 2004; Carnie, 2013). The syntactic analyses that one encounters often include phonologically null categories (e.g. a null tense category, a null determiner category, a null complementizer category, a trace, etc.). These phonologically null categories do not correspond to any entities in dependency syntax. Their presence obscures which word combinations do and do not qualify as components. To restate and summarize the message, PSG can acknowledge the same word combinations as components as DG if it so chooses, which means that the value of the component unit extends beyond DG to PSG as well. However, acknowledging components requires PSG to assume structural analyses that many modern PSGs would be reluctant to adopt. The PSG at hand would have to allow ternary branching, avoid exocentricity as a matter of principle, and reject phonologically null categories. These issues do not arise for DG, because DG readily allows ternary branching, is incapable of acknowledging exocentricity, and rejects phonologically null categories (except in cases of ellipsis). More importantly, there would be no reason to assume phrase structures given these requirements, since the dependency structures that correspond to the relatively flat, entirely endocentric phrase structures are simpler and hence easier to produce and work with.

4. Methodology
The following subsections establish the methodology of the study: the research claims and questions, the materials and participants, and the procedures for data collection and documentation.

4.1. Claims and questions
Concerning the relation between the component and chunking, we make the following claims:
Claim 1: Chunking is a basic mechanism of language by which words are grouped together to convey meaning. Native speakers of a language intuitively know how smaller units of language, i.e. words, are organized into bigger meaningful groups. These groups can be identified by simple chunking tests.
Claim 2: Informants choose to chunk sentences in such a manner that the resulting chunks are DG components.
Claim 3: Based on reasoning about components and on chunking data, one can motivate or refute structural analyses of specific constructions. One can thus resolve debates concerning the best hierarchical analyses.
In the big picture, two simple, overarching research questions arise:
Question 1: How many chunks produced by informants are DG components according to the preferred DG analyses?
Question 2: How many chunks produced by informants are PSG constituents according to standard PSG analyses?
We have studied and tested five major types of Chinese constructions in our efforts to answer these questions, the ones listed in the introduction above: 1) pivotal sentences, 2) serial verb constructions (or co-verbs), 3) descriptive and resultative V-de constructions, 4) bǎ constructions, and 5) bèi constructions. These five constructions are part and parcel of Chinese syntax. There is, however, a lack of clarity about the basic structural analyses of these constructions, especially from a DG perspective.
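The reasoning behind Claim 3 and Question 1 can be sketched as a small computation. The sketch uses sentence (1) and the informant counts from footnote 2 (24, 18, and 2 for (1a), (1b), and (1c)). It compares two candidate dependency analyses by how many informants produced chunks that are all components under each analysis. Both head maps are hypothetical illustrations, not the article's own trees: both make yòng the root and attach chī to it, which is an assumption, and they differ only in whether kuàizi depends on yòng or on chī, the contrast drawn in the Introduction. The article's preferred analysis of serial verb constructions may differ in other respects.

```python
# Sentence (1): Wǒ yòng kuàizi chī fàn 'I use chopsticks to eat'.
WORDS = ["Wǒ", "yòng", "kuàizi", "chī", "fàn"]

# Two hypothetical head maps (None marks the root); only kuàizi's head differs.
ANALYSES = {
    "kuàizi under yòng": {"Wǒ": "yòng", "yòng": None, "kuàizi": "yòng",
                          "chī": "yòng", "fàn": "chī"},
    "kuàizi under chī": {"Wǒ": "yòng", "yòng": None, "kuàizi": "chī",
                         "chī": "yòng", "fàn": "chī"},
}

# Informant counts from footnote 2, keyed by the number of words before the
# divider: (1a) = 3, (1b) = 1, (1c) = 2.
RESPONSES = {3: 24, 1: 18, 2: 2}


def is_component(chunk: set, heads: dict) -> bool:
    """Contiguous in word order and connected in the tree (one head outside)."""
    positions = sorted(WORDS.index(w) for w in chunk)
    contiguous = positions == list(range(positions[0], positions[-1] + 1))
    connected = sum(heads[w] not in chunk for w in chunk) == 1
    return contiguous and connected


def responses_fitting(heads: dict) -> int:
    """Count informants whose two chunks are both components under `heads`."""
    total = 0
    for cut, count in RESPONSES.items():
        left, right = set(WORDS[:cut]), set(WORDS[cut:])
        if is_component(left, heads) and is_component(right, heads):
            total += count
    return total


for name, heads in ANALYSES.items():
    print(f"{name}: {responses_fitting(heads)} of {sum(RESPONSES.values())} informants")
```

Under the first head map, the chunks of (1a) and (1b) are all components but kuàizi chī fàn in (1c) is not, so 42 of the 44 informants fit. Under the second, (1a) fails, so only 20 fit. The analysis that accommodates more of the informants' choices is the one Claim 3 would favor.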

7 Consider this problem of exocentricity with respect to example (42) below, the phrase structure tree produced by the Stanford Parser for the sentence Dàjiā tīng le zhè gè xiāoxi fēicháng gāoxìng ‘We heard the news and were very happy’. The erratic labeling of nodes in that tree renders some of the phrasal constituents exocentric. In the absence of guidelines about how to interpret the labels, such a phrase structure tree cannot be translated to dependency because it does not consistently distinguish heads from dependents.

4.2. Materials and participants
The materials used to collect data from informants were handouts containing Chinese sentences. Each handout began with an introduction explaining the notion of chunking and illustrating how a sentence can be chunked; it then listed eight to ten Chinese sentences and invited the informant to divide each sentence into a set number of chunks. These sentences fall into the aforementioned five construction types. Sentences of each type were carefully chosen to cover as many subtypes as possible (see the Appendix), and filler sentences were included to reduce the influence of similar sentences on one another. The participants in the study were college students from a major university in China, drawn from a wide range of majors. These students were considered eligible participants because they are native speakers of Chinese with a high level of literacy.

4.3. Data collection and documentation
Altogether 52 sentences were tested in three major rounds of data collection in 2018. The first round, or pilot test, collected informants’ general responses to the sentences. The second round elicited more informative data by varying the requirements, i.e. the number of dividers to insert into each sentence. The third round verified the results obtained in the previous two rounds. The data were gathered in classrooms. After the handouts were collected from the informants, they were checked to see whether the chunking requirements were met. Handouts that contained illegitimate responses (due, perhaps, to misreading of the directions) were excluded from documentation. The format we adopted to record informant responses is illustrated next:

(10)
    We hear LE this CLF news very happy
    a. Dàjiā tīng le zhè gè xiāoxi | fēicháng gāoxìng. -- 42
    b. Dàjiā | tīng le zhè gè xiāoxi fēicháng gāoxìng. -- 2
    ‘We heard the news and were very happy’.

The requirement in this case was to divide the sentence into two chunks by inserting one divider. The numbers to the right of the sentences show that 42 informants chunked the sentence as shown in (10a) and two informants chunked it as in (10b). The assessment of the data collected took place in two steps. First, the competing potential DG analyses of the construction at hand were reviewed. Then, the preferred analysis of that construction was chosen based on the chunking results produced by the informants. The chosen analyses are referred to collectively below as the “preferred analyses”.
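The checking and recording steps described above can be sketched as follows, assuming each response is transcribed as the sentence with "|" marking the informant's dividers. A response is kept only if it yields exactly the required number of non-empty chunks, and identical chunkings are then tallied into the format of (10). This is a simplification: the study discarded whole handouts containing illegitimate responses, whereas this sketch filters individual responses. The function names are hypothetical, and the three-chunk response at the end is invented to exercise the filter. The counts 42 and 2 are those reported for (10a) and (10b).

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def parse_response(marked: str, required_chunks: int) -> Optional[Tuple[str, ...]]:
    """Split a response at '|' dividers; return None if it breaks the requirement."""
    chunks = tuple(part.strip() for part in marked.strip().rstrip(".").split("|"))
    if len(chunks) != required_chunks or not all(chunks):
        return None  # wrong number of chunks, or an empty chunk
    return chunks


def tally(responses: Iterable[str], required_chunks: int) -> Counter:
    """Count identical chunkings, discarding responses that break the requirement."""
    counts: Counter = Counter()
    for marked in responses:
        chunks = parse_response(marked, required_chunks)
        if chunks is not None:
            counts[chunks] += 1
    return counts


# Example (10): one divider, i.e. two chunks. The last response is invented.
responses = (["Dàjiā tīng le zhè gè xiāoxi | fēicháng gāoxìng."] * 42
             + ["Dàjiā | tīng le zhè gè xiāoxi fēicháng gāoxìng."] * 2
             + ["Dàjiā tīng le | zhè gè xiāoxi | fēicháng gāoxìng."])
for chunks, count in tally(responses, required_chunks=2).most_common():
    print(" | ".join(chunks), "--", count)
```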

5. Candidate analyses and results
This section reports on and analyzes the results obtained in the chunking tests. Sections 5.1–5.5 address the structural issues of the five Chinese constructions. For each construction, its features, major structural debates, and possible dependency analyses are presented first. Then, the chunking results for the sentences tested are analyzed and discussed to determine which analysis should be preferred, reasoning from the premise that chunks should be components.

5.1. Pivotal sentences
Jiānyǔ jù ‘pivotal sentence’ is a special type of Chinese construction that has a jiānyǔ ‘pivot’. The pivot is an NP that functions in two ways; it is a “double functional element”. A typical pivotal sentence can be represented as NP1 + P1 + NP2 + P2, whereby the pivot is NP2, e.g.

(11) NP1   P1      NP2      P2
     Nǐ    qǐng    tā       lái.    (glossed from Liu et al., 1983: 448)
     You   invite  he/him   come
     ‘You invite him to come’.

Please cite this article in press as: Niu, R., Osborne, T., Chunks are components: A dependency grammar approach to the syntactic structure of Mandarin. Lingua (2019), https://doi.org/10.1016/j.lingua.2019.03.003


R. Niu, T. Osborne / Lingua xxx (2018) xxx--xxx


NP2 tā functions both as the object of P1 qǐng ‘invite’ (the person who is invited) and as the subject of P2 lái ‘come’ (the person who comes). The whole sentence can be viewed as a nested combination of a predicate–object structure and a subject–predicate structure, with NP2 playing two roles. The general consensus about P1 and P2 in pivotal sentences is that P1 is the sentence root (e.g. Wang, 1985; Li, 2000). The point of debate concerns, rather, the hierarchical status of NP2. While the majority view it as the object of P1 (e.g. Ding et al., 1961; Zhu, 1982; Li, 1986), some believe that it is the subject of P2 (e.g. Fan, 1985). Difficulty in this area motivates PSG grammarians such as Xing (2004) to posit a null category e as the subject of P2. DGs tend to avoid null categories (except in the event of ellipsis), so an analysis in terms of a null category is not an option for the current approach. The issue is straightforward in DG terms: NP2 is an immediate dependent either of P1 or of P2:

In tree (12a), the ternary analysis, NP2 is a dependent of P1, whereas in tree (12b), the small clause analysis, NP2 is a dependent of P2. To determine which of these two analyses should be preferred, we tested seven pivotal sentences belonging to four subcategories (cf. Liu et al., 1983; all the sentences we tested, the subcategories they fall into, and the results of the data collection are provided together in the Appendix). If participants chunked NP2 with P1, the ternary analysis was supported; if they chunked NP2 with P2, the small clause analysis was supported. The results were clear in this regard: they strongly support the ternary analysis. To illustrate the extent of this support, we now examine the results for a couple of examples. The first is the sentence Tā ràng wǒ jiāo tā rìyǔ ‘He asks me to teach him Japanese’.8 Informants were invited to divide the sentence into three chunks:

(13)
NP1 P1 NP2 P2
He ask I/me teach him Japanese
a. Tā ràng wǒ | jiāo tā | rìyǔ. -- 26
b. Tā | ràng wǒ | jiāo tā rìyǔ. -- 16
c. Tā | ràng wǒ jiāo tā | rìyǔ. -- 4
d. Tā | ràng | wǒ jiāo tā rìyǔ. -- 1
e. Tā ràng | wǒ | jiāo tā rìyǔ. -- 1
‘He asked me to teach him Japanese.’
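The component test behind the assessment of (13) can be made explicit. The following Python sketch rests on stated assumptions: since trees (12) and (14) are not reproduced here, the head assignments are our reconstruction from the prose (P1 ràng as the root in both analyses; NP2 wǒ depending on ràng in the ternary analysis and on jiāo in the small clause analysis; tā and rìyǔ depending on jiāo). The helper names are hypothetical. A set of words counts as a component when it is connected by dependencies, which in a tree amounts to exactly one member having its head outside the set.

```python
from collections import Counter

# Words of (13), indexed from left to right.
WORDS = ["Tā", "ràng", "wǒ", "jiāo", "tā", "rìyǔ"]

# Assumed trees: each maps a word index to the index of its head (None = root).
# Both take P1 ràng as the root; they differ only in the head of NP2 wǒ.
TERNARY = {0: 1, 1: None, 2: 1, 3: 1, 4: 3, 5: 3}       # wǒ depends on ràng
SMALL_CLAUSE = {0: 1, 1: None, 2: 3, 3: 1, 4: 3, 5: 3}  # wǒ depends on jiāo


def is_component(heads, word_indices):
    """Connected iff exactly one member has its head outside the set."""
    members = set(word_indices)
    tops = [i for i in members if heads[i] not in members]
    return len(tops) == 1


def chunks_from_dividers(n_words, dividers):
    """A divider at position k falls between word k - 1 and word k."""
    bounds = [0, *sorted(dividers), n_words]
    return [range(start, end) for start, end in zip(bounds, bounds[1:])]


def classify(chunks):
    """Which analysis treats every chunk of a response as a component?"""
    fits_ternary = all(is_component(TERNARY, c) for c in chunks)
    fits_small = all(is_component(SMALL_CLAUSE, c) for c in chunks)
    if fits_ternary and fits_small:
        return "neutral"
    if fits_ternary:
        return "ternary"
    if fits_small:
        return "small clause"
    return "neither"


# Responses (13a-e) as divider positions, with the number of informants.
RESPONSES_13 = [((3, 5), 26), ((1, 3), 16), ((1, 5), 4), ((1, 2), 1), ((2, 3), 1)]

tally = Counter()
for dividers, informants in RESPONSES_13:
    tally[classify(chunks_from_dividers(len(WORDS), dividers))] += informants
print(dict(tally))
```

With these assumed trees, the tally comes out as 42 responses favoring the ternary analysis, 1 favoring the small clause analysis, and 5 compatible with both, in line with the figures discussed for (13). Counting the words whose head lies outside a chunk avoids an explicit graph search, because each connected piece of a word set in a tree has exactly one such word.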

The numbers on the right show how many informants chose to chunk the sentence in the manner indicated. Forty-two (= 26 + 16) informants chunked NP2 with P1 to the exclusion of P2, whereas only one informant (13d) chunked NP2 with P2 to the exclusion of P1. These data therefore strongly support the ternary analysis. To express the point in terms of the component unit, 42 informants chunked the sentence so that (NP1 +) P1 + NP2 formed a chunk, which supports the ternary analysis, because on that analysis (NP1 +) P1 + NP2 is a component. On the small clause analysis, in contrast, these 42 informants would have produced a chunk that is not a component. The next trees help to illustrate the point:

8 The format here has been adapted for documentation and analysis only; the original test sentence contained no spaces and no bold letters.


The strings Tā ràng wǒ, a chunk in (13a), and ràng wǒ, a chunk in (13b), are both components on the ternary analysis in (14a), but neither is a component on the small clause analysis in (14b), because there NP2 wǒ is not directly connected by a dependency to any other word in its chunk. The results for the other pivotal sentences we tested were similar, as the next example suggests:

(15)
NP1 P1 NP2 P2
I have CLF brother at PU teach
a. Wǒ yǒu gè gēge | zài Běidà | jiāoshū. -- 28
b. Wǒ | yǒu gè gēge | zài Běidà jiāoshū. -- 15
c. Wǒ yǒu gè | gēge | zài Běidà jiāoshū. -- 3
d. Wǒ yǒu gè gēge | zài | Běidà jiāoshū. -- 1
e. Wǒ | yǒu gè gēge zài Běidà | jiāoshū. -- 1
‘I have a brother who teaches at Peking University.’

Results (15a, b, d) show that 44 (= 28 + 15 + 1) informants chunked NP2 with P1 to the exclusion of P2, supporting the ternary analysis. In contrast, no participant grouped NP2 with P2 to the exclusion of P1. Note that results (15c, e), in which NP2 was grouped with both predicates or with neither, do not bear on the choice between the two analyses. Although the support for the ternary analysis is strong, an anomaly did occur in the results for one of the pivotal sentences we tested:

(16)
NP1 P1 NP2 P2
Teacher praise Xiaoming study conscientious
a. Lǎoshī | biǎoyáng Xiǎomíng | xuéxí rènzhēn -- 28
b. Lǎoshī | biǎoyáng | Xiǎomíng xuéxí rènzhēn -- 10
c. Lǎoshī biǎoyáng | Xiǎomíng | xuéxí rènzhēn -- 9
d. Lǎoshī biǎoyáng Xiǎomíng | xuéxí | rènzhēn -- 1
‘The teacher praises Xiaoming because he studies conscientiously.’

Twenty-nine participants (16a, d) chunked NP2 with P1, supporting the ternary analysis. As shown in (16b), however, ten informants grouped NP2 with P2 to the exclusion of P1, which favors the small clause analysis. Again, result (16c) does not bear on the choice between the two analyses. Taken as a whole, the results nevertheless favor the ternary analysis: it is supported by a majority of the participants (29 of 48), whereas the small clause analysis is refuted more often than it is supported. The conclusion for the pivotal sentence is therefore straightforward: the pivot, NP2, should be a dependent of the first predicate P1 to its left, rather than of the second predicate P2 to its right.9

5.2. Serial verb constructions

Liándòngjù ‘serial verb constructions’ (SVCs), or co-verbs, are sentences in which two or more verbs (or verb phrases) are used in a series without a break in prosody (e.g. Yi, 2003). At first glance, SVCs are similar to the pivotal sentences just discussed: both constructions have a complex predicate structure consisting of (at least) two verbs. The SVC differs from the pivotal sentence, however, in that the shared argument in the SVC is the sentence subject NP1 rather than NP2, e.g.

(17) NP1 P1 NP2 P2 NP3
Tāmen zuò huǒchē qù Shànghǎi.
They take train go Shanghai
‘They take the train and go to Shanghai’, or ‘They go to Shanghai by train’.

As the sentence subject, NP1 Tāmen serves as the subject argument of both P1 and P2 (‘they take’ and ‘they go’). Since there is no syntactic marking that specifies the relation between P1 and P2, the sentence can be

9 Our conclusion is consistent with the dominant stance among studies of Chinese sentence structure that rely on the so-called “tests for constituents”. Using pauses, adverbial insertion, answer fragments, and topicalization, scholars such as Ding et al. (1961: 119) and Liu et al. (1983: 453) have argued that in the pivotal sentence, NP2 is more closely related to P1 than to P2 and should therefore be attached to P1 in the structure.


understood in two ways, as indicated by the two translations. Note that NP2 and NP3 are not present in every SVC; whether they occur is determined by the properties of the predicate at hand. Thus, a typical SVC can be expressed as NP1 + P1 + (NP2) + P2 + (NP3). Owing to its distinctive features, the SVC has been studied extensively. Apart from work on the scope of the construction (Ding et al., 1961; Li and Thompson, 1974; Zhu, 1982) and cross-linguistic or typological studies of it (e.g. Baker, 1989; Paul, 2008), one active area of research concerns the categorization of SVCs based on the semantic relationship between P1 and P2.10 The focus here, however, is on the syntactic or hierarchical analysis of the SVC. Interestingly, only a few scholars have addressed the hierarchical status of the two predicates in the SVC (e.g. Zhu, 1985; Shi, 1986; Li, 2000). To the best of our knowledge, no agreement has been reached, nor has a comprehensive analysis been attempted. Within the current DG framework, the two competing analyses are given next, (18a) with P1 as the root and (18b) with P2 as the root:

To determine which of these two analyses should be preferred, we invited participants to divide ten SVC sentences into two chunks. The results we obtained support P1 as the root, that is, they support (18a) over (18b). For example: (19)

NP1 P1 NP2 P2
We hear LE this CLF news very happy
a. Dàjiā tīng le zhè gè xiāoxi | fēicháng gāoxìng. -- 42
b. Dàjiā | tīng le zhè gè xiāoxi fēicháng gāoxìng. -- 2
‘We heard the news and were very happy.’

NP1 Dàjiā ‘we’ is the subject argument of both P1 tīng ‘hear’ and P2 gāoxìng ‘happy’. As a transitive verb, P1 also takes NP2 as its object argument. The data reveal that P2 cannot be the root: if it were, the string NP1 + P1 + NP2, chosen as a chunk by 42 informants, would not be a component. If P1 is the root, however, then NP1 + P1 + NP2 is a component. Hence, sentence (19) should have the following dependency structure:

Many of the other SVC sentences we tested yielded similar results: although there were some anomalies, the majority of the informants chunked NP1 with P1, which supports the analysis in (18a). However, one type of SVC sentence was chunked in a manner that was inconsistent with the results for the other SVC sentences:

10 Following Liu et al. (1983) and Hu (2003), we divide SVCs into five subcategories. These categories are not discussed here, but are listed in the Appendix.


(21)
NP1 P1 NP2 P2 NP3
Chinese use chopsticks eat meal
a. Zhōngguórén | yòng kuàizi chī fàn. -- 43
b. Zhōngguórén yòng kuàizi | chī fàn. -- 4
c. Zhōngguórén yòng | kuàizi chī fàn. -- 3
‘Chinese use chopsticks to eat’, or ‘Chinese eat using chopsticks’.

NP1 is the subject argument of both predicates. P1 can be understood as denoting the manner of P2, i.e. how that event is accomplished. Result (21a) shows that 43 participants took NP1 as a single chunk. This neither supports nor refutes either structural analysis, since P1 + NP2 + P2 + NP3 is a component on both. Result (21b), which has NP1 + P1 + NP2 as a chunk, supports P1 as the root, because if P2 were the root, that string would not be a component. Result (21c) refutes both analyses, since NP2 + P2 + NP3 is a non-component regardless of whether P1 or P2 is the root. The results as a whole therefore suggest P1 as the root, though only by a narrow margin (four responses in favor, none against). The anomalous chunking results for (21) point to an additional factor influencing how informants choose to chunk sentences, namely prosodic length and weight. The one-word chunk in (21a) is prosodically heavy, having three syllables. To determine the influence of prosodic weight on chunking, we re-tested a revised version of sentence (21) that has a one-syllable NP1:

(22)

NP1 P1 NP2 P2 NP3 (= (1) above)
I use chopsticks eat meal
a. Wǒ yòng kuàizi | chī fàn. -- 24
b. Wǒ | yòng kuàizi chī fàn. -- 18
c. Wǒ yòng | kuàizi chī fàn. -- 2
‘I use chopsticks to eat’, or ‘I eat using chopsticks’.
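The contrast between (21) and (22) can be checked with a component test. In this Python sketch, the trees for (18a) and (18b) are our assumption, since the figures are not reproduced: in (18a), NP1, NP2 and P2 depend on P1 and NP3 depends on P2; in (18b), NP1, P1 and NP3 depend on P2 and NP2 depends on P1, which is the configuration under which NP1 + P1 + NP2 fails to be a component. Each two-chunk response is encoded by the position of its single divider; the helper names are hypothetical.

```python
from collections import Counter

# Positions in (21) and (22): 0 NP1, 1 yòng (P1), 2 kuàizi (NP2), 3 chī (P2), 4 fàn (NP3).
# Assumed trees (head index per word, None = root) for the two analyses in (18).
P1_ROOT = {0: 1, 1: None, 2: 1, 3: 1, 4: 3}  # (18a): NP1, NP2, P2 -> P1; NP3 -> P2
P2_ROOT = {0: 3, 1: 3, 2: 1, 3: None, 4: 3}  # (18b): NP1, P1, NP3 -> P2; NP2 -> P1


def is_component(heads, span):
    """Connected iff exactly one member has its head outside the span."""
    members = set(span)
    return sum(1 for i in members if heads[i] not in members) == 1


def verdict(divider):
    """Two-chunk response: the single divider splits before word `divider`."""
    chunks = [range(0, divider), range(divider, 5)]
    fits = {name: all(is_component(heads, c) for c in chunks)
            for name, heads in (("P1 root", P1_ROOT), ("P2 root", P2_ROOT))}
    if all(fits.values()):
        return "neutral"
    if not any(fits.values()):
        return "neither"
    return next(name for name, ok in fits.items() if ok)


def tally(responses):
    """Sum informant counts per verdict; responses are (divider, informants)."""
    counts = Counter()
    for divider, informants in responses:
        counts[verdict(divider)] += informants
    return counts


# (21): NP1 = Zhōngguórén (three syllables); (22): NP1 = Wǒ (one syllable).
RESULTS_21 = tally([(1, 43), (3, 4), (2, 3)])
RESULTS_22 = tally([(3, 24), (1, 18), (2, 2)])
print(dict(RESULTS_21))
print(dict(RESULTS_22))
```

Under these assumptions, (21) yields 43 neutral responses, 4 favoring P1 as the root, and 3 compatible with neither analysis, whereas (22) yields 24 favoring P1 as the root, 18 neutral, and 2 compatible with neither. Neither sentence produces a response that favors P2 as the root; what changes with the lighter NP1 is how many responses discriminate between the analyses at all.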

In this case, more informants chunked NP1 with P1 than left it alone as a chunk, suggesting (18a) as the preferred analysis. The message here is that informants are more inclined to chunk NP1 with P1 when it is prosodically light. When NP1 is prosodically heavy, as Zhōngguórén ‘Chinese’ is in (21), it is more likely to be grouped alone. These results are consistent with the findings of Grosjean et al. (1979: 80), who demonstrate that prosodic weight, such as length and stress, has an impact on sentence processing. In conclusion, in two-predicate serial verb constructions, P1 rather than P2 should be viewed as the sentence root.

5.3. Descriptive and resultative V-de constructions (VDCs)

De zì jù ‘de construction’ refers to a family of complex predicate structures in which de follows the first predicate (P1) in the sentence. Based on meaning, the construction can be divided into three types, namely descriptive, resultative, and modal (e.g. Chen, 2001). Among them, the descriptive and resultative V-de constructions (abbreviated as VDCs) are more widely used and have attracted the attention of many linguists (e.g. Li, 1990; Huang et al., 2009; Li, 2015). Because P2 often denotes the state, extent, or result of the action expressed by P1, the VDC is also called the ‘stative construction’ by some grammarians (e.g. Li and Thompson, 1981). Sentences (23a) and (23b) illustrate these two types of VDCs, respectively:

(23) a. NP1 P1-de P2
Tā kū de hěn shāngxīn.
She cry DE very sad(ly)
‘She cries so sadly.’
b. NP1 P1-de NP2 P2
Tā kū de yǎnjing dōu hóng le.
He cry DE eye even red LE
‘He cried so hard that his eyes were red.’

The structure of a descriptive VDC as shown in (23a) can be abbreviated as NP1 + P1-de + P2.11 NP1 Tā is the subject of P1 kū, and P2 shāngxīn describes either NP1 (she is sad) or P1 (cries sadly). The resultative VDC as given in (23b),

11 There has been controversy about the status of de, that is, de has been regarded as a preposition, a suffix, or a structural auxiliary verb (e.g. Zhu, 1982; Li, 2000). Since it is not the focus of the study here, the discussion simply takes de as a function word that cliticizes to P1. It is glossed as P1-de.


however, has an intervening NP (NP2) between the predicates. P2 hóng does not modify NP1 or P1; rather, it is predicated over NP2 yǎnjing. NP2 + P2 together denote the result caused by the action or event expressed by P1.12 One controversy concerning the structural analysis of the VDC is the status of the two predicates, i.e. whether P1 or P2 is the main predicate of the sentence. The discussion here takes Huang's (1988) widely adopted secondary predication hypothesis for granted, as opposed to the first predication hypothesis (e.g. Chao, 1968; Tsai, 2012), and accordingly views P1 rather than P2 as the root of the sentence. Thus, sentence (23a) has the following dependency structure:

While the analysis of the descriptive VDC is straightforward, one thorny issue exists for the resultative VDC shown in (23b), namely whether the intervening NP (NP2) is a dependent of P1 or of P2. Although scholars have produced arguments for each stance, no consensus has been reached, because most studies have adopted traditional methods that sometimes yield inconsistent results (e.g. Zhu, 1982; Huang et al., 2009; Li, 2015). It is therefore warranted to use the component unit of DG and chunking data to address this issue. The competing DG analyses of the VDC are visualized as follows:

In order to test whether NP2 should be a dependent of P1 as shown in (25a) or a dependent of P2 as shown in (25b), we invited native informants to chunk ten V-de sentences of different semantic types. The results showed a clear division among sentences with different P1s. If P1 is transitive, the tendency is to chunk NP2 with P1, supporting the analysis in (25a). The next example demonstrates the sort of data we received in this area: (26)

NP1 P1-de NP2 P2
Landowner force DE he/him no-way-could-go
a. Dìzhǔ | bī de tā | wú-lù-kě-zǒu. -- 37
b. Dìzhǔ | bī de | tā wú-lù-kě-zǒu. -- 3
c. Dìzhǔ bī de | tā | wú-lù-kě-zǒu. -- 3
‘The landowner has forced him into a desperate situation.’

NP2 tā is the patient of P1 bī and the experiencer of P2 wú-lù-kě-zǒu. Thirty-seven informants chunked P1-de with NP2, which favors the ternary analysis, on which NP2 is a dependent of P1-de. Result (26c), produced by three participants, neither supports nor refutes either analysis. Result (26b), also produced by three participants, groups NP2 with P2 to the exclusion of P1 and is therefore consistent only with the small clause analysis; such a small minority does not alter the conclusion. The conclusion remains the same as long as P1 is used transitively. For instance, chunking data for a sentence in which NP2 is an agent of P1 are as follows:

12 In most VDCs, P2 is a verb or a predicative adjective, as shown in (23b). However, linguists have found that some degree words, e.g. hěn ‘very’ and lìhài ‘severely’, and a few NPs, e.g. yìshēn lěnghàn ‘whole body's cold sweat’, can also appear in the position of P2, adding extreme degree or details to P1 (e.g. Fan, 1992: 6; Li, 2015: 14).


(27)
NP1 P1-de NP2 P2
This CLF class listen DE I confused
a. Zhè táng kè | tīng de wǒ | húlihútu. -- 35
b. Zhè táng kè | tīng de | wǒ húlihútu. -- 4
c. Zhè táng kè | tīng de | wǒ | húlihútu. -- 4
‘This session of class has made me so confused’, or ‘I was so confused after listening to this session of class’.

NP2 wǒ is the agent of P1 tīng and the experiencer of P2 húlihútu. The majority of the informants again chunked the sentence in a manner that supports the ternary analysis, as shown by (27a). Note that (27b) contradicts the preferred analysis, because NP2 + P2 is not a component on the ternary structure. Such anomalies are not considered a problem, however, because the ternary analysis is supported by most of the informants. For VDCs with an intransitive P1, i.e. where NP2 is an argument of P2 only, the results support the small clause analysis, in which NP2 is a dependent of P2, e.g.

(28)
NP1 P1-de NP2 P2
This child grow DE I even not recognize LE
a. Zhè háizi | zhǎng de | wǒ dōu bú rènshi le. -- 30
b. Zhè háizi zhǎng de | wǒ dōu | bú rènshi le. -- 7
c. Zhè háizi zhǎng de | wǒ | dōu bú rènshi le. -- 5
d. Zhè háizi | zhǎng de wǒ | dōu bú rènshi le. -- 1
‘This child has grown so much that I could not even recognize him/her.’

NP2 wǒ is not semantically related to P1 zhǎng; it is an argument of P2 rènshi only. As shown in (28a), 30 informants chunked NP2 with P2, supporting the small clause analysis. Note that result (28d), produced by only one informant, refutes this analysis, because P1 + NP2 is a non-component in (25b); again, however, such anomalies do not challenge the conclusion reached. The results for the other sentences in which NP2 is an argument of P2 only were consistent: while there were some contradictory responses, the data as a whole support the small clause analysis. The conclusion so far is thus clear: the intervening NP of a VDC, if one is present, should be viewed as a dependent of P1 if it is an argument of P1. If it is an argument of P2 only, however, it should be a dependent of P2. In addition to this flexible analysis of VDCs with one intervening NP, another special type of VDC has attracted our attention, one in which two NPs intervene between P1 and P2, e.g.

(29) P1-de NP1 NP2 P2
Xià de tā kùzi dōu shī le. (Li, 1986: 336; glossing ours)
Scare DE he/him pants even wet LE
‘(It) scared him so much that he wet his pants.’

The sentence subject is habitually omitted in such cases, so the label NP1 now refers to the first nominal after P1-de. Note that there are two intervening nominals in (29), the first, NP1 tā, and the second, NP2 kùzi. At first glance, NP1 might be viewed as the possessive modifier of NP2, meaning ‘his pants’, which makes it natural to treat NP1 as a dependent of NP2 in the structure. However, this relation may not be real: first, a Chinese possessive noun generally needs to be followed by the attributive marker de (的), and second, the possessive reading overlooks the semantic relation between NP1 and P1, and between NP2 and P2. To the best of our knowledge, this structure has only been mentioned briefly, and no systematic analysis of it has been proposed (e.g. Li, 1986; Huang, 1988).

Three analyses are possible and worth considering in the current DG framework:


To test which of these three analyses should be preferred, we asked informants to chunk eight sentences in total. These eight sentences can be divided into two subcategories based on the semantic role that NP1 plays. The data obtained suggest (30c) as the best analysis, as there is a tendency to chunk between NP1 and NP2, e.g.

(31)
P1-de NP1 NP2 P2
Cry DE he/him eye even red LE
a. Kū de tā | yǎnjing dōu hóng le. -- 38
b. Kū de | tā yǎnjing dōu hóng le. -- 2
c. Kū de tā yǎnjing dōu | hóng le. -- 2
d. Kū de tā yǎnjing | dōu hóng le. -- 2
‘He cried so hard that his eyes were red.’

NP1 tā is the experiencer of P1 kū. Thirty-eight informants produced the chunks P1-de + NP1 and NP2 + P2, both of which are components only on analysis (30c). Note that results (31b, c, d), each produced by two informants, refute (30c), because NP1 and NP2 are not linked by dependencies on that analysis. But again, the relatively small number of informants who produced contradictory chunks does not endanger the conclusion based on the results from a large majority of the informants. Results for the other VDCs with an experiencer NP1 were similar. Chunking responses for a sentence in which NP1 is the patient of P1 are given as follows:

(32)

P1-de NP1 NP2 P2
Scold DE Xiaoming face even green LE
a. Mà de Xiǎomíng | liǎn dōu lǜ le. -- 39
b. Mà de | Xiǎomíng liǎn dōu lǜ le. -- 3
c. Mà de Xiǎomíng liǎn | dōu lǜ le. -- 2
‘(Someone) scolded Xiaoming so hard that Xiaoming was blue in the face.’

Thirty-nine informants chunked the sentence in a manner that supports analysis (30c) and contradicts the other analyses. In summary, resultative V-de constructions with two intervening nouns should have the structure shown in (30c), in which the first intervening NP (NP1) is a dependent of P1 and the second intervening NP (NP2) is a dependent of the second predicate P2.

5.4. Bǎ and bèi constructions

The bǎ and bèi constructions are two distinctive and widely used constructions in Mandarin Chinese. The bǎ construction is also referred to as the “disposal” form, the “causative” form, or the “resultative” form (e.g. Wang, 1985; Huang et al., 2009), whereas the bèi construction is also called the “passive form” (e.g. Wang, 1985; Her, 2009). Many agree that the function of bǎ is to move (or “promote”) the postverbal object to a position in front of the verb, and that the function of bèi is to move the postverbal object to the sentence-initial position (e.g. Li, 1986). Using labels to abbreviate the words, a typical bǎ or bèi construction can be expressed as NP1 + BA/BEI + NP2 + VP.13 The bǎ and bèi constructions are illustrated with the following examples:

(33) a. NP1 BA NP2 V
Wǒ bǎ píngguǒ chī le.
I BA apple eat LE
‘I ate the apple.’
b. NP1 BEI NP2 V
Píngguǒ bèi wǒ chī le.
Apple BEI I eat LE
‘The apple was eaten by me.’

NP1 Wǒ in (33a) is the agent of the verb chī (I ate), whereas NP2 píngguǒ after bǎ is the patient of the verb (the apple was eaten). NP1 in (33b), however, is the patient of the verb, whereas NP2 following bèi is the agent (as in an English passive

13 According to the popular division between long and short passive sentences (cf. Huang, 1999; Her, 2009), the bèi constructions discussed here are all long passives, in which an intervening NP (NP2) is present between bèi and the sentence-final verb.


sentence). Both sentences can be understood as expressing how the apple was disposed of (it was eaten), or how it was affected by the agent, though involuntarily (it was gone). At the same time, it can be inferred from (33a) and (33b) that the bǎ and bèi constructions are not only parallel in structure but also, in a sense, opposite in meaning: the active bǎ construction can be changed to the passive bèi construction by switching the NPs before and after bǎ and bèi. With respect to the syntactic status of bǎ and bèi, the traditional view takes them as prepositions that are subordinate to the sentence-final verbs (e.g. Chao, 1968; Zhu, 1982; Li, 2000). More recently, however, a number of grammarians have argued that bǎ and bèi are actually verbs and should be taken as the root verb of the corresponding construction (e.g. Li, 1986; Bender, 2000; Paul, 2015). The controversy is illustrated using the following dependency structures:

A related question concerns the status of NP2, the NP immediately following BA/BEI. While some linguists believe that NP2 is the syntactic object of BA/BEI (e.g. Zhu, 1982), others propose a small clause analysis in which NP2 is the syntactic subject of the verb (e.g. Huang et al., 2009):

The stance adopted here is that the component unit and chunking data can decide these issues. The two areas of debate just sketched combine to result in four basic possibilities for the structural analysis of the bǎ and bèi constructions:

COMP stands for the complement of the verb, which is usually a tense marker, an adverbial, or an object. Analyses (36a, b) have the sentence-final verb as the sentence root, whereas (36c, d) have BA/BEI as the root. The competing views concerning NP2 are also visible: analyses (36a, c) have NP2 as the object complement of BA/BEI, whereas (36b, d) have it as the subject of V. To determine which of the four analyses is best, we tested eight bǎ and eight bèi sentences. Informants were invited to divide each sentence into three chunks. For bǎ sentences, the results clearly support analysis (36c): informants took BA + NP2 or NP1 + BA as a chunk. The chunking responses we received for sentence (33a) are given next:

(37)

NP1 BA NP2 V
I BA apple eat LE
a. Wǒ | bǎ píngguǒ | chī le. -- 26
b. Wǒ bǎ | píngguǒ | chī le. -- 14
c. Wǒ | bǎ | píngguǒ chī le. -- 4
‘I ate the apple.’


Twenty-six informants judged BA + NP2 to be a chunk, supporting the two analyses on which these two words form a component, i.e. (36a, c). Fourteen informants took NP1 + BA to be a chunk, supporting the two analyses on which these two words form a component, i.e. (36c, d). Hence, by elimination, (36c) is the best analysis, since only it treats both BA + NP2 and NP1 + BA as components. The results for the other bǎ sentences we tested were consistent, that is, the majority of the informants chose to chunk the sentences in a manner that takes BA + NP2 or NP1 + BA as a chunk. The next example is a bǎ sentence that has a noun complement (NP3):

(38)
NP1 BA NP2 V NP3
He BA orange peel LE peel
a. Tā | bǎ júzi | bō le pí. -- 28
b. Tā bǎ | júzi | bō le pí. -- 14
c. Tā bǎ júzi | bō le | pí. -- 1
d. Tā | bǎ | júzi bō le pí. -- 1
‘He peeled the orange.’

NP3 pí is in a possessive relationship with NP2 júzi (‘the orange's peel’), although it is a complement of the verb bō. As shown by (38a, b, c), 43 informants (= 28 + 14 + 1) produced BA + NP2, NP1 + BA, or NP1 + BA + NP2 as chunks. The one structural analysis that accommodates all three cases has BA as the root and NP2 as its dependent, i.e. (36c). Turning now to the bèi construction, the results we obtained were less clear. An overwhelming majority of informants took BEI + NP2 as a chunk, whereas very few judged NP1 + BEI to be a chunk. These results make it possible to reject two of the structural analyses, (36b, d), firmly, but they do not allow one to choose clearly between the remaining two, (36a, c). The following example illustrates these points:

(39)

NP1 BEI NP2 V
Apple BEI mouse eat LE
a. Píngguǒ | bèi lǎoshǔ | chī le. -- 41
b. Píngguǒ bèi | lǎoshǔ | chī le. -- 2
c. Píngguǒ | bèi | lǎoshǔ chī le. -- 1
‘The apple was eaten by the mouse.’
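The elimination reasoning for (37) and (39) can be sketched by counting, for each of the four analyses in (36), the informants whose entire three-chunk response consists of components. The trees are reconstructed from the description of (36) and are therefore assumptions; in particular, attaching the aspect marker le to V is our own choice, and the helper names are hypothetical. This whole-response count is slightly different bookkeeping from the pairwise reasoning in the text (BA + NP2 vs. NP1 + BA), but it leads to the same conclusion.

```python
# Positions: 0 NP1, 1 BA/BEI, 2 NP2, 3 V, 4 le (COMP).
# Assumed trees for (36a-d): head index per word, None = root.
ANALYSES = {
    "36a": {0: 3, 1: 3, 2: 1, 3: None, 4: 3},  # V root, NP2 depends on BA/BEI
    "36b": {0: 3, 1: 3, 2: 3, 3: None, 4: 3},  # V root, NP2 depends on V
    "36c": {0: 1, 1: None, 2: 1, 3: 1, 4: 3},  # BA/BEI root, NP2 depends on BA/BEI
    "36d": {0: 1, 1: None, 2: 3, 3: 1, 4: 3},  # BA/BEI root, NP2 depends on V
}


def is_component(heads, span):
    """Connected iff exactly one member has its head outside the span."""
    members = set(span)
    return sum(1 for i in members if heads[i] not in members) == 1


def fit_counts(responses, n_words=5):
    """For each analysis, count the informants whose every chunk is a component."""
    counts = {name: 0 for name in ANALYSES}
    for dividers, informants in responses:
        bounds = [0, *dividers, n_words]
        chunks = [range(a, b) for a, b in zip(bounds, bounds[1:])]
        for name, heads in ANALYSES.items():
            if all(is_component(heads, c) for c in chunks):
                counts[name] += informants
    return counts


# Divider positions and informant counts for (37a-c) and (39a-c).
BA_37 = [((1, 3), 26), ((2, 3), 14), ((1, 2), 4)]
BEI_39 = [((1, 3), 41), ((2, 3), 2), ((1, 2), 1)]
print(fit_counts(BA_37))
print(fit_counts(BEI_39))
```

With these trees, (36c) fits 40 of the 44 bǎ responses in (37), ahead of (36a) with 26, (36d) with 18, and (36b) with 4. For the bèi responses in (39), (36c) fits 43 and (36a) fits 41, which mirrors the very narrow margin noted in the text.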

Forty-one informants took BEI + NP2 as a chunk, supporting the two analyses on which BEI + NP2 is a component, i.e. (36a, c). Only two informants judged NP1 + BEI to be a chunk, a fact that only very weakly supports (36c) over (36a). On the whole, (36c) is the best analysis, but only by a very narrow margin. The results for the other bèi sentences were similar, that is, we found only a very slight preference for the analysis that positions BEI as the sentence root over the alternative analysis, which positions BEI as a dependent of the verb, i.e. (36c) over (36a). To elaborate further, responses for a bèi sentence with an NP complement (NP3) are given as follows:

(40)

NP1 BEI NP2 V NP3
Table BEI carpenter saw LE one CLF corner
a. Zhuōzi | bèi mùjiàng | jù le yí gè jiǎo. -- 40
b. Zhuōzi bèi | mùjiàng | jù le yí gè jiǎo. -- 3
c. Zhuōzi bèi | mùjiàng jù le | yí gè jiǎo. -- 1
‘One corner of the table was sawed off by the carpenter.’

A large majority of the informants judged BEI + NP2 to be a chunk, which, again, supports analyses (36a, c). Only four participants (40b, c) took NP1 + BEI to be a chunk, a fact that allows for only a very slight preference in favor of (36c) over (36a). In summary, in the Mandarin bǎ construction, bǎ should be viewed as an auxiliary verb that functions as the root of the sentence, with NP2 as its immediate dependent. The analysis of the Mandarin bèi construction can be the same, although the data concerning the bèi construction allow for only a very tentative conclusion to this effect.

6. DG component vs. PSG constituent

This section compares the DG component with the PSG constituent as a predictor of how informants choose to chunk sentences.


6.1. Calculating chunks, DG components, and PSG constituents

We calculated the total number of chunks produced by informants, the number of these chunks that qualify as DG components (and non-components), and the number that qualify as PSG constituents (and non-constituents). These calculations make the value of the DG component for predicting how informants choose to chunk sentences readily apparent. To illustrate, example (19) from above is repeated here as (41). As in Section 3.2, capital letters abbreviate the words:

The first relevant numbers concern the component and non-component strings that are present given the analysis shown, which is the preferred analysis according to the reasoning discussed and illustrated at length in Section 5. Of the 36 contiguous strings in this eight-word sentence, the numbers are as follows:

18 distinct component strings in (41):
A, B, C, D, E, F, G, H, AB, BC, EF, GH, ABC, DEF, BCDEF, ABCDEF, BCDEFGH, and ABCDEFGH

18 distinct non-component strings in (41):
CD, DE, FG, BCD, CDE, EFG, FGH, ABCD, BCDE, CDEF, DEFG, EFGH, ABCDE, CDEFG, DEFGH, BCDEFG, CDEFGH, and ABCDEFG

Due to the relatively flat structure, the number of non-component strings equals the number of component strings in this example. The numbers just given for the DG component should be compared with the corresponding numbers for the PSG constituent. To arrive at the number of PSG constituents for the 52 sentences, we parsed each of them with the Stanford Parser (Chinese PCFG model, chinesePCFG.ser.gz).14 We obtained the following PSG analysis of the current example:

The distinct PSG constituents on this analysis are listed next:

14 The Stanford Parser output contains some incorrect labels and some inconsistent analyses of the same construction types. In addition, the exocentric structures of its PSG trees make head-dependent relations hard to identify. We nevertheless consider the trees good enough for our purposes, namely the calculation of constituents.


14 distinct PSG constituents in (42): A, B, C, D, E, F, G, H, DE, GH, DEF, DEFGH, BCDEFGH, and ABCDEFGH

Comparing the numbers, we again see that the DG component is more inclusive than the PSG constituent (18 DG components vs. 14 PSG constituents). Turning now to the chunks produced, the results are repeated as follows:

        A     B    C  D   E  F        G        H
(43) a. Dàjiā tīng le zhè gè xiāoxi | fēicháng gāoxìng. -- 42
     b. Dàjiā | tīng le zhè gè xiāoxi fēicháng gāoxìng. -- 2

The informants chose four strings as chunks: ABCDEF, GH, A, and BCDEFGH. All four are DG components on the analysis given as (41), so the DG component is 100% accurate in this case. The PSG constituent, in contrast, is only 52% accurate. Of the 88 (= 44 × 2) chunks produced by the 44 informants, only 46 qualify as PSG constituents, because the chunk ABCDEF is not a PSG constituent on the analysis shown in (42). This problem with the PSG constituent recurred in the responses for many of the tested sentences: the initial chunk chosen by the informants was often not a PSG constituent.

The number of non-component strings in the example (18) is also important. If every string were a component, the component unit would of course have no value in predicting how informants choose to chunk sentences. Because a significant number of strings fail to qualify as components, the component unit has real predictive value. The informants had ample opportunity to chunk the sentence in such a way that the chunks would not have been components. For example, one might have expected them to chunk the current sentence as follows:

(44) Dàjiā tīng le zhè gè | xiāoxi fēicháng gāoxìng -- 6 syl. | 6 syl.
     We hear LE this CLF news very happy

This chunking would have made sense in terms of prosodic weight, since each of the two chunks consists of six syllables.
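The counts and accuracy figures for (41)–(44) can be checked mechanically. The Python sketch below is illustrative only. Tree (41) is not reproduced here, so the head list HEADS is a hypothetical assignment that we inferred to be consistent with the component and non-component lists above. It is not necessarily the authors' exact tree. The PSG constituents and the chunk responses are copied from (42) and (43). A span of n words counts as a component when n − 1 of its words have their head inside the span.

```python
# Hypothetical encoding of tree (41): HEADS[i] is the index of the head of
# word i, or None for the root. Only the (undirected) links matter for the
# component test; this assignment is inferred from the component lists in
# the text and is an assumption, not a copy of the original diagram.
WORDS = list("ABCDEFGH")  # A=Dàjiā B=tīng C=le D=zhè E=gè F=xiāoxi G=fēicháng H=gāoxìng
HEADS = [1, None, 1, 5, 5, 1, 7, 1]

# PSG constituents of (42), as listed in the text.
CONSTITUENTS_42 = set(WORDS) | {"DE", "GH", "DEF", "DEFGH", "BCDEFGH", "ABCDEFGH"}

# Chunkings reported in (43), with the number of informants choosing each.
RESPONSES_43 = [(("ABCDEF", "GH"), 42), (("A", "BCDEFGH"), 2)]


def is_component(heads, start, end):
    """Return True if words start..end-1 are all linked together by dependencies.

    Any set of tree edges is acyclic, so a span of n words is connected
    exactly when n - 1 of its words have their head inside the span.
    """
    internal_links = sum(
        1 for i in range(start, end)
        if heads[i] is not None and start <= heads[i] < end
    )
    return internal_links == (end - start) - 1


def classify_strings(words, heads, min_len=1):
    """Split every contiguous string of >= min_len words into components and non-components."""
    n = len(words)
    components, non_components = [], []
    for length in range(min_len, n + 1):
        for start in range(n - length + 1):
            end = start + length
            string = "".join(words[start:end])
            if is_component(heads, start, end):
                components.append(string)
            else:
                non_components.append(string)
    return components, non_components


def chunk_accuracy(responses, unit):
    """Return (chunks belonging to the unit, all chunks), weighted by informant counts."""
    matched = total = 0
    for chunks, informants in responses:
        for chunk in chunks:
            total += informants
            if chunk in unit:
                matched += informants
    return matched, total


components, non_components = classify_strings(WORDS, HEADS)
print(len(components), len(non_components))

for name, unit in [("DG component", set(components)), ("PSG constituent", CONSTITUENTS_42)]:
    matched, total = chunk_accuracy(RESPONSES_43, unit)
    print(f"{name}: {matched}/{total} = {matched / total:.0%}")

# The prosodically balanced split in (44) is ABCDE | FGH.
print([s in components for s in ("ABCDE", "FGH")])
```

With this head list the sketch reproduces both lists of 18 strings above in the same order. It gives 88/88 chunks as components and 46/88 (52%) as PSG constituents, and it shows that neither chunk in (44) is a component. Passing min_len=2 to classify_strings drops single-word strings, as the later calculations do. For (41) this leaves 10 multi-word components and 18 non-components.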
Not a single informant chose to chunk the sentence in this manner. This points to two insights. First, prosodic weight is of limited value as a predictor of chunking. Second, the two chunks shown in (44) should not be construed as DG components, and indeed both are non-component strings on the DG analysis in (41). Finally, note that each individual word is by definition both a DG component and a PSG constituent. Given this equivalence, single-word strings can be excluded from the calculation. This sharpens the differences between the calculated numbers and so establishes more clearly the predictive potential of the two units relative to each other. The numbers presented in the next two sections follow this practice of excluding single-word strings.

6.2. Predictive value

Applying the method of calculation illustrated above consistently, we obtained three numbers for each of the 52 tested sentences: the number of chunks of two or more words, the number of these chunks that are DG components, and the number that are PSG constituents. These numbers are summarized in Fig. 1. For most of the sentences, the number of component chunks approaches the total number of chunks, and in certain cases all the chunks produced by informants were components. In contrast, the number of PSG constituent chunks is much smaller, often about half the total number of chunks. To quantify the relation between chunks and each of these two units, we performed Pearson correlation tests for chunks and components and for chunks and constituents. The results are summarized in Tables 1 and 2, where N is the sample size, r is the correlation coefficient, and p is the significance value. When r lies between 0 and 1, a greater value indicates a stronger positive correlation between the two variables.
As for p, when p is less than 0.05 the relation is significant (marked *), and when p is less than 0.01 the relation is highly significant (marked **). The values of r and p in the two tables show that chunks correlate more strongly with the component than with the constituent, which points to the component as the better predictor of chunking. Summing each of the three counts over all the tested sentences gives the total numbers of chunks, components, and constituents. Altogether, our informants produced 4363 multi-word chunks. Of these, 4117 are DG components on the preferred analyses (246 are non-components), and 1990 are PSG constituents on the analyses from the Stanford Parser (2373 are non-constituents). These results are shown in the following pie charts.


Fig. 1. Number of multi-word chunks, multi-word DG components, and multi-word PSG constituents.

Table 1 Correlation between chunks and components.

                                      Chunk     Component
Chunk      Pearson correlation (r)    1         .865**
           Significance (p)                     .000
           N                          52        52
Component  Pearson correlation (r)    .865**    1
           Significance (p)           .000
           N                          52        52

** Correlation is significant at the 0.01 level (two-tailed).

Table 2 Correlation between chunks and constituents.

                                        Chunk     Constituent
Chunk        Pearson correlation (r)    1         .282*
             Significance (p)                     .043
             N                          52        52
Constituent  Pearson correlation (r)    .282*     1
             Significance (p)           .043
             N                          52        52

* Correlation is significant at the 0.05 level (two-tailed).
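The p values in Tables 1 and 2 can be related to r and N through the standard two-tailed t test for a Pearson correlation, t = r·√(N − 2)/√(1 − r²) with N − 2 degrees of freedom. The article does not name its statistics software, so this minimal Python sketch uses only the standard library and is an assumption about how such values arise, not the authors' procedure. The function pearson_r shows how r would be computed from the 52 per-sentence counts plotted in Fig. 1, which are not reproduced here. The p value comes from numerically integrating the Student t density with Simpson's rule.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0 'no correlation'; Student's t with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def two_sided_p(t, df, steps=20_000):
    """Two-sided p value of Student's t, via Simpson's rule on the density over [0, |t|]."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def density(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    h = abs(t) / steps  # steps must be even for Simpson's rule
    weighted = density(0.0) + density(abs(t))
    for i in range(1, steps):
        weighted += (4 if i % 2 else 2) * density(i * h)
    central_mass = weighted * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central_mass)  # clamp tiny negative rounding error


# r values from Tables 1 and 2; N = 52 sentences in both.
for label, r in [("chunks ~ components", 0.865), ("chunks ~ constituents", 0.282)]:
    t = t_statistic(r, 52)
    print(f"{label}: t = {t:.2f}, p = {two_sided_p(t, 50):.3f}")
```

For r = .282 and N = 52 the sketch gives t ≈ 2.08 and p ≈ 0.043, in line with Table 2. For r = .865 the p value is far below 0.001, which the tables report as .000.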

Fig. 2 shows that 94% of the chunks produced are DG components and only 6% are not. In contrast, Fig. 3 shows that only 46% of the chunks produced by informants are PSG constituents and 54% are non-constituents. The data therefore show that the DG component is much better suited than the PSG constituent to serve as the basis for predicting how informants chunk sentences. When weighing the suitability of the component unit for this purpose, one should not lose sight of the fact that many multi-word strings fail to qualify as components, as established above with example (41). The next pie chart shows the total number of multi-word component and non-component strings for the 52 sentences on the preferred analyses (Fig. 4).
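The pie-chart percentages follow directly from the totals reported in this section, including the 480 component and 269 non-component multi-word strings given for Fig. 4. This minimal Python check uses only those reported totals. The Fig. 4 percentages are derived here and are not stated in the text.

```python
def percent(part, whole):
    """Share of part in whole, rounded to a whole percentage as in the pie charts."""
    return round(100 * part / whole)


TOTAL_CHUNKS = 4363        # multi-word chunks produced by the informants
COMPONENT_CHUNKS = 4117    # of which DG components on the preferred analyses
CONSTITUENT_CHUNKS = 1990  # of which PSG constituents on the Stanford Parser analyses
COMPONENT_STRINGS, NON_COMPONENT_STRINGS = 480, 269  # all multi-word strings (Fig. 4)
ALL_STRINGS = COMPONENT_STRINGS + NON_COMPONENT_STRINGS

shares = {
    "Fig. 2": (percent(COMPONENT_CHUNKS, TOTAL_CHUNKS),
               percent(TOTAL_CHUNKS - COMPONENT_CHUNKS, TOTAL_CHUNKS)),
    "Fig. 3": (percent(CONSTITUENT_CHUNKS, TOTAL_CHUNKS),
               percent(TOTAL_CHUNKS - CONSTITUENT_CHUNKS, TOTAL_CHUNKS)),
    "Fig. 4": (percent(COMPONENT_STRINGS, ALL_STRINGS),
               percent(NON_COMPONENT_STRINGS, ALL_STRINGS)),
}
for figure, (inside, outside) in shares.items():
    print(f"{figure}: {inside}% / {outside}%")
```

The Fig. 4 split (64% / 36%) quantifies the later remark that the component unit excludes more than a third of the multi-word strings as candidate chunks.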


Fig. 2. Ratio of component chunks to non-component chunks.

Fig. 3. Ratio of constituent chunks to non-constituent chunks.

All told, there are 480 multi-word component strings on the preferred analyses and 269 non-component strings. These numbers make the particular value of the component clearly visible: the component unit easily excludes more than a third (269 of 749) of the multi-word strings as candidate chunks. Note also that on the rejected analyses above, the percentage of multi-word non-component strings would generally be much higher. Informants' preference for chunks that are components rather than non-components may have a deeper psychological motivation. As mentioned in Section 2, chunking is an information processing mechanism that is limited by working memory (e.g. Miller, 1956; Chekaf et al., 2016). Because of this general cognitive constraint, humans tend to group adjacent smaller units into bigger units to facilitate processing. In linguistics, proper chunking based on syntactic relations has been found to significantly reduce the cost of processing sentences (Lu et al., 2016: 33). In the current experiment, which simulated the natural chunking process, informants preferred chunks that are components. A plausible reason is that components are syntactically related strings that can generate meaning with the least effort. Informants were, on the other hand, reluctant to produce non-component chunks, because non-component chunks are composed of hierarchically unrelated words,


Fig. 4. Ratio of component strings to non-component strings on the preferred analyses.

which take more effort to combine into meaning (long-distance association may be needed). In other words, two syntactically related words (a head and its dependent) are unlikely to appear far from each other, because such distance increases the load on the already limited capacity of working memory. This claim is supported by quantitative studies that have found dependency distance minimization and a scarcity of non-projective sentences across languages (e.g. Liu, 2008; Futrell et al., 2015; Liu et al., 2017).

7. An anomaly

One anomaly in our data should be mentioned before we conclude. It concerns the 52nd sentence in the Appendix (see Fig. 1 in Section 6.2), which contains a resultative predicate. That sentence, the dependency structure we assigned to it, and the informant responses we obtained for it are given next:

On the dependency analysis we assumed for this sentence, shown here, the string mǎn le dōngxi is not a component. This is problematic because 37 informants chunked the sentence in such a way that mǎn le dōngxi is a chunk. An alternative analysis would overcome the problem by subordinating dōngxi to mǎn, which would make mǎn le dōngxi a component. We did not adopt this alternative, however, because the NP dōngxi can appear only by virtue of the verb sāi. This suggests that sāi immediately governs dōngxi, that is, that dōngxi is an immediate dependent of sāi, as shown in the tree. The analysis of resultative constructions does not affect the conclusions reached concerning the five constructions this study has focused on, but it is a problem worthy of future research.

8. Summary and conclusion

The discussion above has identified two main factors that influence how speakers of a language, in this case Mandarin, group words into chunks. The first and more important of the two is syntactic structure. Speakers prefer to group


dependent words together with their heads, rather than with words that do not include their head. The second, lesser factor is the prosodic weight of the resulting chunks (see examples (21) and (22) and the surrounding discussion). Informants tend to produce chunks of approximately equal prosodic weight, where prosodic weight is primarily a function of the number of syllables. The current account has formulated the first and more important factor in terms of the component unit of dependency grammar (DG). A component is a string of words that are all linked together by dependencies. Reasoning based on the component motivates the claim in the title of this article, namely that chunks are components. A related point established at length above is that chunks are often NOT constituents. The methodology employed above has made it possible to identify the most plausible DG analyses of five central constructions in Mandarin syntax. These analyses are summarized next:

(1) For Jiānyǔ jù ‘pivotal sentences’, the noun phrase (NP2) between the predicates should be a dependent of the predicate to its left (P1), which is also the main predicate (or root) of the sentence.
(2) For Liándòng jù ‘serial verb constructions’ or ‘co-verbs’, the first predicate on the left (P1) should always be the root of the sentence over the predicate on the right (P2), regardless of the semantic relation between the two predicates.
(3) For dezì jù ‘descriptive and resultative V-de constructions’ with one intervening NP (NP2) between the two predicates, NP2 should be viewed as a dependent of the predicate to its left (P1), which is also the sentence root. For descriptive and resultative V-de constructions with two intervening NPs between the two predicates, the first intervening NP (NP1) is a dependent of the predicate to its left (P1), and the second intervening NP (NP2) is a dependent of the predicate to its right (P2).
(4) For bǎzì jù ‘the bǎ construction’, bǎ should be the root of the sentence over the sentence-final verb, and the NP that immediately follows bǎ should be a dependent of bǎ. The same analysis is advocated for bèizì jù ‘the bèi construction’, although this recommendation is only tentative because the informant responses for the bèi construction were not robust.

A final comment concerns whether our methodology can be reproduced for other languages. It can easily be applied to other languages, including languages for which DG studies are absent.

Conflicts of interest

The authors declare no conflicts of interest that may inappropriately influence the submission.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.lingua.2019.03.003.

References

Abney, S., 1992. Parsing by chunks. In: Berwick, R., et al. (Eds.), Principle-based Parsing. Kluwer Academic Publishers, Dordrecht, pp. 257--278.
Adger, D., 2003. Core Syntax: A Minimalist Approach. Oxford University Press, Oxford, UK.
Ágel, V., Eichinger, L.M., Eroms, H.W., Hellwig, P., Heringer, H.J., Lobin, H. (Eds.), 2003. Dependency and Valency: An International Handbook of Contemporary Research, vol. 1. de Gruyter, Berlin.
Ágel, V., Eichinger, L.M., Eroms, H.W., Hellwig, P., Heringer, H.J., Lobin, H. (Eds.), 2006. Dependency and Valency: An International Handbook of Contemporary Research, vol. 2. de Gruyter, Berlin.
Anderson, J., 2011. The Substance of Language: The Domain of Syntax, vol. 1. Oxford University Press, Oxford.
Baker, M.C., 1989. Object sharing and projection in serial verb constructions. Linguist. Inquiry 20 (4), 513--553.
Bender, E., 2000. The syntax of Mandarin bǎ: reconsidering the verbal analysis. J. East Asian Linguist. 9 (2), 105--145.
Bröker, N., 1999. Eine Dependenzgrammatik zur Kopplung heterogener Wissensquellen. Niemeyer, Tübingen.
Carnie, A., 2013. Syntax: A Generative Introduction. Wiley-Blackwell, Malden, MA.
Chao, Y., 1968. A Grammar of Spoken Chinese. University of California Press, Berkeley.
Chase, W.G., Simon, H.A., 1973. Perception in chess. Cognit. Psychol. 4 (1), 55--81.
Chekaf, M., Cowan, N., Mathy, F., 2016. Chunk formation in immediate memory and how it relates to data compression. Cognition 155, 96--107.
Chen, H., 2001. Dezi buyu jiegou xin tan [A new inquiry into the Chinese V-DE complement constructions]. J. Chin. People Liberation Army Univ. Foreign Lang. 24 (2), 56--60.
Ding, S., Lv, S., Li, R., Sun, D., Guan, X., Fu, J., Huang, S., Chen, Z., 1961. Xiandai hanyu yufa jianghua [Lectures on Grammar of Modern Chinese]. Commercial Press, Beijing.
Engel, U., 1994. Syntax der deutschen Gegenwartssprache, 3rd ed. Erich Schmidt Verlag, Berlin.
Eroms, H.W., 2000. Syntax der deutschen Sprache. de Gruyter, Berlin.


Fan, J., 1985. Wu ding NP zhuyu ju [Sentence with indefinite NP as the subject]. Stud. Chin. Lang. 5, 321--328.
Fan, X., 1992. V de ju de de hou chengfen [The part following de in the V-de construction]. Chin. Lang. Learn. 6, 5--8.
Futrell, R., Mahowald, K., Gibson, E., 2015. Large-scale evidence of dependency length minimization in 37 languages. Proc. Natl. Acad. Sci. U.S.A. 112 (33), 10336--10341.
Gee, J.P., Grosjean, F., 1983. Performance structures: a psycholinguistic and linguistic appraisal. Cognit. Psychol. 15 (4), 411--458.
Gobet, F., Lane, P.C.R., Croker, S., Cheng, P.C.-H., Jones, G., Oliver, I., Pine, J.M., 2001. Chunking mechanisms in human learning. Trends Cogn. Sci. 5 (6), 236--243.
Grosjean, F., Grosjean, L., Lane, H., 1979. The patterns of silence: performance structures in sentence production. Cognit. Psychol. 11 (1), 58--81.
Groß, T., 1999. Theoretical Foundations of Dependency Syntax. Iudicium, München.
Hays, D., 1964. Dependency theory: a formalism and some observations. Language 40 (4), 511--525.
Hellwig, P., 2003. Dependency unification grammar. In: Ágel, V., et al. (Eds.), Dependency and Valency: An International Handbook of Contemporary Research, vol. 1. de Gruyter, Berlin, pp. 593--635.
Her, O.-S., 2009. Unifying the long passive and the short passive: on the bei constructions in Taiwan Mandarin. Lang. Linguist. 10 (3), 421--470.
Heringer, H., 1996. Deutsche Syntax: Dependentiell. Stauffenburg, Tübingen.
Hu, Y., 2003. Xiandai hanyu [Modern Chinese], 6th ed. Shanghai Educational Publishing House, Shanghai.
Huang, J.C.-T., 1988. Wǒ pǎo de kuài and Chinese phrase structure. Language 64 (2), 274--311.
Huang, J.C.-T., 1999. Chinese passives in comparative perspective. Tsing Hua J. Chin. Stud. 29, 423--509.
Huang, J.C.-T., Li, Y.-H.A., Li, Y., 2009. The Syntax of Chinese. Cambridge University Press, Cambridge.
Hudson, R., 1984. Word Grammar. Basil Blackwell, Oxford.
Hudson, R., 1990. English Word Grammar. Basil Blackwell, Oxford.
Hudson, R., 2007. Language Networks: The New Word Grammar. Oxford University Press, Oxford.
Hudson, R., 2010. An Introduction to Word Grammar. Cambridge University Press, Cambridge, UK.
Kern, F., 1883. Zur Methodik des deutschen Unterrichts. Nicolaische Verlags-Buchhandlung, Berlin.
Li, L., 1986. Xiandai hanyu juxing [Sentence Patterns of Modern Chinese]. Commercial Press, Beijing.
Li, Y.-H.A., 1990. Order and Constituency in Mandarin Chinese. Kluwer, Dordrecht.
Li, J., 2000. Xinzhu guoyu wenfa [New Chinese Grammar], 3rd ed. Commercial Press, Beijing.
Li, C., 2015. On the V-DE construction in Mandarin Chinese. Lingua Sin. 1 (6), 1--40.
Li, C., Thompson, S.A., 1974. Co-verbs in Mandarin Chinese: verbs or prepositions? J. Chin. Linguist. 2 (3), 257--278.
Li, C., Thompson, S.A., 1981. Mandarin Chinese: A Functional Reference Grammar. University of California Press, Berkeley.
Liu, H., 2008. Dependency distance as a metric of language comprehension difficulty. J. Cognit. Sci. 9 (2), 159--191.
Liu, Y., Pan, W., Gu, W., 1983. Shiyong xiandai hanyu yufa [A Practical Grammar of Modern Chinese]. Commercial Press, Beijing.
Liu, H., Xu, C., Liang, J., 2017. Dependency distance: a new perspective on syntactic patterns in natural languages. Phys. Life Rev. 21, 171--193.
Lu, B., Cai, Z., 2009. Zukuai yu yuyan jiegou nandu [Chunking and structural complexity of linguistic units]. Chin. Teach. World 23 (1), 3--16.
Lu, Q., Xu, C., Liu, H., 2016. Can chunking reduce syntactic complexity of natural languages? Complexity 21 (S2), 33--41.
Matthews, P., 1981. Syntax. Cambridge University Press, Cambridge, UK.
Mel’čuk, I., 1988. Dependency Syntax: Theory and Practice. State University of New York Press, Albany.
Mel’čuk, I., Pertsov, N., 1987. Surface Syntax of English: A Formal Model with the Meaning-Text Framework. John Benjamins, Amsterdam.
Miller, G.A., 1956. The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol. Rev. 63, 81--97.
Niu, R., 2017. Chinese descriptive and resultative V-de constructions: a dependency-based analysis. In: Proceedings of the Fourth International Conference on Dependency Linguistics, pp. 154--164.
O’Grady, W., 1998. The syntax of idioms. Nat. Lang. Linguist. Theory 16, 279--312.
Osborne, T., 2005. Beyond the constituent: a DG analysis of chains. Folia Linguist. 39 (3--4), 251--297.
Osborne, T., Groß, T., 2016. The do-so-diagnostic: against finite VPs and for flat non-finite VPs. Folia Linguist. 50 (1), 97--135.
Osborne, T., Niu, R., 2017. The component unit: introducing a novel unit of syntactic analysis. In: Proceedings of the Fourth International Conference on Dependency Linguistics, pp. 165--175.
Osborne, T., Putnam, M., Groß, T., 2012. Catenae: introducing a novel unit of syntactic analysis. Syntax 15 (4), 354--396.
Paul, W., 2008. The serial verb constructions in Chinese: a tenacious myth and a Gordian knot. Linguist. Rev. 25 (3/4), 367--411.
Paul, W., 2015. New Perspectives on Chinese Syntax. de Gruyter, Berlin.
Radford, A., 2004. English Syntax: An Introduction. Cambridge University Press, Cambridge, UK.
Robinson, J., 1970. Dependency structures and transformational rules. Language 46, 259--285.
Schubert, K., 1987. Metataxis: Contrastive Dependency Syntax for Machine Translation. Foris Publications, Dordrecht.
Shi, C., 1986. Jubenwei yufa lunji [Collections of Grammar Articles from Sentence-based Perspective]. Shanghai Educational Publishing House, Shanghai, pp. 142--158.
Starosta, S., 1988. The Case for Lexicase: An Outline of Lexicase Grammatical Theory. Pinter Publishers, London.
Tesnière, L., 1959. Éléments de syntaxe structurale. Klincksieck, Paris.
Tesnière, L., 2015. Elements of Structural Syntax. Osborne, T., Kahane, S. (Trans.). John Benjamins, Amsterdam.
Tjong Kim Sang, E.F., Buchholz, S., 2000. Introduction to the CoNLL-2000 shared task: chunking. In: Proceedings of the 2nd Workshop on Learning Language in Logic and the 4th Conference on Computational Natural Language Learning, pp. 127--132.
Tsai, C.-Y.E., 2012. Descriptive complement constructions as concealed pseudoclefts in Chinese. In: Proceedings of the 29th West Coast Conference on Formal Linguistics, pp. 259--267.
Wang, L., 1985. Zhongguo xiandai yufa [Modern Grammar of Chinese]. Commercial Press, Beijing.
Xing, X., 2004. Xiandai hanyu jianyu shi [The Pivotal Sentence of Modern Chinese]. Beijing Broadcasting Institute Press, Beijing.
Yi, Z., 2003. Tai han liandong jiegou bijiao yanjiu [A Comparative Study of Verbal Constructions in Series in Thai and Chinese]. J. Chin. People Liberation Army Univ. Foreign Lang. 26 (3), 38--41.
Zhu, D., 1982. Yufa jiangyi [Lectures on Grammar]. Commercial Press, Beijing.
Zhu, D., 1985. Yufa dawen [Questions and Answers on Grammar]. Commercial Press, Beijing.
