Lingua 116 (2006) 1524–1552
On syntactic and phonological representations

Ad Neeleman, J. van de Koot

Department of Phonetics and Linguistics, UCL, Gower Street, London WC1E 6BT, UK

Received 4 February 2005; received in revised form 9 August 2005; accepted 10 August 2005
It is an honour for us to contribute to this special issue. Under Neil’s leadership, the linguistics section of the department has benefited from an open intellectual atmosphere that allows critical discussion and the development of new ideas. We have also greatly valued his humane style of management. Given that Neil has worked in both phonology and syntax, we hope that this paper will amuse him.
Abstract

This paper argues that phonological representations are not trees, but strings structured through boundary symbols. Because trees are richer in information than strings, our main argument rests on a demonstration that tree-based phonology is too strong, in that it allows rules for which there is no empirical basis. We discuss three contrasts between syntax and phonology that can be understood if phonology lacks trees. The first is that syntax has recursive structures, whereas phonology does not. The second is that syntax allows nonterminal nodes with feature content (as a result of percolation), but that no convincing case can be made for the feature content of putative nonterminal nodes in phonology. Finally, syntactic dependencies are conditioned by c-command, while phonological dependencies are, by and large, conditioned by adjacency. The second and third differences between syntax and phonology can also be used to demonstrate that a tree-based phonology is too weak, in that independently motivated conditions on trees do not allow existing phonological rules.

Keywords: Syntax; Domination; Phonology; Boundary symbols; Recursion; Projection; C-command
1. Introduction

The aim of this paper is to show that phonological representations are not trees, but strings structured through boundary symbols. The argument is not an easy one to make, because trees typically encode more information than structured strings. Consequently, there are no
phenomena that could force one to abandon phonological trees in favour of a more impoverished representation. We therefore begin by showing that phonological trees allow too much, and hence that additional stipulations must be made in order to achieve descriptive adequacy. These are unnecessary in the alternative approach, because theories based on boundary symbols do not overgenerate in the same way. As we learn more about tree-based theory, we will also be able to show that tree-based phonology allows too little, in that independently motivated conditions on trees are incompatible with the essentially linear nature of many phonological processes.

2. Domains and domination

We begin by considering phenomena that necessitate trees in syntax. The syntactic representation usually assigned to an example like Distinguished professors drink whisky groups verb and object together in a unit that excludes the subject. This unit is taken to inherit its categorial features from the verb. Thus, standard syntactic representations express two types of information: constituency and what one might call headedness.

A representation that expresses inheritance of features must also express constituency. Consider (1), a GB-style representation. The brackets indicate domains, while the subscripts indicate the category of the various dominating nodes, as determined by the theory of phrase structure. Since a feature must be a feature of something, inheritance presupposes the construction of higher-order units, and these higher-order units automatically give rise to constituency. To put it differently, domination implies domains of domination.

(1)
[IP [NP Distinguished professors] [VP drink [NP whisky]]].
The opposite is not true: domain formation does not imply domination. Suppose all we needed to express was that in (1) distinguished and professors belong together, as do drink and whisky. Then we would not need to assume a tree structure; it would be sufficient to insert boundary symbols in the appropriate places. This is illustrated in (2), where word boundaries are indicated by a single line and phrasal boundaries by a double line. (2)
Distinguished | professors || drink | whisky.
Representations of this type do not contain enough information to do syntax. Consider what happens if the object is made slightly more complex. The domains inherent in the GB-style representation in (3a) are not properly reproduced by (3b), since here drink and old belong as closely together as old and whisky and drink and whisky. They are also not properly reproduced by (3c), since here old whisky and distinguished professors stand in the same relation to drink. One could remedy the situation by introducing a new boundary symbol (the triple lines in (3d)). This is a problematic strategy, however, because any additional degree of complexity in the object would require a new boundary symbol. Since the object can be indefinitely complex, there will be syntactic structures that cannot be captured by a grammar with a finite number of boundary symbols (no matter how large). In other words, the recursive nature of syntax can only be captured with an infinite stock of boundary symbols. (3)
a. [IP [NP Distinguished professors] [VP drink [NP old whisky]]].
b. Distinguished | professors || drink | old | whisky.
c. Distinguished | professors || drink || old | whisky.
d. Distinguished | professors ||| drink || old | whisky.
A solution to this problem is to generate boundary symbols with rewrite rules, such as those in (4), which decompose stronger boundaries as concatenations of weaker ones: (4)
B → |
B → BB
Indeed, a boundary-based system is sufficient for representing constituency of arbitrary depth, provided that (i) there is no limit on the number of distinct boundary symbols used to structure a string, and (ii) all syntactic rules are sensitive to the relative strength of these boundary symbols rather than to their absolute value. Consider why. A string-structuring algorithm based on (4) makes it impossible to define constituents in terms of particular types of boundary symbols. We cannot say, for example, that a substring is a constituent if it contains no boundary symbols whose strength exceeds ||. By this criterion drink old in (3d) is a constituent. We also cannot say that a substring is a constituent if it contains no boundary symbols whose strength exceeds |, because drink old whisky would not count as a constituent. Similar problems obtain no matter what cut-off point is chosen. The only definition that works classifies a substring as a constituent if its outer boundary symbols are stronger than its inner boundary symbols.

The assumptions that allow us to encode constituency in a string correspond to a system that has tree structures but lacks projection. Suppose we assign to each of the nonterminals in the tree erected over (3a) an integer expressing its maximum distance from the terminal yield, as in (5a). (This assignment does not add any information to the tree but simply makes explicit the internal complexity of each nonterminal.) Although we will not demonstrate this here, the resulting representation can be mapped onto the string in (5b) by a simple reversible algorithm. (That such an algorithm exists can be understood as follows. The numbering expressing the level of embedding in (5a) requires a successor function, namely the one that generates the natural numbers. The rewrite rules that generate the boundary symbols in (5b) also define a successor function, namely the one whose extension is {|, ||, |||, ||||, . . .}. What the algorithm needs to do, then, is to map the strength of a boundary symbol between two terminals to the level of embedding annotated on the first node that dominates those terminals.)
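To make this mapping concrete, the following minimal sketch (our own illustration in Python, not part of the original argument) encodes trees as nested lists of terminals and gives the boundary between two adjacent terminals the strength of the first node that dominates them both; the function names are ours.

def height(tree):
    """Maximum distance of a node from its terminal yield (terminals count as 0)."""
    if isinstance(tree, str):
        return 0
    return 1 + max(height(child) for child in tree)

def to_boundary_string(tree):
    """Linearize a tree, inserting between sister constituents a boundary whose
    strength equals the level of embedding of the node that joins them."""
    if isinstance(tree, str):
        return tree
    separator = " " + "|" * height(tree) + " "
    return separator.join(to_boundary_string(child) for child in tree)

# The tree erected over (3a): [IP [NP Distinguished professors] [VP drink [NP old whisky]]]
ip = [["Distinguished", "professors"], ["drink", ["old", "whisky"]]]
print(to_boundary_string(ip))
# -> Distinguished | professors ||| drink || old | whisky   (compare (3d) and (5b))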
While representations like (5b) suffice for expressing constituency, they do not and cannot express headedness and its associated properties. This is because the relation between the elements separated by a boundary symbol is symmetrical, while a relation between a head and its complement is asymmetrical. The head is primary in determining the category and other properties of the constituent. In a tree structure, headedness can be represented through
percolation of features to nonterminals, but the string in (5b) has no categories to which features could percolate. This is not a minor defect, since the shape of a tree erected over a string of words is largely determined by selectional requirements, which in turn depend on projection for their satisfaction.1 For example, a verb can only select internal arguments if it projects (for more discussion on the necessity of projection in syntax, see section 4).

In summary, a string-based theory of syntactic structure is not viable. First of all, if the number of boundary symbols is limited, it is impossible to express recursion beyond a certain depth.2 Moreover, even if this restriction is lifted, it remains impossible to express a key property of syntax, namely projection. This does not imply that other domains of linguistic analysis could not make use of string-based representations. In particular, we argue in this paper that phonological representations are not trees, as often assumed, but strings divided by a limited set of boundary symbols. This will make it possible to derive three fundamental properties of phonology: the lack of recursion (section 3), the lack of projection (section 4) and the lack of anything resembling syntactic dependencies (section 5).

3. Recursion

3.1. Strictly layered trees

Unlike generative syntax, generative phonology started out as a theory based on boundary symbols. Chomsky and Halle (1968) use two symbols, # and +, to characterize prosodic domains. An example like we established telegraphic communication is assigned the following representation:

(6)
###we#####establish#PAST####tele+graph#ic###communicate#ion####.
The number and type of phonological units a string contains depends on the number and type of boundary symbols that are used to structure it. For instance, Chomsky and Halle tentatively define a word (for phonological purposes) as ‘a string of formatives (one or more) contained in the context ##___## and containing no occurrences of ##’ (p. 13). These boundaries are not only used to regulate word-internal processes, but also to condition rules that apply across word boundaries (see Selkirk, 1972, 1974). Phonological theory has moved away from representations based on two boundary symbols. Instead, much richer hierarchical structures have been proposed. According to Selkirk (1981 and much subsequent work), an utterance (U) contains one or more intonational phrases (Is), which each contain one or more prosodic phrases (Fs). These in turn contain prosodic words (vs), feet (Fs) and syllables (ss), in that order. In other words, there is a hierarchy of prosodic categories of decreasing size: (7)
Prosodic hierarchy: U > I > F > v > F > s

1 There are theories without projection in the formal sense. Collins (2002), for example, argues that syntactic nodes have no labels. His theory must be accompanied, however, by locality principles – such as his Locus Principle – that regulate access to information in terminal nodes. The access regulated by such principles emulates the effects of projection.
2 Here and below, 'string-based theory' is to be read as 'a theory that segments strings by means of unpaired boundary symbols'. A tree can of course be expressed as a string with labeled brackets (paired boundary symbols that carry features). However, our use of 'string-based theory' is intended to exclude representations of this richer type.
This shift from string-based to hierarchical representations partly took place because the two boundary symbols of Chomsky and Halle (1968) do not provide an adequate typology of the domains to which phonological rules are sensitive. The enriched inventory of categories allows phonological rules to have a wider range of structural descriptions. Auslautverhärtung, for example, is sensitive to syllable boundaries, while Nespor and Vogel (1982, 1986) argue that raddoppiamento sintattico is sensitive to F-boundaries. But an equally important factor contributing to the general acceptance of Selkirk's theory was a general trend in phonology towards autosegmental representations (this is alluded to in Selkirk, 1984:7–8). The # and + diacritics were felt to have no place in this trend away from linear representations (see Scheer, in press-a,b for a critical review of the motivation for introducing trees in prosodic phonology). Thus, the idea of nested domains is usually captured by representing the phonology of a sentence as a tree:
One condition on prosodic trees is that any prosodic category must be properly contained in a single category of the superordinate level. A prosodic word cannot belong to more than one prosodic phrase, a prosodic phrase cannot belong to more than one intonational phrase, etc. We may call the principle excluding such structures Proper Containment; the formulation in (9) is adapted from Nespor and Vogel (1986:7). (Nespor and Vogel give (9) as a second clause of the Strict Layer Hypothesis in (11) below, but for reasons of presentation we will keep the two conditions separate.) (9)
Proper Containment
A boundary at a particular level of the prosodic hierarchy implies all weaker boundaries.
According to (9), an I-boundary implies a F-boundary, a F-boundary implies a v-boundary, and so on. Proper Containment thus excludes trees like (10), in which a syllable belongs to two feet.
A second condition, often referred to as the Strict Layer Hypothesis, requires that categories at a particular level of the prosodic hierarchy exclusively take categories at the next level down for daughters (see Selkirk, 1984; Nespor and Vogel, 1986):
(11) Strict Layer Hypothesis
A given nonterminal unit of the prosodic hierarchy, X^p, is composed of one or more units of the immediately lower category, X^(p-1).
The Strict Layer Hypothesis rules out structures like (12), where a F dominates an intonational phrase. That is to say, it rules out prosodic trees that display any degree of recursion, whether bounded, as in (12), or unbounded.3 It is important to note that representations like (12) do not violate Proper Containment and that representations like (10) do not violate the Strict Layer Hypothesis. Hence, both conditions are necessary. (12)
*[F . . . [I . . .[F . . .] . . .] . . . ]
Although some authors have argued for phonological structures exhibiting bounded recursion, such proposals are invariably restricted to a limited set of well-defined structures. We assume that the data in question will eventually yield to alternative treatments. For some cases, this must remain a promissory note.4 For others, concrete proposals that avoid recursion are either available in the literature or can be developed relatively easily (we present an alternative for certain types of alleged bounded recursion below).

3.2. Deriving the Strict Layer Hypothesis

The empirical consequences of the theory just sketched may well be correct, but there is considerable conceptual tension between the Strict Layer Hypothesis and the claim that prosodic representations are trees. By their very nature, rule systems that generate trees allow recursion. But there is no recursion in phonology. This problem of overgeneration requires the introduction of an ad hoc constraint, the Strict Layer Hypothesis, that removes a natural property of tree-based representations. (The gist of this argument is already present in the foreword to Scheer, 2004, where it is developed in slightly different terms.)

We can clarify the issue by considering the format of the rules that generate trees. These are rewrite rules manipulating a limited set of symbols. In the absence of special conditions,
3 Recursion is bounded if it cannot exceed some arbitrary depth of embedding. It is unbounded otherwise.
4 Ladd (1986, 1996), in particular, argues for some degree of recursion in phonology, providing evidence from (amongst others) parentheticals, coordinate structures and relative clauses. The evidence from parentheticals is based on the observation that the intonation of a clause is unaffected by the insertion of a parenthetical. According to Ladd, this suggests a structure in which a phonological constituent (the parenthetical) is embedded in a larger phonological constituent of the same type (the host sentence). However, Haegeman (1988) and Espinal (1991) have argued that parentheticals are not integrated syntactically into the host. If so, Ladd's observation can be understood if linear integration of the parenthetical follows assignment of intonational contours. The evidence from coordinate structures is based on the observation that in structures containing three or more conjuncts intonation depends on the syntactic grouping of these conjuncts. If so, the relevant phenomenon can also be captured by a theory that allows the rules that assign intonation to refer to semantics (see Jackendoff, 1997; Szendrői, 2003, and Reinhart, 2006). This might not be unreasonable, given that the intonation in question is linked to which conjuncts are contrasted with which other conjuncts. Finally, the evidence from relative clauses is based on the fact that not all relative clauses are preceded by an intonational break of the same strength. Tokizaki (2001) argues that this need not be explained in terms of recursion, but could be a result of restructuring forced by Ghini's (1993) Principle of Increasing Unit (more traditionally known as das Gesetz der Wachsenden Glieder).
recursivity follows from the fact that the same symbols can appear in the input of one rule and the output of another. With this in mind, consider the rewrite rules that generate prosodic trees: (13)
U → I*
I → F*
F → v*
v → s*
Given that every symbol, except U, can appear as both the input and the output of a rewrite rule, it is a coincidence that no rewrite rule introduces a symbol in its output that is mentioned in the input of a previous rewrite rule. Clearly, there is nothing in the format of rewrite rules that can explain this.5

The alternative is to return to string-based representations structured through boundary symbols. The nonrecursive nature of phonological representations then follows straightforwardly, as long as the number of boundary symbols is finite. (As explained earlier, a finite stock of boundary symbols is insufficient to express embedding of arbitrary depth.) There are a number of ways in which a string-based theory can be set up. We refer the reader to Idsardi (1992) (and subsequent work), Tokizaki (1999, 2001), Hall (2000), and Reiss (2003) for examples of theories seeking to reduce the expressive power of phonology by appealing to string-based representations. Scheer (2004, in press-a,b) proposes a particularly ambitious version of such a theory, which does away not only with trees but also with boundary symbols, partially on the basis of concerns similar to ours. For expository reasons, we consider a less ambitious theory that treats s, v, F, etc., as boundary symbols, rather than as category labels. So, F does not stand for a prosodic phrase, but rather for a prosodic-phrase boundary. This allows us to replace the standard prosodic structure in (14a) by the one in (14b). (Representations of this type go back a long way; see McCawley, 1968 for an early example.)

(14)
a. [U [I [F [v John's] [v father]] [F [v suggested] [v a two-seater]]] [I [F [v but] [v John's] [v mother]] [F [v preferred] [v a fur] [v coat]]]].
b. John's v father F suggested v a two-seater I but v John's v mother F preferred v a fur v coat.
One could object that our proposal reintroduces the type of diacritics used in Chomsky and Halle (1968) to structure strings. It would be better, of course, to develop a phonological theory that does away with diacritics.

5 It is possible to restrict the recursivity of trees in various ways. However, such restrictions, whatever form they take, are not an inherent property of the rule system that generates trees. An anonymous reviewer suggests that recursivity is not so much a property of trees as of sets of rewrite rules. There are various grammars that specify languages without using rewrite rules, namely by storing a set of tree fragments that can be combined through a substitution operation. Examples include Tree-Adjoining Grammars (Joshi, 1985 and subsequent work) and so-called DOP grammars (Bod, 1998). The reviewer suggests that such systems do not have recursion as an inherent property. This is incorrect. The situation is exactly analogous to systems of rewrite rules: recursion will result if there is a set of primitive trees that can be combined into a structure in which the root node is repeated in the yield. Of course it is possible to place some arbitrary restriction on the set of primitive trees that avoids recursion, but such a restriction is not an inherent property of tree-adjoining or DOP grammars. Hence, an appeal to grammars of this type in seeking to explain the lack of recursion in phonology is unsatisfactory.
But in the present context, this criticism would be unfair, partly because the use of diacritics is not what this paper is about, but mainly because the labels assigned to prosodic constituents in the standard theory are as much diacritics as the boundary symbols in our alternative. They are diacritics that decorate trees rather than strings, but diacritics nonetheless.

No information is lost in the transition from hierarchical to linear structure in (14). This is because the interpretation of (14b) is guided by the condition of Proper Containment, as formulated in (9). Recall that in the tree-based theory this principle rules out representations like (10), in which a syllable belongs to two feet. It has the same effects in a string-based theory. In a representation like . . . s 111 F 222 s . . . the material between the syllable boundaries cannot be construed as a single syllable, since it contains a F-boundary and therefore, by implication, a s-boundary. Similarly, it follows from (9) that any material not separated by a v-boundary or a boundary of a superordinate domain belongs to the same prosodic word, while any material not separated by a F-boundary or a boundary of a domain superordinate to F belongs to the same prosodic phrase, and so on. Consequently, (14a) and (14b) are equivalent in that they define the same sets of prosodic domains, namely the ones in (15).

(15)
a. Prosodic words: {John's; father; suggested; a two-seater; but; John's; mother; preferred; a fur; coat}
b. Prosodic phrases: {John's father; suggested a two-seater; but John's mother; preferred a fur coat}
c. Intonational phrases: {John's father suggested a two-seater; but John's mother preferred a fur coat}
d. Utterance: {John's father suggested a two-seater but John's mother preferred a fur coat}
Although the notations used in (14a) and (14b) are equivalent in this respect, there are certain tree structures that have no counterpart in the string-based notation. Crucially, the string-based theory cannot generate any structure that violates the Strict Layer Hypothesis. As an example, consider the case of a prosodic phrase that contains an intonational phrase, which in turn dominates two prosodic phrases (as in (16a), where 111, 222, etc., are unanalyzed strings). It is simply not possible to construct an equivalent representation using only boundary symbols. The string in (16b) seems to come closest, but in fact this representation contains three intonational phrases, which themselves only contain prosodic phrases, as required. In other words, (16b) is equivalent to (16c), and not to (16a). (16)
a. *. . . [F 111 [I [F 222] [F 333]] 444] . . .
b. . . . F 111 I 222 F 333 I 444 F . . .
c. [I . . . [F 111]] [I [F 222] [F 333]] [I [F 444] . . .]
Thus, as suggested earlier, the absence of recursion in phonology, bounded or unbounded, follows straightforwardly if phonological representations are strings segmented by boundary
symbols. In such a theory, the effects of the Strict Layer Hypothesis follow from Proper Containment, and therefore the former can be abandoned.

Ladd (1996:239) points out three further consequences of the Strict Layer Hypothesis, which all follow straightforwardly from the string-based system. The first is that in prosodic representations no levels can be skipped. For example, it is not possible for an intonational phrase to directly dominate prosodic words only, as in (17a). The representation in (17b) is probably the string-based counterpart that comes closest, but here 111 v 222 qualifies both as an intonational phrase and as a prosodic phrase, as it contains neither an I-boundary nor a F-boundary. Hence, (17b) corresponds to (17c), rather than (17a).

(17)
a. *. . . [I [v 111] [v 222]] . . .
b. . . . I 111 v 222 I . . .
c. . . . [I [F [v 111] [v 222]]] . . .
The second consequence is that unlabeled nodes are ruled out. For example, if a F-node dominates three prosodic words, it is not possible to create an asymmetric structure by grouping two of those under an anonymous node, as in (18a) (where the crucial brackets are boldfaced). This structure, too, has no counterpart in the string-based system: the domains expressed by (18b) are equivalent to those expressed by the ternary branching tree in (18c), and not to those expressed by (18a).

(18)
a. *. . . [F [v 111] [[v 222] [v 333]]] . . .
b. . . . F 111 v 222 v 333 F . . .
c. . . . [F [v 111] [v 222] [v 333]] . . .
Finally, Ladd points out that, as a result of the Strict Layer Hypothesis, no category can have heterogeneous sisters. For example, it is not possible for an I-node to immediately dominate a prosodic phrase and a prosodic word, as in (19a). Again, there simply seems to be no equivalent representation in a string-based system. The string in (19b) corresponds to the structure in (19c) rather than to that in (19a). If a v-boundary is placed between 111 and 222, as in (19d), the corresponding structure is the one in (19e). Both (19c) and (19e) adhere to the Strict Layer Hypothesis. (19)
a. *. . . [I [F 111] [v 222]] . . .
b. . . . I 111 F 222 I . . .
c. . . . [I [F [v 111]] [F [v 222]]] . . .
d. . . . I 111 v 222 I . . .
e. . . . [I [F [v 111] [v 222]]] . . .
Although ruling out heterogeneous sisters is desirable in certain empirical domains, phonology is rife with cases where the Strict Layer Hypothesis is violated by such things as prosodic words consisting of binary feet plus an unfooted syllable, or feet consisting of two syllables plus an unsyllabified consonant. The problem posed by these data, whether one is working in a tree-based or a string-based theory, is to accommodate the required structures while not sacrificing the otherwise correct consequences of the Strict Layer Hypothesis.
However, string-based theories of the type developed here face an additional problem: as Idsardi (1992) points out, they do not seem to allow unparsed syllables or consonants at all. Consider the representation of a word containing three syllables, of which the last is unfooted. This structure cannot be represented as in (20a), because the string 333 would be interpreted as a foot containing a single syllable. In order to remedy this problem, Idsardi proposes that boundary symbols are directional: they mark the right or the left edge of a domain. In this spirit, we could adapt our notation to allow a representation of the unfooted syllable as in (20b), where 333 is not parsed as belonging to the foot preceding it, and does not form its own foot because the F-boundary looks leftward. Although a solution along these lines is a possibility, the use of directional brackets potentially reintroduces into the string-based theory some of the expressive power of trees.

(20)
a. v 111 s 222 F 333 v
b. v 111 s 222 F 333 v
We would like to suggest a different solution. A string is an ordered set of elements. For some elements ordering is forced by the fact that they must be pronounced. It is not possible, for example, to leave two syllables unordered with respect to each other, as this does not allow them to be realized. However, the same cannot be said of boundary symbols. Since these do not have phonetic content, in principle they do not have to be ordered with respect to elements that are pronounced. Of course, if all boundary symbols remain unordered with respect to all phonetically realized material, the string is simply not segmented. But a partially unordered string can provide a straightforward representation of unfooted syllables and the like. Consider (21), where the braces indicate lack of ordering between the foot boundary and the third syllable. In this representation the string 111 s 222 is a foot, but the third syllable does not qualify as such. Since it also does not qualify as part of a foot, (21) adequately expresses that it remains unfooted.
This strategy easily generalizes to other types of unparsed material. For example, an unsyllabified word-final consonant could be represented as in (22).6
It seems, then, that leaving certain boundary symbols unordered with respect to a substring solves the problem of 'heterogeneous daughters' without unduly threatening those aspects of the Strict Layer Hypothesis that seem to be correct.

In sum, the existence of nested domains is captured as successfully in a system based on boundary symbols as in a tree-based theory of phonological representations, even if we take into account unparsed material. However, the tree-based theory needs an additional stipulation, the Strict Layer Hypothesis, to rule out structures that cannot be expressed in the string-based alternative to begin with.

6 We think that this notation can also accommodate prosodic adjunction and similar structures proposed by Selkirk (1996) and Truckenbrodt (1999), but space does not permit us to demonstrate this here.
3.3. A string-based theory with an infinite number of boundary symbols

Tokizaki (1999, 2001) puts forward a proposal that has some affinity with the string-based theory outlined here. He suggests a mapping rule that derives a phonological representation as in (23b) from the syntactic structure in (23a). This rule replaces brackets by boundary symbols of different strength. A subsequent cyclical procedure of boundary weakening derives different types of prosodic domains. At every cycle boundaries are weakened by one unit of strength. In the case at hand, the first cycle derives prosodic words, the second prosodic phrases, and so on:

(23)
a. [[[On] [Sunday]] [[John] [[reads] [[a] [book]]]]]
b. ||| On || Sunday |||| John ||| reads ||| a || book |||||
c. || On | Sunday ||| John || reads || a | book ||||   v (first cycle)
d. | On Sunday || John | reads | a book |||   F (second cycle)
e. On Sunday | John reads a book ||   I (third cycle)
f. On Sunday John reads a book |   U (fourth cycle)
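For concreteness, the following sketch is our reconstruction of the procedure illustrated in (23), not Tokizaki's own formalization: every run of adjacent brackets is replaced by a boundary of corresponding strength, and each cycle then weakens all boundaries by one unit.

import re

def brackets_to_boundaries(bracketed):
    """Replace each run of adjacent brackets by a boundary symbol of equal strength."""
    def repl(match):
        strength = sum(ch in "[]" for ch in match.group(0))
        return " " + "|" * strength + " "
    return re.sub(r"[\[\]](?:\s*[\[\]])*", repl, bracketed).split()

def weaken(tokens):
    """One cycle of boundary weakening: every boundary loses one unit of strength."""
    out = []
    for tok in tokens:
        if set(tok) == {"|"}:
            if len(tok) > 1:
                out.append("|" * (len(tok) - 1))
        else:
            out.append(tok)
    return out

tokens = brackets_to_boundaries("[[[On] [Sunday]] [[John] [[reads] [[a] [book]]]]]")
print(" ".join(tokens))                      # cf. (23b)
for label in ("v", "F", "I", "U"):           # the cycles in (23c)-(23f)
    tokens = weaken(tokens)
    print(label, "cycle:", " ".join(tokens))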
Although Tokizaki presents his proposal as string-based, (23b) is not a string in the relevant sense. Consider why. Given the widely assumed ban on crossing branches in syntax, the distinction between left and right brackets is convenient but redundant. The context-free pairing up of the directional symbols in (23a) and of the nondirectional symbols in (23b) yields exactly the same hierarchical structure. This implies that the representation in (23b) corresponds to an unlabeled tree. The subsequent weakening of boundary symbols does create representations that no longer correspond to a tree, as the deletion rule is unable to maintain the integrity of the bracketing. In (23), for example, some cycles involve the deletion of an odd number of brackets.

In general, the problem with doing phonology on the basis of unlabeled trees is that there is no way of linking the application of certain types of rules to particular types of domain. Since nothing identifies a constituent as, say, a prosodic phrase, there is nothing that triggers the rules relevant to prosodic phrases. Tokizaki's proposal does not imply that phonological representations are unlabeled trees, but since the cyclical weakening procedure takes such a tree as input, it partially inherits this problem. The reason is that syntactic terminals can in principle be separated by an indefinite number of brackets, which implies the existence of an infinite number of boundary strengths. This fact can be illustrated with the example in (24).

(24)
John said that || John's father said that ||| John's father's father said that |||| John's father's father's father said . . .
However, the number of phonologically relevant domains (and hence boundary types) is very limited—most theories assume no more than seven distinct types of domain. The key shortcoming of Tokizaki’s proposal is that it fails to reduce the infinite number of domain types made available by the boundary weakening procedure to the handful of phonologically
relevant ones. In other words, the proposed phonological representations display the unbounded recursiveness that is inherent in trees but irrelevant to phonology.7

4. Projection

4.1. Percolation in syntax

One consequence of the assumption that domination is a primitive of syntax is the possibility of feature percolation. A syntactic feature may be copied from a node to its mother. This is best illustrated with the projection of categorial features: the various constituents in (25) are assigned a label on the basis of the category of their heads.

(25)
[IP Bob willI [VP thinkV [PP ofP [DP hisD [distant friendsN]]]]]
As explained in section 2, projection is necessary to make categorial features accessible to selecting heads. In the case at hand, the modal will c-selects a verbal complement, while the determiner his must combine with a nominal category. Such selectional requirements can be resolved locally if the relevant features are copied to the sister of the selecting head8:
Feature percolation is not restricted to projection. It is also used to explain the grammaticality of examples like (27a), where a WH-feature percolates, and (27b), where percolation affects negation. (27)
a. I wonder [with whom]1 he might be having lunch t1 on Saturday.
b. [No reasonable person] would purchase anything so daft.
If it is true that phonological representations are not trees, then there should be no such thing as feature percolation in phonology. This claim is uncontroversial for almost all aspects of phonological representations. In many cases it makes no sense for features of vowels and consonants to

7 A different problem is that Tokizaki's initial representations are isomorphic to syntactic trees. Consider the examples in (ia) and (ib), which are syntactically rather different but which have an identical prosodic phrasing for the sequence John expects Mary.
(i) a. [[John] [[expects] [Mary]]]
    a'. {John} {expects Mary}
    b. [[John] [[expects] [[Mary] [[to] [win]]]]]
    b'. {John} {expects Mary} {to win}
While the boundary weakening rule will give the right results for (ia), it will fail in the case of (ib), because the number of brackets between expects and Mary is identical to the number of brackets between John and expects. This type of problem resurfaces in other syntactic environments. Notice that even in (23), where the object is complex, the verb incorrectly forms its own prosodic phrase.
8 According to Grimshaw (1991), the selectional effects that can be observed in extended projection result from obligatory feature matching, rather than c-selection. This does not affect the argument, as feature matching is also achieved through percolation.
percolate up to nonterminal nodes, as they have no interpretation there. Conversely, features that have an interpretation in nonterminal nodes have usually not percolated from terminals. For instance, syllables can be light, heavy or superheavy. These properties depend on the amount of material a syllable contains; they cannot be said to have their origin in any particular consonant or vowel. It makes no sense to assume a feature [+weight] that is copied from a consonant or a vowel to the syllable. (We briefly return to the issue of weight below.)

There are a few phonological phenomena that could potentially be analyzed as involving some sort of flow of information between nodes in a tree. We will consider two of these here, namely harmony processes and stress. We should warn the reader that the proposals we will consider are not part of current phonological theory, and that alternative accounts for the phenomena they deal with are readily available in the literature. Hence, we could be accused of fighting ghosts. It is of course telling that ghosts are all there is to fight against: apparently, no convincing case for feature percolation in phonology can be made. But even so, we would like to ask the reader to bear with us, because the proposals at hand will help us elucidate the ways in which syntactic and phonological representations differ. In particular, once we have considered independently motivated properties of percolation, we will argue that percolation-based analyses of harmony processes and stress violate principles that regulate the flow of information in trees.

4.2. Percolation in phonology: harmony

Although features like [+plosive] and [+high] are primarily used to characterize consonants and vowels, there are phenomena that could be interpreted as involving percolation of such features to a higher-level category. Let us consider a hypothetical language in which all vowels in a prosodic word inherit some feature [+F] from the nucleus of the leftmost syllable. We could try to capture the spreading of this feature by assuming that it percolates from the leftmost vowel to the prosodic word, as shown in (28). (In principle, more local forms of assimilation could be modeled in the same way.)
Upward percolation is not enough to model vowel harmony for the feature [+F]. After all, the kind of feature we are interested in here has no interpretation on syllables, feet or prosodic words. Hence, we must assume that there is a second process by which a feature can be copied downwards. For (28) this results in the tree in (29).
This kind of analysis was explored in Halle and Vergnaud (1981). However, most of the current literature has adopted a different account of vowel harmony, namely one in terms of autosegmental phonology (see Williams, 1976; Leben, 1973; Goldsmith, 1976, and much subsequent work). On such a theory, the feature [+F] is placed on a separate tier and the process of spreading is modeled by associating it with vowels that follow its source. This is shown in (30), where the solid line indicates the source of [+F] and the dotted lines signify spreading.
At the very least, the availability of this account shows that vowel harmony does not necessitate recourse to phonological trees, as the analysis in (30) is fully compatible with a string-based theory. The autosegmental account seems better equipped to deal with cases where the source of vowel harmony is not in a peripheral position within the domain of the rule. Under such circumstances, it is not uncommon for spreading to be directional, giving rise to representations like the following:
It is considerably less straightforward to represent such directionality in the corresponding tree-based representation. In the case at hand, [+F] must percolate to the prosodic word level, as it affects the two vowels following the source of the feature. However, if it percolates to the word level, we would expect it to be copied downward to all vowels dominated by v [+F] in (32). The fact that the vowel preceding the source is unaffected shows that the rule regulating vowel harmony must relate the directionality of copying to the terminal in which the copied feature originates. (It cannot refer to the instantiation of the feature on v, as a dominating node does not precede or follow any of the nodes it dominates.) But this means that core assumptions underlying the autosegmental account would have to be superimposed on the tree-based account.
We may conclude that harmony processes do not provide a compelling case for percolation in phonology.9

9 An anonymous reviewer points out that a similar conclusion can be drawn on the basis of other phonological processes. In Arabic, for example, pharyngealization ('emphasis') is often taken to be a property of CV sequences rather than syllables. But CV sequences are not necessarily constituents.
4.3. Percolation in phonology: stress

On at least some theories, stress is encoded through the labels 'strong' and 'weak' assigned to nodes in a phonological tree (see Liberman, 1975; Liberman and Prince, 1977, and much subsequent work). The main stress of a sentence is dominated by strong nodes only, so it is assigned to fruit in the structure below (where the internal structure of vs is omitted).
Liberman and Prince make it very clear that metrical theory should not be interpreted as involving percolation. Rather, what they consider essential about metrical representations is that the assignment of relative strength to sisters preserves the prominence relations internally to these constituents. However, if one wanted to make a case for percolation in phonology, one could try to argue (against the spirit of the theory) that representations like (33) involve upward copying of s-labels. This possibility exists because a nonterminal strong node must dominate precisely one other strong node in much the same way that a nonterminal node in syntax has precisely one head from which it inherits its categorial features. On such a view, the assignment of w-labels would be the result of an elsewhere convention. In fact, such labels could be abandoned altogether without loss of information.

Quite apart from the procedure through which nodes are labeled s or w, Liberman and Prince's theory provides a challenge to our proposal: it appears to claim that the correct representation of stress requires the assignment of properties to nonterminal nodes. This is impossible in a string-based account, as there are no nonterminal nodes.

It is possible to dismiss the challenge that metrical theory poses as one that concerns notation rather than anything substantive. After all, various notations for stress can be thought of that do not rely on trees. For the sake of argument, let us assume that stress assignment above the v-level is governed by the two conditions in (34).

(34)
a. There is exactly one peak per phonological domain.
b. Peaks are rightmost in F and I.
In a string-based theory, the example in (33) would be reanalyzed as in (35a). The two representations define exactly the same domains, listed in (35b–d). (35)
a. Most v zombies F don't v eat v fruit.
b. Prosodic words: {most; zombies; don't; eat; fruit}
c. Prosodic phrases: {most zombies; don't eat fruit}
d. Intonational phrases: {most zombies don't eat fruit}
The rules in (34) applied to the domains in (35) give the same results, whether one believes in trees or not. These can be expressed by a labeled tree, but it is also possible to place a grid that encodes stress above a string separated by various boundary symbols (see Liberman, 1975):
(36)
                               x
       x                       x
x      x         x       x     x
Most v zombies F don't v eat v fruit
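The grid in (36) can be computed directly from the boundary-segmented string in (35a). The sketch below is our own and implements only the two rules in (34): each prosodic word receives a grid column whose height is determined by the strongest boundary, or string edge, that closes a domain immediately to its right.

RANK = {"v": 1, "F": 2, "I": 3}   # assumed relative strengths of the boundary symbols

def grid(string):
    """Assign grid heights per (34): one peak per domain, peaks rightmost in F and I."""
    words, heights = [], []
    for token in string.split() + ["I"]:     # the string edge closes the I-domain
        if token not in RANK:
            words.append(token)
            heights.append(1)                # every prosodic word projects to level 1
        elif heights:
            # the word to the left of a boundary of rank n is the rightmost word,
            # and hence the peak, of every domain that the boundary closes
            heights[-1] = max(heights[-1], RANK[token])
    return list(zip(words, heights))

print(grid("Most v zombies F don't v eat v fruit"))
# -> [('Most', 1), ('zombies', 2), ("don't", 1), ('eat', 1), ('fruit', 3)]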
Of course, using grids for stress is the most widely accepted notation in current phonological theorizing. Instead of using grids, we could enrich the information provided by boundary symbols. In order to do so, we should first define the notion of adjacent domain. Each boundary symbol of a particular strength can be preceded and/or followed by an adjacent domain that extends to the next boundary symbol of equal or greater strength. This definition allows us to reinterpret metrical theory as a theory of asymmetric boundary symbols: a subscript '>' on a boundary symbol indicates that the adjacent domain to the right of the boundary symbol is more prominent than the one on the left, while a subscript '<' on a boundary symbol indicates the reverse. Given this convention, the stress pattern required by the rules in (34) could be encoded as in (37a). (For comparison, (37b) is the representation of Most zòmbies don't líke it on this notation.)

(37)
a. Most v> zombies F> don't v eat v> fruit.
b. Most v> zombies F> don't v> like v< it.
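A small sketch (again ours, with hypothetical helper names) of the notion 'adjacent domain' that the asymmetric boundary symbols in (37) rely on: the domain on either side of a boundary extends up to the next boundary of equal or greater strength.

RANK = {"v": 1, "F": 2, "I": 3}   # assumed relative strengths of the boundary symbols

def adjacent_domains(tokens, i):
    """Return the domains immediately to the left and right of the boundary at index i."""
    strength = RANK[tokens[i].rstrip("<>")]
    def is_closing(tok):
        base = tok.rstrip("<>")
        return base in RANK and RANK[base] >= strength
    left = []
    for tok in reversed(tokens[:i]):
        if is_closing(tok):
            break
        if tok.rstrip("<>") not in RANK:
            left.insert(0, tok)
    right = []
    for tok in tokens[i + 1:]:
        if is_closing(tok):
            break
        if tok.rstrip("<>") not in RANK:
            right.append(tok)
    return " ".join(left), " ".join(right)

toks = "Most v> zombies F> don't v eat v> fruit".split()
print(adjacent_domains(toks, 3))
# -> ('Most zombies', "don't eat fruit"); the subscript '>' on this F-boundary
#    marks the right-hand domain as the more prominent of the two, as in (37a)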
The intuition behind the s-w notation, namely that stress is relational, is perhaps preserved more transparently by this alternative than by the grid in (36). The existence of adequate notations that do not rely on labelled trees implies that stress cannot be used as an argument for trees in phonology.10 It does not provide evidence for substantive properties of nonterminal nodes, and hence there is no real case for the string-based theory to answer.

10 Although our proposal does not recognize domination in phonology, it does recognize the need for minimal constituency, as expressed through boundary symbols. This means, for example, that the evidence for constituency based on stress alternation after vowel deletion (summarized in Halle and Vergnaud, 1987) is compatible with a string-based theory.

4.4. Formal properties of projection

Although adopting alternative analyses is a legitimate response to the potential challenges presented by harmony processes and stress, it is not completely satisfactory. It would be more convincing if it could be shown that the copying operations underlying a tree-based account of harmony processes and the rules responsible for the distribution of s-labels in metrical theory are in conflict with general conditions on the flow of information in trees. We develop such an argument in the remainder of this section, based on two conditions on feature percolation in syntax.

The first condition is Inclusiveness, as proposed by Chomsky (1995a:228). This constraint rules out 'magical features', features that are part of a syntactic representation R but do not have their origin in any of the lexical items contained in R. Chomsky's discussion makes clear that Inclusiveness is intended to be uniform. It holds not only of the root node of a sentence, but of any subtree contained in it. This is highlighted by the following formulation:
(38) Inclusiveness
The properties of a non-terminal node are fully recoverable from the nodes it dominates; the properties of a terminal node are fully recoverable from the lexicon.
A uniform version of Inclusiveness does more than rule out the ad hoc introduction of features. It also excludes downward projection and sideways transfer of features.11 For example, the terminal α1 in the tree in (39) cannot project its categorial features to a node that c-commands it. This is because α2's properties, to the extent that they are determined by projection, cannot be recovered from γ or δ, the nodes that α2 dominates.
For the same reason, the projection of β1 in (40) cannot be extended from β3 to β4.
The formulation of Inclusiveness in (38) is adapted from Neeleman and Van de Koot (2002). There we propose a second condition that regulates feature percolation, according to which access to information in a node is restricted to nodes that are directly connected to it: (41)
Accessibility
Information in a node can be accessed from nodes that immediately dominate it or by which it is immediately dominated.
This condition imposes a strict locality on feature percolation: a feature can only be copied to a node that immediately dominates it. Thus, it is impossible for projections to be discontinuous. The terminal α1 cannot project its categorial features to α2 in (42), because the two nodes are separated by a projection of β (note that (42) does not violate Inclusiveness as defined earlier).
11 An anonymous reviewer objects that sideward and downward transfer of features must be allowed if we are to account for feature sharing between a subject and a verb. In particular, he or she suggests that φ-features are copied from the subject to the verb. Of course, this analysis has long been superseded by one that establishes a grammatical relation between the verb's agreement and the subject.
The consequences of Inclusiveness and Accessibility in the realm of projection of categorial and other features are entirely uncontroversial. Indeed, it is fair to say that these conditions have been implicitly assumed in phrase structure theory from a very early date (initially with the S- and S'-categories as exceptions). Although Inclusiveness and Accessibility started out as syntactic principles, one would expect that their scope extends to any tree-based grammatical system. The idea behind trees is that, for each domain in a sentence, a set of features is constructed that represents it in subsequent computation.12 This set of features makes up the dominating category. Inclusiveness and Accessibility are inherent in this concept. Accessibility states that once a representative set of features is constructed for a domain, the computation can no longer directly access elements within that domain. Inclusiveness states that a representative set of features can only contain features that were part of the represented material to begin with.

4.5. Stress and vowel harmony do not involve percolation

We now address the question whether tree-based accounts of harmony processes and stress adhere to Inclusiveness and Accessibility. If they do not, we may take this to undermine the claim that phonological representations are trees. (Or if they are trees, they are not trees as we know them, Jim.)

Let us first consider the putative s-label percolation in metrical theory. At first sight, this process seems to adhere to Inclusiveness and Accessibility fairly well. Some leeway is required because phonological terminals are not labeled s or w inherently. But beyond the syllable level, the distribution of s-labels seems well behaved. Percolation follows a path in which no nodes may be skipped and it is systematically upward, rather than sideways or downward, in that every strong node must contain a strong node. However, when we consider the conditions on stress assignment in more detail, it becomes clear that the stress system as a whole violates both (38) and (41). In other words, not only was s-w labelling not intended to involve percolation, but if one tries to push an interpretation of the system along these lines, failure is unavoidable.

Consider the representation in (43), where s- and w-labels have been assigned to prosodic words, but not yet to higher-level categories. In order to determine whether the s-label of v4 can be copied onto F2, we need to know whether v2 has projected its s-label to F1. The reason for this is that an intonational phrase, like any other prosodic category, can have no more than one strong daughter. But this already shows that conditions on s-label percolation are more global than allowed for by Accessibility. In determining the properties of F2, properties of F1 must be taken into account, even though neither node immediately dominates the other.
12 This is particularly clear if we think of trees as generated by rewrite rules, as the left-hand part of a rewrite rule indeed defines a representative of a string for further computation. See the discussion in the fourth chapter of Chomsky (1957).
There is nothing comparable in syntax. To see whether a node can project, we only need to take into account the properties of its mother. For example, whether the categorial features of D2 in (44) can be copied to the unlabeled node 3 depends on whether the categorial features of V have been copied (assuming that a node cannot have more than one set of categorial features). But whether D1 projects to node 2 need not be taken into account. In other words, decisions about projection are taken on the basis of locally available information; in the case at hand, the properties of unlabeled nodes other than node 3 are simply irrelevant.
The argument is presented in derivational terms, but can be easily reformulated in terms of representations. The point is that phonology has conditions on the distribution of s-labels that are sensitive to properties of sister nodes (assuming prosodic trees), but syntax does not have similar conditions on the distribution of categorial features. Hence, syntax adheres to the condition in (41), but phonology does not.

This conclusion is strengthened by alternations in stress placement due to context. A well-known example is the stress pattern of adjectives like achromátic, which changes if they are used as prenominal modifiers, as in àchromatic léns. The respective stress patterns are represented by the following labeled trees:
There is some disagreement about the trigger for the stress alternation. The traditional account relies on a rhythmic component in the rules of stress assignment (see Liberman, 1975;
Liberman and Prince, 1977 for an early discussion of the alternation in (45), and Hayes, 1984 for a more general discussion of 'eurhythmy'). The basic idea is to avoid a stress clash by increasing the distance between the main stress of the phrase and the stress in the adjective. This is achieved by reversing the stress pattern in the adjective at the level of the foot. Selkirk (1995) argues that the rhythmic component can be dispensed with if we adopt a principle of Phrase-Edge Prominence, according to which stress preferably resides at the edges of phonological constituents. Although Selkirk's proposal is grid-based, we reinterpret it here in terms of labeled trees for ease of exposition. In a structure like (45a), which consists of just two feet, Phrase-Edge Prominence does not favour either a w-s or a s-w labeling, as stress will always be located at an edge. Therefore, the stress pattern is decided by the Nuclear Stress Rule, which favours rightmost stress. If a constituent is added, as in (45b), Phrase-Edge Prominence can be satisfied by labeling the adjective s-w, and the phrase as a whole w-s. The result is that primary stress resides at the right edge of the prosodic phrase, while secondary stress is located at the left edge.

Both accounts violate Accessibility as well as Inclusiveness. This is because the pattern of projection of the s-label internally to the adjective is not determined by the material contained in the adjective. Rather, it depends on the structure in which the adjective is embedded. The rule responsible for this must therefore consider much larger chunks of structure than Accessibility would allow. Furthermore, a top-down or sideways determination of stress is in conflict with Inclusiveness.

Violations of Inclusiveness in a tree-based theory of stress are not limited to rhythm-related phenomena. For instance, quantity-sensitive languages associate weight and stress: heavy syllables tend to attract stress, and stressed syllables tend to be heavy (see McCarthy and Prince, 1986, and references mentioned there). As already mentioned, weight depends on the amount of material a syllable contains. Syllables whose rhyme is V (or occasionally VC) group together as light, while syllables whose rhyme is VV or VC are heavy. A syllable is superheavy if it has a VVC or VCC rhyme (in languages that acknowledge this category). Weight cannot be analyzed as a feature copied to a s-node from one of its daughters. In a heavy syllable there need not be a particular consonant or vowel that is [+heavy]; weight emerges at the syllable level as a result of the sum of elements in the rhyme. This is exactly the sort of situation that Inclusiveness is meant to exclude. Chomsky (1995b) argues for an interpretation of Inclusiveness that rules out syntactic bar-levels, because a bar-level is a global property, like weight, and not a lexical feature copied upward from a terminal. Thus, syntactic structures are 'bare' in a way that cannot be true of phonological trees. (We continue the discussion of weight in section 4.6.)

We now turn to the tree-based account of vowel harmony introduced in section 4.2. The problems that this account faces should by now be straightforward. As explained, spreading of a feature requires upward as well as downward copying. But downward copying necessarily violates Inclusiveness. Recall that this illegal process cannot be avoided because the relevant feature must be interpreted in the vowels to which it spreads (see the tree in (29)).
It is also necessary in order to capture the directionality of certain harmony processes: copying to a dominating node cannot have directionality effects. This might seem a fairly limited conclusion, but on closer inspection it is devastating for treebased approaches to phonology. Vowel harmony is just one instance of the general fact that the specification of phonological segments is context-sensitive. Consonants and vowels frequently receive a different interpretation depending on the environment in which they find themselves. For example, [i] and [j] are assumed to be different realizations of the same element in
vocalic and consonantal positions, respectively. Auslautverhärtung involves devoicing of consonants in syllable-final position. English [l] is voiceless in voiceless clusters such as [kl], but voiced in voiced clusters such as [gl]. One way of dealing with this alternation is to assume that [l] has no specification for [voice], but inherits one from adjacent material. (When [l] does not occur in a consonant cluster, its voiced character is the result of an elsewhere rule ([+lateral] → [+voiced])).

These kinds of phenomena are to a large extent what phonology is about. Crucially, none of them allows an inclusive tree-based analysis. This is because the specification of the segments in question depends either on their structural position or on adjacent material. This means that in a tree-based phonology they would require an analysis that acknowledges either top-down or lateral influences. But these are exactly the kind of influences that Inclusiveness is meant to rule out.

In sum, tree-based theories of stress, vowel harmony and other phonological processes necessarily violate two central conditions that regulate the flow of information in syntactic trees. This suggests to us that phonological representations are not trees. Admittedly, this conclusion rests on an interpretation of Inclusiveness and Accessibility as general conditions on tree-based grammatical systems, rather than as specifically syntactic principles. But consider the alternative. One would have to argue that phonological ‘phrase structure theory’ is stronger than its syntactic counterpart, in that it allows access to material in a way ruled out in syntax. At the same time one would have to argue that it is weaker, because it does not allow recursion. This would be a curious position to defend.

4.6. Syllable structure

It has traditionally been acknowledged that in a syllable the nucleus and the coda have a tighter bond than the onset and the nucleus. Hence, an additional constituent called the rhyme is assumed, as in (46a). Levin (1985) reconceptualizes the structure of syllables in terms of a phonological variant of X-bar theory, according to which the coda is the complement of the nucleus, while the onset is its specifier, as in (46b). Thus, the nucleus is assumed to project (see Van Oostendorp, 2000 for a recent incarnation of this proposal).
The asymmetry between onset and coda seems to make it impossible to extend the Strict Layer Hypothesis to the syllable, as the σ-node in (46a) dominates heterogeneous sisters (compare the discussion surrounding (19)). Indeed, it is standardly assumed that units below the syllable level are not subject to this condition. On the other hand, the extension of X-bar theory to the realm of phonology can only be very limited. Given that feet, prosodic words, etc., cannot directly dominate consonants or vowels, one could only conceive of foot structure as organized along X-bar theoretic lines if the head of a foot were a syllable (an N’’), rather than a terminal category. A head-initial tri-syllabic foot (if such a thing exists to begin with)
would have to be represented as in (47) (for a proposal in this spirit see Rennison and Neubarth, 2003).
It will be clear that the parallel with X-bar structures breaks down here: in syntax, a maximal projection can never occupy the head position of another maximal projection.13 It might be that empirical considerations force us to accept a fundamental difference between phonology above and below the syllable, but this would be very unattractive from a conceptual point of view. Why should syllables have a hierarchical organization, while feet and prosodic words are strings? Moreover, as explained in the previous section, processes that take place within the syllable are typically non-inclusive, in that properties of segments are determined by their position. We would therefore favour a string-based reinterpretation of syllabic structure.14

We admit that syllables are asymmetric in a way that cannot be accommodated by just strings and boundary symbols. Of course, a representation like (48) correctly expresses that the bond between the nucleus and the coda is tighter than the one between the nucleus and the onset (a new boundary symbol ‘|’ is used to structure syllables). It thus captures part of what is expressed by the trees in (46). However, (48) fails to encode any asymmetry between rhyme and onset. The reason for this is that boundary symbols separate constituents of the same type. But, as is well-known, onsets and rhymes do not behave in the same way. For instance, the well-formedness constraints that apply to onsets are different from those that apply to rhymes.15

(48) . . . σ C | VC σ C | V σ . . .

13 An anonymous reviewer remarks that, since the stressed syllable is the head of a foot, the first N’’ in (47) could project to N’’’, rather than be dominated by F, and so on. Although this is a possibility, it has the undesirable consequence that notions like ‘foot’ and ‘prosodic word’ can no longer be associated with a unique label. Nc (where c is a constant) could correspond to a foot in some cases and to a prosodic word in others. Given that prosodic rules typically come with a specific domain of application, this seems unworkable in practice. This is exactly the reason why Rennison and Neubarth (2003) propose trees like (47).
14 We are of course not the only ones to argue for a flat syllable. A first step was taken by Harris (1994:152–153), who suggested that the relation between onset and rhyme is linear rather than structural. A more radically linear theory of syllable structure is developed in Scheer (2004). Scheer’s view of the syllable is very different from the traditional representations in (46). We will not attempt to evaluate it here. Our main aim in this section is to show that even on traditional assumptions a reasonable string-based representation of syllables is available.
15 Recently, Yip (2003) has cast doubt on the onset-rhyme distinction, arguing that the phonotactic facts succumb to explanations based on linear order. If Yip is correct, flattening the syllable is a trivial matter. In what follows we try to reconcile a string-based phonology with the more standard view that the grammatical structure of syllables is asymmetric.
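A minimal Python sketch, assuming (as in our rendering of (48)) that ‘σ’ marks syllable boundaries and ‘|’ separates the onset from the rhyme, shows how such a boundary-symbol string can be represented and queried; the function name and the list-of-pairs output are purely illustrative and not part of any proposal in the literature.

def parse_syllables(s: str):
    """Split a boundary-symbol string into (onset, rhyme) pairs."""
    syllables = []
    for chunk in s.split("σ"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty edges produced by initial/final boundaries
        if "|" in chunk:
            onset, rhyme = chunk.split("|", 1)
        else:
            onset, rhyme = "", chunk  # an onsetless syllable
        syllables.append((onset.strip(), rhyme.strip()))
    return syllables

# The string corresponding to (48): a CVC syllable followed by a CV syllable.
example = "σ C | VC σ C | V σ"
print(parse_syllables(example))  # [('C', 'VC'), ('C', 'V')]

Note that onset and rhyme come out as objects of exactly the same type (substrings), which is precisely the symmetry problem noted in the main text.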
One solution to this problem is to adopt an extra tier with which either onsets or rhymes are exclusively associated. An obvious candidate for this tier is provided by the moraic theory of syllable structure. A mora is a unit of weight that plays a role in metrical processes: the number of moras determines the weight of a syllable (see Hyman, 1985). Since onsets do not contribute to syllable weight, it seems reasonable to suggest that there is a moraic tier to which only rhymes are linked (this proposal echoes earlier work by Clements and Keyser, 1983)16:
In fact, it can be argued that something like (49) is the most reasonable way of representing moras, whether one believes in syllabic trees or not. The main alternative in tree-based theories would be to treat moras as categories on a par with syllables and feet. There are two ways of doing so. On one view, syllables directly dominate two kinds of categories, namely the consonants that make up onsets and the moras that make up the rhyme, as in (50a) (see Hayes, 1989). On the alternative view, onsets are linked to the moraic layer of the tree, as in (50b) (see Hyman, 1985).
Both trees in (50) are highly suspicious, because V is immediately dominated by two nonterminal categories. Recall that the main idea behind trees is that, for each domain in a sentence, a dominating node is constructed that represents it in subsequent computation. (It is this idea that underlies Inclusiveness and Accessibility.) From this perspective it is totally unexpected that a single element should have two representatives. Of course, one could allow multiple domination in phonology, but this would amount to admitting that syntactic and phonological trees differ fundamentally. Any theory that claims that phonological and syntactic trees are the same kinds of object would be better off representing moras on a separate tier, outside the tree.

We may draw the following conclusions. A string-based theory can accommodate the syllable as a primitive by assuming a syllable boundary symbol17; it can express the internal asymmetry between onsets and rhymes by making use of an additional boundary symbol plus a moraic tier. Such a tier seems necessary even in tree-based theories. Syllable structure therefore does not present a convincing case for trees (or projection) in phonology.

16 The discussion here is limited to the vast majority of languages in which onsets do not contribute to weight. It has been argued, however, that they do in Pirahã (see Everett and Everett, 1984; Everett, 1988). If it can be shown that onsets in Pirahã are moraic, then the problem of asymmetry in syllables does not arise in this language (at least not for syllable weight). This is exactly what is claimed in Topintzi (2004). (But see Goedemans, 1998 for arguments against alleged cases of onset weight.)
17 Of course, there is no formal reason why a string-based theory should have such a boundary symbol. The inventory of boundary symbols is an empirical matter.
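A minimal sketch of how weight could be computed over such a representation, assuming (as in the text) that only rhyme material projects moras; whether a coda consonant is moraic is treated as a language-particular switch, and all names are our own illustrative choices.

def moras(rhyme: str, coda_is_moraic: bool = True) -> int:
    """Count moras in a rhyme given as a string of V and C slots."""
    count = rhyme.count("V")          # every vowel slot projects a mora
    if coda_is_moraic:
        count += rhyme.count("C")     # weight by position: coda Cs also project moras
    return count

def weight(rhyme: str, coda_is_moraic: bool = True) -> str:
    """Weight is a property of the rhyme as a whole, not of any single segment."""
    n = moras(rhyme, coda_is_moraic)
    if n <= 1:
        return "light"
    if n == 2:
        return "heavy"
    return "superheavy"

for rhyme in ["V", "VC", "VV", "VVC", "VCC"]:
    # second column: a language in which VC rhymes pattern with light syllables
    print(rhyme, weight(rhyme), weight(rhyme, coda_is_moraic=False))

The point of the sketch is merely that weight is computed from the rhyme as a whole; no individual segment carries a [+heavy] feature that could be copied upward, which is why an inclusive tree-based analysis is unavailable.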
5. Dependencies

5.1. Syntactic dependencies

The assumption that there are no trees in phonology has a third consequence, having to do with the way syntactic and phonological dependencies are conditioned. Syntactic dependencies, such as binding, movement and predication, have several well-known properties. Two of these are summarized by the condition that the antecedent in a syntactic dependency must c-command the dependent: the upward portion of the path from dependent to antecedent is in principle unbounded, while the downward portion is conditioned by immediate domination. Phonological dependencies (such as harmony processes) differ in that they are not conditioned by anything resembling c-command. If, for the sake of argument, we think of phonological representations as trees, the upward portion of the path from dependent to antecedent seems more local, while the downward portion seems less local, than in syntax.

We will argue that this characterization of dependencies in syntax and phonology follows directly from the assumption that phonological representations are strings, rather than trees. In order to make the argument, however, we must first discuss how Inclusiveness and Accessibility affect the encoding of syntactic dependencies. This is actually not a trivial issue. As argued at length in Neeleman and Van de Koot (2002), the standard chain-like encoding of dependencies necessarily violates these conditions. In order to see why, consider the tree in (51), where δ is a syntactic dependent (such as an anaphor, a predicate or a trace), and α the antecedent with which it is associated. For explicitness’ sake, let us assume that the fact that δ must be linked to an antecedent is encoded by a selectional requirement SR. In (51) SR is satisfied by α (as indicated by ‘#’).
The problem is that, as a result of the relation established between them, properties of both α and δ change. Since neither node dominates the other, these changes cannot be recovered from the internal structure of α and δ, or from their lexical entries. Hence, Inclusiveness is violated. Moreover, given that α and δ are not in a relation of immediate domination, Accessibility is violated as well.

That syntactic relations, if conceived of as chains, induce changes in antecedent and dependent can be demonstrated in various ways. Suppose that δ is a predicate, and SR a θ-role. Once SR is satisfied by α, δ no longer qualifies as a dependent; it cannot be linked to another antecedent, due to the θ-criterion. Very similar considerations hold of other dependencies (such as binding and movement).

The problem can be stated in different terms. If a nonterminal category in a tree represents the material it dominates for further computation, it seems reasonable to assume that elements in the representative’s domain cannot be accessed directly by elements external to the domain. But this implies that once a representative is constructed for δ, satisfaction of SR in δ becomes impossible. This formulation of the problem suggests where we should look for a solution.
Unsatisfied selectional requirements in a domain must be part of the representative of that domain. In other words, SR is copied upwards until it is sufficiently close to the antecedent α. Inclusiveness and Accessibility dictate what ‘sufficiently close’ means. Only if SR is copied upward recursively to the node that immediately dominates α can it be satisfied without violation of these principles:
Due to Inclusiveness, the fact that a selectional requirement is satisfied must be recoverable from the material dominated by the node that hosts it. Hence, satisfaction of SR must be ‘downward’. Moreover, given that Accessibility restricts relations between nodes to immediate domination, the element that satisfies SR must be a daughter of SR’s host node, as in (52). So, Inclusiveness and Accessibility force a decomposition of grammatical dependencies into two primitive operations: the upward copying and downward satisfaction of a selectional requirement.

Two key properties of grammatical dependencies follow from this proposal. The first is that syntactic dependencies may span fairly large distances. This is because the copy operation can be applied recursively: a copied selectional requirement can itself be copied. As a result, the path along which a selectional requirement travels up the tree can in principle be indefinitely long (as long as independent locality conditions are satisfied). On the other hand, the downward relation between the antecedent and the node in which the selectional requirement is satisfied must be extremely local, due to Accessibility. Thus, the structure in (53) is ungrammatical because the node that contains SR# does not immediately dominate α.
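The decomposition just described lends itself to a simple procedural rendering. The following toy Python sketch is ours, purely for illustration: the node attributes, labels and function names are not part of the theory. It copies an unsatisfied SR upward node by node and satisfies it only when the host node immediately dominates an antecedent α; the dependency goes through in the first example, where α c-commands δ, and fails in the second, where it does not.

from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    needs_antecedent: bool = False   # this node introduces an unsatisfied SR
    is_antecedent: bool = False      # this node can serve as an antecedent

def satisfy(root: Node) -> Optional[str]:
    """Copy unsatisfied SRs upward; satisfy an SR as soon as its host node
    immediately dominates an antecedent. Returns the label of the node where
    satisfaction takes place, or None if the dependency cannot be established."""
    def walk(node: Node) -> Tuple[bool, Optional[str]]:
        carrying = node.needs_antecedent
        for child in node.children:
            child_carrying, satisfied_at = walk(child)
            if satisfied_at is not None:
                return False, satisfied_at
            carrying = carrying or child_carrying        # upward copying of SR
        if carrying and any(c.is_antecedent for c in node.children):
            return False, node.label                     # strictly local, downward satisfaction
        return carrying, None
    return walk(root)[1]

# α c-commands δ: [B α [C X δ]]. SR is copied from δ to C and then to B,
# where it is satisfied because B immediately dominates α.
delta = Node("δ", needs_antecedent=True)
alpha = Node("α", is_antecedent=True)
print(satisfy(Node("B", [alpha, Node("C", [Node("X"), delta])])))    # -> B

# No c-command: α is buried inside a sister of the path from δ, so no node
# carrying the copied SR immediately dominates α; the dependency fails.
print(satisfy(Node("B", [Node("D", [alpha]), Node("C", [delta])])))  # -> None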
In summary, Inclusiveness and Accessibility not only capture various conditions on projection, but also explain the requirement that a dependent must be c-commanded by its antecedent. This requirement follows from the unbounded nature of the upward path (copying) of a selectional requirement and the strict locality of its downward path (satisfaction).

5.2. Phonological dependencies

With this background, let us return to a comparison of syntactic and phonological processes. If phonological representations were trees, the resulting typology of phonological dependencies would include potentially non-local relations between constituents at different levels of the prosodic hierarchy, as long as c-command obtains. For example, it should be possible to find phonological processes that affect the highlighted F and V in a representation like (54). At the
same time, a relation between adjacent segments that are not in a configuration of c-command, such as the highlighted V and C, should be impossible.
Anyone familiar with elementary phonological theory will realize that this is exactly the opposite of what is attested. Raddoppiamento sintattico, for instance, is a rule that lengthens a word-initial consonant if it is preceded by a word-final (stressed) vowel. Thus, the highlighted VC sequence in (54) is an environment that allows application of phonological rules. On the other hand, there are no phonological rules that have a description like ‘‘V must have property X if c-commanded by an F with property Y’’, where X could be [+short] and Y [+complex].18 We may safely conclude, then, that phonological processes are not conditioned in the way one would expect if phonological representations were trees.19,20

If phonology lacks hierarchical structure, then what kind of conditions should hold of phonological rules? First, as we have already mentioned, such rules are sensitive to boundary symbols of different kinds. Second, phonological rules typically relate (near-)adjacent elements. This is not unexpected. We have proposed that relations between nodes in a tree are conditioned by immediate domination, which amounts to structural adjacency (Accessibility). A phonological variant of this requirement would therefore demand linear adjacency.

The idea that phonological rules operate under adjacency is a familiar one (see Emonds, 1985; Gafos, 1999 for discussion). To give a typical example, a reduplicative morpheme is always adjacent to the stem from which its surface specification is copied. In fact, the premise of linear adjacency underlies part of the research program in generative phonology. For processes that take place across intervening material (such as vowel harmony), it is argued that the relevant features are part of a separate tier in which adjacency obtains.

18 The fact that we must struggle to find a reasonable property of Fs underlines the absence of projection in phonology.
19 An anonymous reviewer claims that c-command is relevant to phonology, giving the following example. In Bulgarian, unstressed syllables have [−low] vowels. Since stressed syllables are picked out by metrical principles, unstressed syllables are just the other syllables in a foot. So, in Bulgarian, a vowel that is c-commanded by a stressed syllable must be [−low]. Although this last sentence gives a possible description of the data, it is clear that alternative formulations of the relevant generalization are readily available. For example, one might state that [+low] vowels must be part of a stressed syllable. If one wants to make a case for c-command in phonology (and thereby for phonological trees), it is not sufficient to show that there are phenomena for which a description can be given in terms of c-command; one must show that there are phenomena for which a description in terms of c-command is the only possibility. After all, given that a tree-based theory is richer than a string-based theory, one must motivate the necessity of the additional expressive power.
20 We would like to stress that the empirical basis for our argument is that phonology has no dependencies conditioned by c-command. Hence, phonological dependencies do not parallel syntactic dependencies, such as binding, movement and control. This is not to say that there are no phonological dependencies, but merely that they are of a different nature.
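To illustrate what a rule conditioned by boundary symbols and linear adjacency looks like in a string-based setting, here is a minimal Python sketch loosely modelled on raddoppiamento sintattico as described above, together with a trivial tier-extraction function of the kind presupposed by tier-based accounts of harmony. The segment encoding (‘V*’ for a stressed vowel, ‘#’ as a word boundary symbol) and the function names are our own illustrative choices, not a worked-out proposal.

def raddoppiamento(segments):
    """Lengthen a word-initial consonant immediately preceded (across '#')
    by a word-final stressed vowel: ... V* # C ...  ->  ... V* # C: ..."""
    out = list(segments)
    for i, seg in enumerate(out):
        if (seg == "#" and 0 < i < len(out) - 1
                and out[i - 1] == "V*"            # word-final stressed vowel
                and out[i + 1].startswith("C")):  # word-initial consonant
            out[i + 1] = out[i + 1] + ":"         # mark the consonant as long
    return out

def vowel_tier(segments):
    """For harmony-type processes, adjacency is assessed on a separate tier
    containing only the vowels."""
    return [seg for seg in segments if seg.startswith("V")]

# città bella: the final stressed vowel of the first word triggers
# lengthening of the initial consonant of the second word.
utterance = ["C", "V", "C", "C", "V*", "#", "C", "V", "C", "C", "V"]
print(raddoppiamento(utterance))
print(vowel_tier(utterance))   # adjacency among vowels, ignoring consonants

The rule refers only to a boundary symbol and to strictly adjacent positions in the string; nothing corresponding to c-command or to a dominating node plays any role.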
6. Concluding remarks

In this paper we have argued that phonological representations are not trees, but strings segmented by boundary symbols. Since the information contained in a tree is in many ways richer than the information contained in a string, we have made our case for string-based phonology by demonstrating that tree-based theories of phonology overgenerate. Three differences between syntax and phonology were shown to support an argument along these lines. Recursion, projection and long-distance dependencies are characteristic of syntax, but are absent in phonology. This follows if only syntax has trees.

The argument can be strengthened if the string-based theory of phonology makes available concepts that are absent in syntax. In fact, we have already seen one such concept, namely linear adjacency. There is a second way in which phonology is richer than syntax: it has boundary symbols to which rules can refer. Syntax has no labels for boundaries, but only for dominating categories. Indeed, there are a number of processes that indicate that phonology cares about the edges of categories. For example, boundaries may be marked by special tones, and constituent edges often exhibit different structures than are found word-internally. The sensitivity of phonology to edges is reflected in current theorizing. Work in Optimality Theory, for example, uses various edge-referring constraints to distinguish word-internal from marginal positions (see Broselow, 2003; Prince and Smolensky, 2004, and references cited there). To the extent that this work is on the right track, it makes clear in which respects phonology is different from syntax.

Acknowledgments

This paper was written by two linguists whose knowledge of phonology is patchy at best. It would not have existed were it not for generous help from sympathetic phonologists. We would specifically like to thank John Harris, Alan Prince, Tobias Scheer, Moira Yip, and two anonymous reviewers for useful comments. The cliché that the authors are responsible for all remaining errors and omissions is particularly apt here.

References

Bod, R., 1998. Beyond Grammar: An Experience-based Theory of Language. CSLI Publications, Stanford.
Broselow, E., 2003. Marginal phonology: phonotactics on the edge. The Linguistic Review 20, 159–193.
Chomsky, N., 1957. Syntactic Structures. Mouton, The Hague.
Chomsky, N., 1995a. Bare phrase structure. In: Webelhuth, G. (Ed.), Government and Binding Theory and the Minimalist Program. Blackwell, Oxford, pp. 383–439.
Chomsky, N., 1995b. The Minimalist Program. MIT Press, Cambridge, MA.
Chomsky, N., Halle, M., 1968. The Sound Pattern of English. MIT Press, Cambridge, MA.
Clements, G., Keyser, S.J., 1983. CV Phonology: A Generative Theory of the Syllable. MIT Press, Cambridge, MA.
Collins, C., 2002. Eliminating labels. In: Epstein, S., Seely, D. (Eds.), Derivation and Explanation in the Minimalist Program. Blackwell, Oxford, pp. 42–64.
Emonds, J., 1985. A Unified Theory of Syntactic Categories. Foris, Dordrecht.
Espinal, M.T., 1991. The representation of disjunct constituents. Language 67, 726–763.
Everett, D., 1988. On metrical constituent structure in Pirahã phonology. Natural Language and Linguistic Theory 6, 207–246.
Everett, D., Everett, K., 1984. On the relevance of syllable onsets to stress placement. Linguistic Inquiry 15, 705–711.
Gafos, A., 1999. The Articulatory Basis of Locality in Phonology. Garland, New York.
Ghini, M., 1993. Φ-formation in Italian: a new proposal. Toronto Working Papers in Linguistics 12, 41–78.
Goedemans, R., 1998. Weightless segments: a phonetic and phonological study concerning the metrical irrelevance of syllable onsets. Ph.D. dissertation, University of Leiden.
Goldsmith, J., 1976. Autosegmental Phonology. Ph.D. dissertation, MIT, published 1979 by Garland.
Grimshaw, J., 1991. Extended Projection. Ms., Brandeis University.
Haegeman, L., 1988. Parenthetical adverbials: the radical orphanage approach. In: Chiba, S. (Ed.), Aspects of Modern Linguistics. Kaitakushi, Tokyo, pp. 232–254.
Hall, D., 2000. Prosodic representations and lexical stress. In: Jensen, J., Van Herk, G. (Eds.), Proceedings of the 2000 Annual Conference of the CLA. Cahiers Linguistiques d’Ottawa, Ottawa, pp. 49–60.
Halle, M., Vergnaud, J.R., 1981. Harmony processes. In: Klein, W., Levelt, W. (Eds.), Crossing the Boundaries in Linguistics. Reidel, Dordrecht, pp. 1–22.
Halle, M., Vergnaud, J.R., 1987. An Essay on Stress. MIT Press, Cambridge, MA.
Harris, J., 1994. English Sound Structure. Blackwell, Oxford.
Hayes, B., 1984. The phonology of rhythm in English. Linguistic Inquiry 13, 227–276.
Hayes, B., 1989. Compensatory lengthening in moraic phonology. Linguistic Inquiry 20, 253–306.
Hyman, L., 1985. A Theory of Phonological Weight. Foris, Dordrecht.
Idsardi, W., 1992. The computation of prosody. Ph.D. dissertation, MIT.
Jackendoff, R., 1997. The Architecture of the Language Faculty. MIT Press, Cambridge, MA.
Joshi, A.K., 1985. Tree adjoining grammars: how much context-sensitivity is required to provide reasonable structural descriptions? In: Dowty, D., Karttunen, L., Zwicky, A. (Eds.), Natural Language Parsing. Cambridge University Press, Cambridge, pp. 206–250.
Ladd, R., 1986. Intonational phrasing: the case of recursive prosodic structure. Phonology Yearbook 3, 311–340.
Ladd, R., 1996. Intonational Phonology. Cambridge University Press, Cambridge.
Leben, W., 1973. Suprasegmental phonology. Ph.D. dissertation, MIT.
Levin, J., 1985. A metrical theory of syllabicity. Ph.D. dissertation, MIT.
Liberman, M., 1975. The intonational system of English. Ph.D. dissertation, MIT.
Liberman, M., Prince, A., 1977. On stress and linguistic rhythm. Linguistic Inquiry 8, 249–336.
McCawley, J.D., 1968. The Phonological Component of a Grammar of Japanese. Mouton, The Hague.
McCarthy, J., Prince, A., 1986. Prosodic Morphology. Ms., Brandeis University.
Neeleman, A., Van de Koot, J., 2002. The configurational matrix. Linguistic Inquiry 33, 529–574.
Nespor, M., Vogel, I., 1982. Prosodic domains of external sandhi rules. In: Van der Hulst, H., Smith, N. (Eds.), The Structure of Phonological Representations I. Foris, Dordrecht, pp. 225–255.
Nespor, M., Vogel, I., 1986. Prosodic Phonology. Foris, Dordrecht.
Prince, A., Smolensky, P., 2004. Optimality Theory: Constraint Interaction in Generative Grammar. Blackwell, Oxford.
Reinhart, T., 2006. Interface Strategies. MIT Press, Cambridge, MA.
Reiss, C., 2003. Stress Computation Using Non-directed Brackets. Ms., Concordia University.
Rennison, J., Neubarth, F., 2003. An X-bar theory of government phonology. In: Ploch, S. (Ed.), Living on the Edge. Mouton, Berlin, pp. 95–130.
Scheer, T., 2004. A Lateral Theory of Phonology. Vol. 1: What is CVCV, and Why Should It Be. Mouton de Gruyter, Berlin.
Scheer, T., in press-a. A Lateral Theory of Phonology. Vol. 2: On Locality, Morphology and Phonology in Phonology. Mouton de Gruyter, Berlin.
Scheer, T., in press-b. Why the Prosodic Hierarchy is a diacritic and why the interface must be direct. In: Proceedings of the Tilburg Sounds of Silence Conference, Tilburg, October 2005.
Selkirk, E., 1972. The phrase phonology of English and French. Ph.D. dissertation, MIT, published 1980 by Garland Press.
Selkirk, E., 1974. French liaison and the X-bar notation. Linguistic Inquiry 5, 573–590.
Selkirk, E., 1981. On prosodic structure and its relation to syntactic structure. In: Fretheim, T. (Ed.), Nordic Prosody II. TAPIR, Trondheim, pp. 111–140.
Selkirk, E., 1984. Phonology and Syntax. MIT Press, Cambridge, MA.
Selkirk, E., 1995. Sentence prosody: intonation, stress, and phrasing. In: Goldsmith, J. (Ed.), The Handbook of Phonological Theory. Blackwell, Oxford, pp. 550–569.
Selkirk, E., 1996. The prosodic structure of function words. In: Morgan, J.L., Demuth, K. (Eds.), Signal to Syntax: Bootstrapping from Speech to Grammar in Early Acquisition. Lawrence Erlbaum, Mahwah, NJ.
Szendrői, K., 2003. A stress-based approach to the syntax of Hungarian focus. The Linguistic Review 20, 37–78.
Tokizaki, I., 1999. Prosodic phrasing and bare phrase structure. In: Proceedings of the North-East Linguistic Society 29, vol. 1. GLSA, Amherst, pp. 381–395.
Tokizaki, I., 2001. Prosodic hierarchy and prosodic boundary. Ms., Sapporo University.
Topintzi, N., 2004. Moraic onsets and WSP in Pirahã. Paper presented at the Second Cambridge Postgraduate Conference, England.
Truckenbrodt, H., 1999. On the relation between syntactic phrases and phonological phrases. Linguistic Inquiry 30, 219–255.
Van Oostendorp, M., 2000. Phonological Projection: A Theory of Feature Content and Prosodic Structure. Mouton de Gruyter, Berlin.
Williams, E., 1976. Underlying tone in Margi and Igbo. Linguistic Inquiry 7, 463–484.
Yip, M., 2003. Casting doubt on the onset/rime distinction. Lingua 113, 779–816.