On the particulars of Universal Grammar: implications for acquisition

Language Sciences xxx (2014) 1–10

Cedric Boeckx a,b, Evelina Leivada b,*

a Catalan Institute for Research and Advanced Studies (ICREA), Spain
b Department of Linguistics, Universitat de Barcelona, Gran Via de les Corts Catalanes, 585, 08007 Barcelona, Spain

Abstract

This study addresses the primitives of Universal Grammar, arguing in favor of an impoverished version of it that is not shaped in the form of principles and parameters. First, we review previous accounts of variation that make a case for lexical and/or syntactic parameters, and we argue that there are empirical grounds for viewing variation as confined to one component of grammar: morphophonology. Second, we discuss the process of language acquisition in the absence of parameters and parametric hierarchies, and we show how the acquisition task is viable without assuming parameters. The cues that aid the learner are identified and schematically integrated in the form of an acquisition algorithm.

Keywords: parameter; acquisition; Universal Grammar; learning; variation

1. On the poverty of the stimuli

Die-hard generativists Robert Berwick and Massimo Piattelli-Palmarini recently edited a volume entitled “Rich languages from poor inputs” (Piattelli-Palmarini and Berwick, 2013). With this title they were clearly alluding to the well-known ‘Poverty of Stimulus’ (POS) argument that has been at the heart of nativist/empiricist debates since Chomsky discussed it in the 1970s (although the central ideas go back 20 years, to his 1959 review of Skinner’s, 1957 Verbal Behavior). The logic of POS is clear: if the intricacies of grammatical knowledge cannot be derived from properties of the environment, they must come from within. Following Chomsky, Universal Grammar (UG) is the term generative linguists use to designate this internal capacity, whatever it turns out to be.

In the literature, however, it is common to find characterizations of UG that are a bit more precise about ‘what’s (expected to be) within’.1 Thus, Chomsky (1980, p. 66) writes, “[t]he argument from the poverty of the stimulus leaves us no reasonable alternative but to suppose that these properties are somehow determined in universal grammar, as part of the genotype”. Or, in a similar vein: “From the point of view I have adopted, universal grammar and the steady state are real. We expect to find them physically represented in the genetic code and the adult brain, respectively” (Chomsky, 1980, pp. 82–83).

Chomsky has not always been so committed to a ‘genetic’ characterization of UG. For instance, in Chomsky (1967, p. 9), he limited himself to saying that the basic principles underlying the knowledge of every particular grammar “are determined by the nature of the mind”, which is obviously not identical to saying that properties of UG directly express particular fragments of the human genotype. Likewise, in Chomsky (1975, pp.
91–92), he was talking about biology seeking to determine “the genetic mechanisms that guarantee that the mental organ, language, will have the required character”, which, again, is not the same thing as seeing UG “as part of the genotype”.

* Corresponding author. E-mail address: [email protected] (E. Leivada).
1 We are indebted to the work of Víctor Longa and Guillermo Lorenzo, who in a series of illuminating papers (Longa and Lorenzo, 2008, 2012; Lorenzo and Longa, 2009) highlighted the passages quoted in this section (and more). (For related discussion, see also Boeckx and Longa, 2011.)

Please cite this article in press as: Boeckx, C., Leivada, E., On the particulars of Universal Grammar: implications for acquisition, Language Sciences (2014), http://dx.doi.org/10.1016/j.langsci.2014.03.004


But seeing UG as “part of the genotype” has somehow become part and parcel of generative grammar. Numerous authors have indeed claimed that intricate properties of grammar are to be considered a part of the “genetic organization of the mind” (Jenkins, 1979, p. 111). Certainly, within the Government and Binding (GB) era, it became standard to take UG to refer to a genetically encoded state of knowledge containing the principles of language which, given a minimum of external stimulation, give rise to this specific aspect of human cognition (Chomsky, 1980, p. 243). As Lorenzo and Longa (2009) point out, this foundational claim of GB explains the proliferation of expressions such as “blueprint” (Hyams, 2002, p. 229), “genetic endowment” (Lightfoot, 1982, p. 56; Haegeman, 1991, p. 12; Anderson and Lightfoot, 2002, p. 22; Guasti, 2002, p. 271), “genetic equipment” (Lightfoot, 1982, p. 22; Guasti, 2002, p. 18), “genetic make-up” (Thornton and Wexler, 1999, p. 11), “linguistic genotype” (Chomsky, 1980, p. 75; Lightfoot, 1982, p. 21; Hoekstra and Kooij, 1988; Lightfoot, 2006, p. 45) or “genetic program” (Chomsky, 1980, p. 244; Wexler, 1999; Yang, 2004) to refer to that part of our genetic endowment which is seen as responsible for the growth of grammars in the early development of children.

From generic statements inspired by the ethology literature, such as “children have triggering experiences that stimulate their genetic properties to develop into their phenotypic properties” (Lightfoot, 2006, p. 45), linguists quickly moved to claims like: “[A] basic tenet of this theory [the theory of UG] is that much linguistic knowledge is part of the child’s genetic makeup. This knowledge is encoded in the form of universal principles” (Thornton and Wexler, 1999, p. 1). Guasti (2002, p. 1) expresses herself in very similar terms: “human beings are innately endowed with a system of richly structured linguistic knowledge”. Indeed, examples along these lines abound.
Thus, when discussing the property of structure-dependence, Smith (1999, p. 173) argues that universal properties of language such as this one “have become encoded in the genes of the children”. It seems that “if innate, language must be genetic” (Uriagereka, 2007). Jerne summarized this view nicely:

“It seems a miracle that young children easily learn the language of any environment into which they were born. The generative approach to grammar, pioneered by Chomsky, argues that this is only explicable if certain deep, universal features of this competence are innate characteristics of the human brain. Biologically speaking, this hypothesis of an inheritable capability to learn any language means that it must somehow be encoded in the DNA of our chromosomes. Should this hypothesis one day be verified, then linguistics would become a branch of biology.” (Jerne, 1993, p. 223)

We believe that this is not the only way for linguistics to become a branch of biology. Indeed, we think that the narrower, genocentric vision of UG has proven very problematic, and in fact threatens to delay the integration and assimilation of results from linguistics into biology, which has slowly but unmistakably dropped its genocentrism (see Pigliucci and Müller, 2010 for a collection of relevant essays and perspectives; see also Fodor and Piattelli-Palmarini, 2010, and the work of Longa and Lorenzo cited in footnote 1). Chomsky himself has broadened the scope of nativism by recognizing three factors that enter into language design:2

1. “Genetic endowment, apparently nearly uniform for the species, which interprets part of the environment as linguistic experience.
2. Experience, which leads to variation, within a fairly narrow range.
3. Principles not specific to the faculty of language.” (Chomsky, 2005, p. 6)

Biologists have long recognized the role of factors of the third kind (see Seilacher’s, 1970 triangle, made famous by S. J.
Gould, e.g., Gould and Lewontin, 1979; Gould, 2002), but unlike linguists, who have recently tried to ‘third-factorize’ UG, as it were, biologists are clear that nothing is really a matter of just one or two factors. It is the three factors working in tandem (Lewontin’s, 2000 Triple Helix) that yield the properties of the organism. As such, it is counterproductive to ask whether, say, language variation is the result of first, second or third factors (as Gallego, 2010 does). It is the three of them together. Still, emphasis on “the third factor” has had the benefit of highlighting the impoverished, underspecified, or minimal(ist) character of UG. In this sense we can speak of Poverty of the Stimuli (plural intended): it is not just the environment (factor 2) that is impoverished; so are the other factors. It is only when the three of them join forces that one can begin to reconstruct some aspects of the old vision of UG.

2 Chomsky spells out the content of this third factor by distinguishing between ‘‘(a) principles of data analysis that might be used in language acquisition and other domains; (b) principles of structural architecture and developmental constraints that enter into canalization, organic form, and action over a wide range, including principles of efficient computation, which would be expected to be of particular significance for computational systems such as language’’ (Chomsky, 2005, p. 6). It is interesting to note that (a) and (b), as used in the literature, correspond to two senses of epigenetic factors (epigenesis and epigenetics). Accordingly, if one were tempted to continue using the term UG to refer to the first factor, i.e. genetic endowment (Chomsky, 2005, pp. 1, 6), we think that a term like “epiUG” could be used to refer to the third factor. A reviewer asks if we mean to say that grammatical properties cannot be directly encoded in the genes. Yes, we mean exactly this. But we think that this position does not require special arguments in its favor. It is standard in biology: genes code for proteins, not for grammatical properties. Taking into consideration, first, the polymorphic nature of genes and, second, the fact that developmental processes depend on non-genetic factors as well, a direct link between the genotype and the phenotype is simplistic and simply untenable.


We believe that this more pluralistic, interactionist way of looking at how grammatical knowledge is attained is biologically more plausible. We wish to illustrate this by highlighting the fact that once linguists began to move from a rich, overspecified vision of UG to a more impoverished vision, it became clear that development must play a particularly important role in accounting for language acquisition, much as proponents of Evo-Devo (evolutionary developmental biology) came to recognize for the growth of organisms in general. This is a significant shift of perspective, as not so long ago one could still read that “[o]ne simplifying assumption is that L [a particular language] is literally deducible from a choice of parametric values and lexicon, so acquisition is as if instantaneous” (Chomsky, 2000, pp. 140–141, fn. 11). The reason why we think that moving away from this idealization is progress will become clear in the next sections of the paper.

2. The locus of variation

A recurrent issue in UG studies concerns the locus and the nature of language variation. This is because, since Chomsky (1981), linguists have handled variation in ‘parametric’ terms: they have explained it in terms of the different possible values of unfixed principles (parameters) that UG makes available. The so-called Principles-and-Parameters version of UG arguably represents the richest conception of it. As such it deserves our attention here. Over the years, several hypotheses have been entertained regarding the locus of variation. Perhaps the best known is the so-called Chomsky–Borer conjecture, based on the following two statements:

1. Variation is restricted to possibilities that the inflectional component makes available (Borer, 1984, p. 3).
2. Variation is restricted to the lexicon; to a narrow category of (primarily inflectional) morphological properties (Chomsky, 2001, p. 2).
But there are at least two other visions or ‘conjectures’ that one can discern in the literature, though they have not been named as such. We will do so here. The first could be called the Chomsky–Baker conjecture. It takes the view that there are “parameters within the statements of the general principles that shape natural language syntax” (Baker, 2008), a view that arguably was the one in Chomsky’s original (1981) formulation of Principles-and-Parameters. The second conjecture is of more recent vintage. It takes variation to be confined to morphophonological variants. This is a view endorsed in Berwick and Chomsky (2011), so we will call it the Chomsky–Berwick conjecture. It corresponds to Boeckx’s (2011) Strong Uniformity Thesis: “Principles of narrow syntax are not subject to parametrization; nor are they affected by lexical parameters”. All variation is then post-syntactic. To be sustainable, the Chomsky–Berwick conjecture must rely on yet another conjecture, implicitly assumed for many years in the field, though rarely defended in a systematic fashion (but see Ramchand and Svenonius, 2008): there can be no semantic parameters. Given the transparency of the syntax–semantics mapping (so transparent that it has led some to entertain an Identity Thesis; e.g., Hinzen, 2006), if there are semantic parameters, there must be syntactic parameters. But if “principles of narrow syntax are not subject to parametrization; nor are they affected by lexical parameters”, and syntax “carves the paths that semantics must blindly follow” (Uriagereka, 2008), then it follows that, in the absence of syntactic parameters, semantics will be invariant as well. In other words, the ‘no semantic parameter’ conjecture directly conflicts with the Chomsky–Baker conjecture. Quite apart from these architectural considerations, the question, of course, is which conjecture is empirically correct.
Before we adduce arguments in favor of the Chomsky–Berwick conjecture (building, in part, on previous work of ours, such as Boeckx, 2011, 2014a; Boeckx and Leivada, 2013), we would like to note that the Chomsky–Borer conjecture arguably decomposes into either the Chomsky–Baker conjecture or the Chomsky–Berwick conjecture. This has to do with the evolving notion of the lexicon. In a conception of the lexicon of the sort advocated in a theoretical framework like Distributed Morphology (DM), or in other realizational models like Nanosyntax, there is no lexicon in the sense familiar from traditional (generative) grammar. Instead, the tasks assigned to the component called the lexicon in earlier theories are distributed across various other components. DM assumes that syntax itself generates and manipulates unordered hierarchies of abstract syntactic features devoid of phonological content, the so-called “morphemes” (Halle and Marantz, 1993). Once generated, such abstract feature bundles receive phonological content (a step called vocabulary insertion). Accordingly, the “inflectional component” or the “(primarily inflectional) morphological properties” referred to in Borer’s and Chomsky’s statements (1–2) could refer either to the pre-syntactic component of the lexicon (what Marantz, 1997 dubbed the narrow lexicon, or List A) or to the post-syntactic morphological component (the Vocabulary, or List B in DM).3

If one looks at the recent literature dealing with variation, one cannot avoid being struck by the fact that numerous proposals converge in re-analyzing points of syntactic variation in morphophonological terms, uncovering a substantially greater degree of syntactic (and semantic) invariance than previously thought. Representative examples include Acedo-Matellán (2010), Jenks (2012), Real-Puigdollers (2013), and Safir (2014). The upshot of these works (to which we could add

3 Based on our discussion of Borer (1984) in Boeckx (2014a), we are inclined to believe that Borer (1984) actually sided with (a version of) the Chomsky–Berwick conjecture, whereas Chomsky (2001) sided against the Strong Uniformity Thesis, and therefore with (a version of) the Chomsky–Baker conjecture.


models like the one in Richards, 2010 or Boeckx, in press-b) is that (morpho-)phonology has a much larger impact on the final form of a linguistic utterance than is generally thought.4 Even empirical considerations, then, point to the existence of an invariant syntactic component (Boeckx, in press-b takes this as an argument for a feature-free, Merge-only model of syntax), with variation confined to the margins, in the externalization component. In the literature on variation we are aware of only a few works explicitly denying that all points of variation are reducible to morphophonological decisions. Two such arguments are offered in Roberts (2010a, 2011a). They have to do with (i) the alleged absence of a non-stipulative way to rule out the existence of syntactic parameters, and (ii) the inability of PF ‘parameters’ to give rise to (parametric) hierarchies. Both arguments, however, turn out to be problematic on closer scrutiny.

Let us start with the argument based on the (in)ability to give rise to (parametric) hierarchies. The classical notion of ‘parameter’ was indeed meant to make certain predictions with respect to the existence of specific parametric paths guiding language acquisition, for instance along the lines of the ones presented in Baker (2003). According to such models, UG encapsulates an ordered representation of parameters, making available certain hierarchies that start off with a non-dependent parameter (e.g., the Polysynthesis Parameter in Baker, 1996). Obviously, these top parameters have to be set first, since their setting has an impact on the settability of the dependent parameters that follow (i.e. on whether a dependent parameter is relevant/realizable at all: for example, in Baker, 2003, setting Polysynthesis to ‘yes’ renders Head Directionality unsettable/unrealizable). As Baker (2005) remarks, “an efficient learner should learn in a structured way in which some parameters are entertained first and others later” (p.
95, emphasis added). The problem for such a view is not logical (reducing the cost of acquisition would be a welcome feature); rather, as discussed in Boeckx and Leivada (2013), when a realistic number of parameters and parametric dependencies are taken into account (e.g., the model of Longobardi and Guardiano, 2009), the deterministic nature of the paths defined by UG disappears, revealing a complex, subway-map-like network that cannot be assumed to guide the learner in any straightforward way. But aside from this, Roberts’s argument that PF ‘parameters’ cannot give rise to (parametric) hierarchies is flawed, because even for him there is nothing properly syntactic that constructs the hierarchies. As Roberts himself has made explicit in a number of works, parametric hierarchies are “emergent properties” (Roberts and Holmberg, 2009; Roberts, 2011b). They are constructed on the basis of markedness principles, crucially not syntactic principles. As Roberts (2011b) states, markedness principles are “not grammatical principles but acquisition strategies” that derive from third-factor considerations (see also Boeckx, 2011). If so, these principles could equally well ‘structure’ variation (giving rise to hierarchies) in the PF component.5

As for Roberts’s second argument in favor of syntactic variation (the alleged absence of a non-stipulative way to rule it out), we remain unconvinced. For Roberts, ‘PF parameters’ are symmetrical, and a parameter P is a non-PF one if the realized variation defined by P contains a (typological) gap (Roberts cites the word-order parameter as a case of a non-symmetric parameter, to which we return). Notice right away that this characterization is not derived by Roberts. That is, it is a stipulative way of defining PF and non-PF parameters; in other words, a stipulative way to rule in the existence of syntactic parameters.
Why couldn’t PF parameters give rise to typological gaps, given the existence of typological gaps in uncontroversially phonological phenomena? Empirically, it is not even clear that Roberts’s example based on word order achieves what he wants it to. Roberts notes that a seemingly symmetric parameter like head-final/head-initial gives rise to asymmetries such as the Final-over-Final Constraint (FOFC; Holmberg, 2000) that make the parameter syntactic. Roberts (2010b) calls FOFC the ‘signature asymmetry’ that shows the word-order/linearization parameter to be a non-PF parameter. FOFC is defined as in (3), which also shows the configuration it rules out:

3. FOFC: A head-final phrase cannot immediately dominate a head-initial phrase in a single extended projection.
*[X’ [YP Y ZP] X] (adapted from Biberauer, 2011, ex. (12d))

Exploring the predictions of FOFC across different languages, however, one observes that, contra (3), head-final VPs may immediately dominate head-initial PPs in verb-second languages, as Biberauer et al. (2007) also acknowledge. This observation rendered the following modification (4) of FOFC necessary:

4. If a phase head PH has an EPP feature, all the heads in its complement domain with which it agrees in categorial features must have an EPP feature.
*[V’ [VP V OB] V] (Biberauer et al., 2007)
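Read procedurally, the schema in (3) amounts to a simple check on an extended projection: walking bottom-up, flag any head-final phrase that immediately dominates a head-initial one. The following toy sketch is our own illustration; the list encoding and category labels are hypothetical conveniences, not part of FOFC itself:

```python
def fofc_violations(projection):
    """Check the FOFC schema in (3) over one extended projection.

    `projection` lists the phrases bottom-up as (category, headedness)
    pairs, with headedness either 'initial' or 'final'. A violation is
    any head-final phrase immediately dominating a head-initial one.
    """
    violations = []
    for (low_cat, low_dir), (high_cat, high_dir) in zip(projection, projection[1:]):
        if high_dir == 'final' and low_dir == 'initial':
            violations.append((high_cat, low_cat))
    return violations

# Harmonic orders, and initial-over-final, are fine:
print(fofc_violations([('VP', 'final'), ('TP', 'final'), ('CP', 'initial')]))  # []

# A head-final phrase over a head-initial one (e.g. head-final AuxP
# immediately dominating head-initial ModP) is flagged:
print(fofc_violations([('VP', 'final'), ('ModP', 'initial'), ('AuxP', 'final')]))  # [('AuxP', 'ModP')]
```

The second call mirrors the ‘231’ cluster pattern discussed below (head-final VP, head-initial ModP, head-final AuxP), which is exactly the kind of configuration FOFC is claimed to ban.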

4 Even when variation is not explicitly dubbed ‘morphophonological’ but instead syntactic, a closer look at the respective proposals suggests otherwise. Thus, Gallego (2007) aims to offer a syntactic, phasehood-informed account of variation; however, his summary of two of the three main points of his account makes explicit reference mainly to morphology, not to syntax: (i) “there is a correlation between C and v (the phase heads) in terms of morphological richness that boosts their Left Periphery”, (ii) “some varieties of verb movement […] manifest domain extension effects that strongly indicate a syntactic status, triggering the application of a morphologically motivated Transfer, referred to as Phase Sliding” (p. 380, emphasis added).
5 A reviewer points out that we do not comment on the fact that for Roberts parametric hierarchies emerge from the interaction of Chomsky’s three factors (rather than from markedness/third-factor principles only). The precise nature of the interaction of the three factors is not made clear in Roberts’s work. All we can gather from that work is that hierarchies are not given by UG/the first factor, which is why the respective hierarchies are called ‘emergent’. Since Roberts does not seem to rely on properties of the environment/the second factor in formulating his ‘parameter-hierarchy’-building principles, we think it is safe to attribute these to the third factor (e.g., markedness principles).


Fig. 1. A learning algorithm (Bazalgette, 2013).

Yet this formulation is not immune to counterexamples either. More specifically, ‘231’ modal2–verb3–auxiliary1 sequences in Afrikaans, Zürich German, and West Flemish (discussed in Biberauer, n.d.) correspond to head-final, head-initial, and head-final patterns for VP, ModP, and AuxP respectively, and they therefore give rise to FOFC violations. This is an important observation if FOFC is portrayed as the signature asymmetry that provides evidence against the Chomsky–Berwick conjecture. If the allegedly robust asymmetries can be violated, it is not clear what conclusion one can draw from FOFC. In line with the proposal put forth in this section, FOFC constraints have been approached in PF terms by Sheehan (2013, in press), as structures that cannot be linearized at PF by her modification of Kayne’s, 1994 Linear Correspondence Axiom, and by Etxepare and Haddican (2013), who suggest that Basque verb clusters favor a PF-based analysis of FOFC effects as opposed to a syntactic one. Either of these options is compatible with the Chomsky–Berwick conjecture, which reinforces our belief that the sole locus of grammatical variation lies in the PF component.

3. In the absence of parameters

The growing evidence that variation is (i.e. can be analyzed as) confined to the morphophonological component, coupled with the empirical and conceptual problems of the classical notion of parameter (Boeckx, 2011, 2014a; Newmeyer, 2005), suggests that, though constrained, variation in language is not ruled by parameters. It is clear that in the absence of UG-defined parameters and acquisition paths, variation must be structured, and language acquisition guided, by a variety of factors. In this section we would like to sketch the properties of a learning algorithm that does not rely on an overspecified UG. Our proposal is in line with the models put forth in Bazalgette (2013; Fig. 1) and Biberauer et al. (2013; Fig. 2).
Although these proposals are on the right track in the sense of approaching acquisition without assuming a process of fixing the values of UG-encoded parameters, they suffer from not being explicit on other counts. More specifically, a systematic attempt to maintain a non-overspecified, parameter-free UG cannot assume other pieces of innate knowledge without explaining why exactly this knowledge remains a property of UG, or how it surfaces in the acquisition task as a different type of (possibly third-factor) property of the efficient learner. The algorithm in Fig. 1, for example, presupposes that the learner somehow knows how to distinguish the largest unconsidered natural class of heads, but it is not clear where this knowledge comes from or what exactly it corresponds to. Nor is it clear how the division into +F and −F takes place, in terms of how much noise (i.e. how many exceptions to a hypothesized rule) the learner tolerates. Similarly, the hierarchy in Fig. 2 presupposes that the learner knows how to distinguish heads from non-heads and V heads from other types of heads, and knows how to apply the new feature selectively. A fair amount of knowledge is taken for granted (knowledge which should not be given by UG, since this is a non-UG-specified, emergent hierarchy), and the learning schema is not explicit about the origins of this knowledge or the cues that the efficient learner uses in order to perform the listed tasks. We propose that a learning algorithm should, in the absence of UG-provided acquisition cues:

a) account for the productivity of the hypothesized rules;
b) integrate a parsing component that distinguishes between ambiguous data and unambiguous cues (Fodor, 1998);


Fig. 2. A non-UG-specified, emergent parameter hierarchy. ^ signifies ‘comp-to-spec’ movement (Biberauer et al., 2013).

Table 1
List of biases and factors that aid acquisition.

(A) Reasoning under uncertainty (based on Bayesian models of learning; e.g. Kemp et al., 2007): Integrate conflicting tendencies in the process of learning by simultaneously entertaining both overhypotheses and constraints on hypotheses.
(B) Superset Bias (Boeckx, 2011): Strive for value consistency.
(C) Bias/Variance Trade-off (Briscoe and Feldman, 2011): Adopt an intermediate point on the bias/variance continuum. Do so by keeping (B) a bias, not a principle, in order to avoid backtracking.
(D) Statistical Ability (e.g., Yang, 2002, 2010): Analyze datum s through a hypothesized grammar Gi with probability pi. Depending on success, punish or reward Gi by decreasing or increasing pi.
(E) Tolerance Principle (Yang, 2005; Legate and Yang, 2012): Based on (D), turn Gi into a rule. Assume a rule R is productive if T(N, M) < T(N, N).
(F) Elsewhere Condition (Anderson, 1969): Following (E), once multiple candidates are available, apply the most specific rule.
(G) PF-Cues Sensitivity (cf. Fasanella and Fortuny’s, 2011 Accessibility Condition): Fix points of variation on the basis of explicit, saliently accessible morphophonological cues. Make use of prosodic cues to define constituents.
(H) Perception and Memory Constraints (Endress et al., 2009; Gervain and Mehler, 2010): Keep track of sequence edges, which are particularly salient positions that facilitate learning and give rise to either word-initial or word-final processes much more often than otherwise.

c) tolerate exceptions, also by taking into account the computing time of rule application vs. ‘exceptions-list’ parsing (see Legate and Yang, 2012);
d) determine which biases aid the learning process in its initial stages, without assuming that the learner is already able to distinguish heads from non-heads or command other syntactic notions.

In this context, we suggest that the acquisition process relies on a variety of factors, some of which are informed by processes also relevant in other modules of human cognition, and hence fall within the third-factor domain. The first is the ability to engage in ‘reasoning under uncertainty’. Bayesian networks are among the most prominent frameworks for reasoning under uncertainty, and the majority of tasks requiring intelligent behavior involve some degree of uncertainty (see e.g., Gyftodimos and Flach, 2004); this is also the case in language learning. A key characteristic of this reasoning is the ability to entertain overhypotheses and constraints on hypotheses at the same time. As Kemp et al. (2007) show, inductive learning is not possible unless the learner entertains both overhypotheses and constraints on them. Kemp et al. (2007) accept the innate nature of some of these hypotheses; however, they argue that hierarchical Bayesian models can help explain how the rest of the hypotheses are acquired. Establishing the parallelism with acquisition, the efficient learner should be able to integrate into the process of learning some conflicting tendencies, such as the need to formulate generalizations over the input without making the acquisition task more burdensome by forming assumptions that may later be hard to retract.
More specifically, the efficient learner internalizes linguistic knowledge by making use of biases that simultaneously allow for overgeneralizing hypotheses (Boeckx’s, 2011 Superset Bias) but also adequately constrain overgeneralizations, in line with Briscoe and Feldman’s (2011) Bias/Variance Trade-off, according to which learners adopt an intermediate point on the bias/variance continuum in order to refrain from overfitting, backtracking and reanalyzing data. Another property of the efficient learner is the ability to track statistical regularities: many studies point out that humans are powerful statistical learners (e.g., Saffran et al., 1996). Yang (2005) suggests that the productivity of hypothesized rules is subject to the Tolerance Principle, which seeks to define how many exceptions to a hypothesized rule can be tolerated before the learner abandons the rule as unproductive. One of the more recent formal statements of the Tolerance Principle holds that a rule R is productive if T(N, M) < T(N, N), where T stands for (computing) time, N is the total number of items, M the number of exceptions, and N − M the number of rule-following items (Yang, 2005; Legate and Yang, 2012). If T(N, N) < T(N, M), then R is not productive and all items are listed as exceptions (M = N: all items are stored as exceptions).
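On the usual way of cashing out this comparison (Legate and Yang, 2012), evaluating T(N, M) against T(N, N) under a serial, Elsewhere-style search yields the threshold M < N / ln N: a rule over N items tolerates at most roughly N / ln N exceptions. A minimal sketch of the resulting decision, with hypothetical item counts of our own choosing:

```python
import math

def tolerance_threshold(n: int) -> float:
    """Maximum number of exceptions a rule over n items can tolerate:
    n / ln(n), the point at which T(N, M) equals T(N, N) under a
    serial, Elsewhere-style search (Legate and Yang, 2012)."""
    if n <= 1:
        return 0.0  # degenerate case: no room for exceptions
    return n / math.log(n)

def is_productive(n_items: int, n_exceptions: int) -> bool:
    """R is productive iff T(N, M) < T(N, N), i.e. iff M < N / ln N."""
    return n_exceptions < tolerance_threshold(n_items)

# Hypothetical figures: 120 verb types, of which 20 are irregular.
print(round(tolerance_threshold(120), 2))  # 25.07
print(is_productive(120, 20))              # True: the rule survives
print(is_productive(120, 40))              # False: all items get listed
```

Note how the threshold grows sublinearly: larger vocabularies tolerate proportionally fewer exceptions, which is what makes the principle restrictive.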

C. Boeckx, E. Leivada / Language Sciences xxx (2014) 1–10


Fig. 3. A learning algorithm based on the components of Table 1.

This principle accurately predicts rule productivity and inference in the course of acquisition in terms of the overhypotheses formulated by the learner. Paying attention to morphophonological cues is the third characteristic of the acquisition algorithm we propose. Prosody, for example, defines constituents and aids in identifying the positions/edges of syntactic representations (Endress and Hauser, 2010). We argue that subtle points of variation (i.e. what would be referred to as ‘microparameters’ in parametric approaches) should be set on the basis of explicit, saliently accessible morphophonological cues.6 There are at least two more types of third-factor principles that aid learning: first, the Elsewhere Condition (Anderson, 1969), according to which the learner applies the most specific rule when multiple candidates are possible. Second, perception and memory constraints of the sort described in Endress et al. (2009) and Gervain and Mehler (2010) also play an important role. Endress et al. (2009) explain the prevalence of prefixing and suffixing across languages, as against the rarity of infixing, in terms of a memory constraint according to which sequence edges are particularly salient positions, facilitating learning and giving rise to word-initial or word-final processes much more often than otherwise. Following Yang (2002, pp. 26–27, 2010, p. 1162) in assuming that the child, upon receiving datum s, selects a grammar Gi with probability pi and, depending on whether Gi is successful in analyzing s, punishes or rewards Gi by decreasing and

6 This property is reminiscent of the Accessibility Condition, defined by Fasanella and Fortuny (2011) as a condition according to which “parameters must be set by directly inspecting phonological and morphological properties of utterances”. It is also reminiscent of Chomsky’s (2001) Uniformity Hypothesis: “In the absence of compelling evidence to the contrary, assume languages to be uniform, with variety restricted to easily detectable properties of utterances”.
increasing pi respectively, the acquisition process corresponds to a learning algorithm that integrates the following principles.7 Table 1 presents the list of relevant biases in an unordered fashion. Put differently, the relevant factors are identified, but they are not yet ordered, related to each other, and integrated in the form of a learning schema, which is what happens in Fig. 1. Ordering them gives rise to the following algorithm, which approaches the acquisition process from the very beginning8 (Fig. 3).

4. Concluding remarks

In this paper we have focused on how (not) to think about UG. Scholars outside the generative school are prone to misunderstanding and misrepresenting claims about UG (witness the well-known case of Everett, 2005, among many other examples we could produce). But we have argued that generative grammarians themselves have been guilty of characterizations of UG that are logically sound, but biologically implausible. Thinking of UG in strictly genetic terms is seriously misguided. Likewise, thinking of UG as substantially structuring variation seems to us to be on the wrong track. It is of little use to put forth a rich UG as an attempt to reduce the acquisition task if the UG primitives (i.e. parameters and the hierarchies they give rise to) eventually run into empirical problems of the sort identified in Boeckx and Leivada (2013). As argued there, the picture that emerges from current parametric models is that of a complex, subway-map-like network (see Rigon, 2009, p. 221 for a schematic representation) that cannot be argued to guide the learner in any straightforward way. We have reviewed work suggesting that variation is confined to the process of externalization, and far less structured by UG than commonly assumed. If points of variation across languages reduce to realizational variants, the process of acquisition cannot correspond to triggering innately specified values of UG-encoded parameters.
Instead, the learner integrates a variety of different factors in the process of learning. We have suggested an algorithm that keeps the assumptions about the initial state of the language faculty to a minimum. It is unclear to us why, given its well-known flaws, the rich UG model of Principles-and-Parameters still holds sway over the field. Much like the Broca–Wernicke model in psycho-/neuro-linguistics, and the Modern Synthesis in biology, the Principles-and-Parameters model proved too simplistic, and ought to give way to a more eclectic, multi-factorial model. “Third-factorizing” UG need not lead us to abandon the belief that there are constraints on what can and can’t be a possible grammar (the true function of UG, contrary to many scholars who think that it provides a Grammar), but we will have to learn to live with the fact that traditional lines of argument, such as the POS, are blunt argumental knives: they do not cut across the many factors involved in a neat fashion. But we certainly don’t want to minimize the impact of work within the Principles-and-Parameters tradition. Its great virtue is to have been proven wrong. It’s been an inspiring error that led to the truly astonishing hypothesis that syntax is invariant, symmetric under morphophonological variation.

Acknowledgments

We thank the two anonymous reviewers for comments that helped remove unclear formulations from the original manuscript. The present work was made possible through a Marie Curie International Reintegration Grant from the European Union (PIRG-GA-2009-256413), research funds from the Fundació Bosch i Gimpera, and a grant from the Spanish Ministry of Economy and Competitiveness (FFI-2010-20634).

References

Acedo-Matellán, V., 2010. Argument Structure and the Syntax–Morphology Interface. A Case Study in Latin and Other Languages (Doctoral dissertation). Universitat de Barcelona.
Anderson, S.R., Lightfoot, D., 2002. The Language Organ. Linguistics as Cognitive Physiology. Cambridge University Press, Cambridge.
Anderson, S., 1969. West Scandinavian Vowel Systems and the Ordering of Phonological Rules (Doctoral dissertation). MIT.
Baker, M., 1996. The Polysynthesis Parameter. Oxford University Press, New York.
Baker, M., 2003. Linguistic differences and language design. Trends Cogn. Sci. 7, 349–353.
Baker, M., 2005. Mapping the terrain of language acquisition. Lang. Learn. Dev. 1, 93–129.
Baker, M., 2008. The macroparameter in a microparametric world. In: Biberauer, T. (Ed.), The Limits of Syntactic Variation. John Benjamins, Amsterdam, pp. 351–373.
Bazalgette, T., 2013. An Algorithm for Lexicocentric Parameter Acquisition. Paper presented at GLOW 36: Biolinguistics workshop.
Berwick, R.C., Chomsky, N., 2011. The biolinguistic program: the current state of its development. In: Di Sciullo, A.M., Boeckx, C. (Eds.), The Biolinguistic Enterprise: New Perspectives on the Evolution and Nature of the Human Language Faculty. Oxford University Press, Oxford, pp. 19–41.
Biberauer, T., Holmberg, A., Roberts, I., 2007. Structure and Linearization in Disharmonic Word Orders. Paper presented at the XVII Colloquium of Generative Grammar, Girona.
Biberauer, T., Roberts, I., Sheehan, M., 2013. Mafioso Parameters and the Limits of Syntactic Variation. Paper presented at Towards a Theory of Syntactic Variation, Bilbao.

7 A reviewer questions our use of Yang’s model because in its original formulation this model relied on the existence of parameters, and we don’t. We think that the central features of Yang’s model that we use here do not require parameters in the classical sense to exist. Rules could equally be used to compute the relevant probabilities over.
8 It is important to note that although our algorithm as a whole still needs to be put to the test, all its ingredients have been shown to help the acquisition process: e.g., the experiments reported in Endress et al. (2009) on the role of perception and memory functions in language processing in the course of language acquisition, or Yang’s (2005) mathematical assessment (through the Tolerance Principle) of the productivity of rules related to plural formation in German.
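Footnote 7's point, that Yang's competition model does not need parameters and can compute probabilities over rules, can be sketched with the linear reward-penalty update described in the main text; the two candidate rules and their success rates below are invented for illustration:

```python
import random

def reward_penalty(probs, i, success, gamma=0.02):
    """Linear reward-penalty update over competing hypotheses (after Yang, 2002).
    On success, hypothesis i gains probability mass from its competitors;
    on failure, it leaks mass to them. Total mass always sums to one."""
    n = len(probs)
    if success:
        return [p + gamma * (1 - p) if j == i else (1 - gamma) * p
                for j, p in enumerate(probs)]
    return [(1 - gamma) * p if j == i else gamma / (n - 1) + (1 - gamma) * p
            for j, p in enumerate(probs)]

# Toy competition: rule 0 successfully analyzes 90% of the incoming data,
# rule 1 only 20%; the learner samples a rule per datum and updates.
random.seed(1)
probs = [0.5, 0.5]
for _ in range(2000):
    i = 0 if random.random() < probs[0] else 1
    success = random.random() < (0.9 if i == 0 else 0.2)
    probs = reward_penalty(probs, i, success)
# probs[0] ends up well above probs[1], near the penalty-ratio equilibrium
```

With a small learning rate, the winning hypothesis approaches an equilibrium determined by the two penalty rates rather than jumping straight to 1, which models the gradualness of acquisition.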


Biberauer, T., 2011. In Defence of Lexico-centric Parametric Variation: Two 3rd Factor-constrained Case Studies. Paper presented at the Workshop on Formal Grammar and Syntactic Variation: Rethinking Parameters, Universidad Autónoma de Madrid.
Biberauer, T., n.d. Disharmonic Word Order, Quirky Morphology and the Afrikaans Verb Cluster (Ms.). University of Cambridge & Stellenbosch University. Retrieved from: http://www.hum.au.dk/engelsk/engsv/nyvad-abstracts/biberauer-ho.pdf.
Boeckx, C., 2011. Approaching parameters from below. In: Di Sciullo, A.M., Boeckx, C. (Eds.), The Biolinguistic Enterprise: New Perspectives on the Evolution and Nature of the Human Language Faculty. Oxford University Press, Oxford, pp. 205–221.
Boeckx, C., 2014a. What principles & parameters got wrong. In: Picallo, C. (Ed.), Linguistic Variation and the Minimalist Program. Oxford University Press, Oxford.
Boeckx, C., 2014b. Elementary Syntactic Structures. Cambridge University Press, Cambridge (in press).
Boeckx, C., Longa, V.M., 2011. Lenneberg’s views on language development and evolution and their relevance for modern biolinguistics. Biolinguistics 5 (3), 254–273.
Boeckx, C., Leivada, E., 2013. Entangled parametric hierarchies: problems for an overspecified universal grammar. PLoS ONE 8 (9), e72357.
Borer, H., 1984. Parametric Syntax: Case Studies in Semitic and Romance Languages. Foris, Dordrecht.
Briscoe, E., Feldman, J., 2011. Conceptual complexity and the bias/variance tradeoff. Cognition 118 (1), 2–16.
Chomsky, N., 1959. A review of B. F. Skinner’s Verbal Behavior. Language 35, 26–58.
Chomsky, N., 1967. Recent contributions to the theory of innate ideas. Synthese 17, 2–11.
Chomsky, N., 1975. Reflections on Language. Pantheon, New York.
Chomsky, N., 1980. Rules and Representations. Columbia University Press, New York.
Chomsky, N., 1981. Lectures on Government and Binding. Foris, Dordrecht.
Chomsky, N., 2000. Minimalist inquiries: the framework. In: Martin, R., Michaels, D., Uriagereka, J. (Eds.), Step by Step. Essays in Honor of Howard Lasnik. MIT Press, Cambridge, MA, pp. 89–155.
Chomsky, N., 2001. Derivation by phase. In: Kenstowicz, M. (Ed.), Ken Hale: a Life in Language. MIT Press, Cambridge, MA, pp. 1–52.
Chomsky, N., 2005. Three factors in language design. Linguist. Inq. 36, 1–22.
Endress, A.D., Hauser, M.D., 2010. Word segmentation with universal prosodic cues. Cogn. Psychol. 61 (2), 177–199.
Endress, A.D., Nespor, M., Mehler, J., 2009. Perceptual and memory constraints on language acquisition. Trends Cogn. Sci. 13 (8), 348–353.
Etxepare, R., Haddican, B., 2013. Repairing Final-over-Final Constraint Violations: Evidence from Basque Verb Clusters. Paper presented at GLOW 36.
Everett, D.L., 2005. Cultural constraints on grammar and cognition in Pirahã: another look at the design features of human language. Curr. Anthropol. 46, 621–646.
Fasanella, A., Fortuny, J., 2011. Deriving Linguistic Variation from Learnability Conditions in a Parametric Approach to UG. Paper presented at the Workshop on Formal Grammar and Syntactic Variation: Rethinking Parameters, Universidad Autónoma de Madrid.
Fodor, J.D., 1998. Unambiguous triggers. Linguist. Inq. 29 (1), 1–36.
Fodor, J.A., Piattelli-Palmarini, M., 2010. What Darwin Got Wrong. Farrar, Straus and Giroux, New York.
Gallego, Á., 2007. Phase Theory and Parametric Variation (Doctoral dissertation). Universitat Autònoma de Barcelona.
Gallego, Á., 2010. Parameters. In: Boeckx, C. (Ed.), The Oxford Handbook of Linguistic Minimalism. Oxford University Press, Oxford, pp. 523–550.
Gervain, J., Mehler, J., 2010. Speech perception and language acquisition in the first year of life. Annu. Rev. Psychol. 61, 191–218.
Gould, S.J., 2002. The Structure of Evolutionary Theory. Harvard University Press, Cambridge, MA.
Gould, S.J., Lewontin, R.C., 1979. The spandrels of San Marco and the Panglossian paradigm: a critique of the adaptationist programme. Proc. R. Soc. Lond. Ser. B Biol. Sci. 205, 581–598.
Guasti, M.T., 2002. Language Acquisition. The Growth of Grammar. MIT Press, Cambridge, MA.
Gyftodimos, E., Flach, P.A., 2004. Hierarchical Bayesian networks: an approach to classification and learning for structured data. In: Vouros, G.A., Panayiotopoulos, T. (Eds.), SETN 2004, LNAI 3025. Springer-Verlag, Berlin, pp. 291–300.
Haegeman, L., 1991. Introduction to Government and Binding Theory. Blackwell, Oxford.
Halle, M., Marantz, A., 1993. Distributed morphology and the pieces of inflection. In: Hale, K., Keyser, S.J. (Eds.), The View from Building 20: Essays in Linguistics in Honor of Sylvain Bromberger. MIT Press, Cambridge, MA, pp. 111–176.
Hinzen, W., 2006. Mind Design and Minimal Syntax. Oxford University Press, Oxford.
Hoekstra, T., Kooij, J.G., 1988. The innateness hypothesis. In: Hawkins, J.A. (Ed.), Explaining Language Universals. Blackwell, Oxford, pp. 31–55.
Holmberg, A., 2000. Deriving OV order in Finnish. In: Svenonius, P. (Ed.), The Derivation of VO and OV. John Benjamins, Amsterdam, pp. 123–152.
Hyams, N., 2002. Clausal structure in child Greek: a reply to Varlokosta, Vainikka and Rohrbacher and a reanalysis. Linguist. Rev. 19 (3), 225–269.
Jenkins, L., 1979. The genetics of language. Linguist. Philos. 3, 105–119.
Jenks, P., 2012. Definite Spans and Blocking in Classifier Languages (Ms.). UC Berkeley.
Jerne, N.K., 1993. The generative grammar of the immune system. Nobel Lecture, December 8, 1984. In: Lindsten, J. (Ed.), Nobel Lectures: Physiology or Medicine 1981–1990. World Scientific, Singapore, pp. 211–225.
Kayne, R.S., 1994. The Antisymmetry of Syntax. MIT Press, Cambridge, MA.
Kemp, C., Perfors, A., Tenenbaum, J.B., 2007. Learning overhypotheses with hierarchical Bayesian models. Dev. Sci. 10 (3), 307–321.
Legate, J.A., Yang, C., 2012. Assessing child and adult grammar. In: Piattelli-Palmarini, M., Berwick, R.C. (Eds.), Rich Languages from Poor Inputs. Oxford University Press, Oxford, pp. 168–182.
Lewontin, R.C., 2000. The Triple Helix: Gene, Organism, and Environment. Harvard University Press, Cambridge, MA.
Lightfoot, D., 1982. The Language Lottery: Toward a Biology of Grammars. MIT Press, Cambridge, MA.
Lightfoot, D., 2006. How New Languages Emerge. Cambridge University Press, Cambridge.
Longa, V.M., Lorenzo, G., 2008. What about a (really) minimalist theory of language acquisition? Linguist. Interdiscip. J. Lang. Sci. 46 (3), 541–570.
Longa, V.M., Lorenzo, G., 2012. Theoretical linguistics meets development. Explaining FL from an epigeneticist point of view. In: Boeckx, C., Horno, M.C., Mendívil, J.L. (Eds.), Language, from a Biological Point of View: Current Issues in Biolinguistics. Cambridge Scholars Publishing, Cambridge, pp. 52–84.
Longobardi, G., Guardiano, C., 2009. Evidence for syntax as a signal of historical relatedness. Lingua 119, 1679–1706.
Lorenzo, G., Longa, V.M., 2009. Beyond generative geneticism: rethinking language acquisition from a developmentalist point of view. Lingua 119, 1300–1315.
Marantz, A., 1997. No escape from syntax: don’t try morphological analysis in the privacy of your own lexicon. In: Dimitriadis, A., Siegel, L., Surek-Clark, C., Williams, A. (Eds.), Proceedings of the 21st Annual Penn Linguistics Colloquium. University of Pennsylvania, Philadelphia, pp. 201–225.
Newmeyer, F.J., 2005. Possible and Probable Languages. Oxford University Press, Oxford.
Piattelli-Palmarini, M., Berwick, R.C. (Eds.), 2013. Rich Languages from Poor Inputs. Oxford University Press, Oxford.
Pigliucci, M., Müller, G.B. (Eds.), 2010. Evolution: The Extended Synthesis. MIT Press, Cambridge, MA.
Ramchand, G., Svenonius, P., 2008. Mapping a parochial lexicon onto a universal semantics. In: Biberauer, T. (Ed.), The Limits of Syntactic Variation. John Benjamins, Amsterdam, pp. 219–245.
Real-Puigdollers, C., 2013. Lexicalization by Phase. The Role of Prepositions in Argument Structure and its Cross-linguistic Variation (Doctoral dissertation). Universitat de Barcelona.
Richards, N., 2010. Uttering Trees. MIT Press, Cambridge, MA.
Rigon, G., 2009. A Quantitative Approach to the Study of Syntactic Evolution (Doctoral dissertation). Università di Pisa.
Roberts, I., 2010a. On the Nature of Syntactic Parameters: a Programme for Research. Paper presented at the 2010 Mayfest on Bridging Typology and Acquisition.
Roberts, I., 2010b. Rethinking Comparative Syntax (Project No. 269752. Ms.). University of Cambridge. Retrieved from: http://www-falcon.csx.cam.ac.uk/site/RECOS/proposal/full-proposal.


Roberts, I., 2011a. Parametric Hierarchies: Some Observations. Paper presented at the Workshop on Formal Grammar and Syntactic Variation: Rethinking Parameters, Universidad Autónoma de Madrid.
Roberts, I., 2011b. Parameters as Emergent Properties. Paper presented at Université Paris Diderot. Slides retrieved from: http://www.linguist.univ-paris-diderot.fr/PDF/linglunch/slidesLingLunch-Roberts201101.pdf.
Roberts, I., Holmberg, A., 2009. Introduction: parameters in minimalist theory. In: Biberauer, T., Holmberg, A., Roberts, I., Sheehan, M. (Eds.), Parametric Variation: Null Subjects in Minimalist Theory. Cambridge University Press, Cambridge, pp. 1–57.
Saffran, J.R., Aslin, R.N., Newport, E.L., 1996. Statistical learning by 8-month-old infants. Science 274, 1926–1928.
Safir, K., 2014. One true anaphor. Linguist. Inq. 45 (1), 91–124.
Seilacher, A., 1970. Arbeitskonzept zur Konstruktions-Morphologie. Lethaia 3, 393–396.
Sheehan, M.L., 2013. Some implications of a copy theory of labeling. Syntax. http://dx.doi.org/10.1111/synt.12010.
Sheehan, M.L., 2014. The final-over-final constraint and the head-final filter. In: Biberauer, T., Holmberg, A., Roberts, I., Sheehan, M. (Eds.), The Final-over-Final Constraint: a Word-Order Universal and its Implications for Linguistic Theory. MIT Press, Cambridge, MA (in press).
Skinner, B.F., 1957. Verbal Behavior. Appleton-Century-Crofts, New York.
Smith, N., 1999. Chomsky. Ideas and Ideals. Cambridge University Press, Cambridge.
Thornton, R., Wexler, K., 1999. Principle B, Ellipsis, and Interpretation in Child Grammar. MIT Press, Cambridge, MA.
Uriagereka, J., September 25, 2007. The evolution of language. What songbirds, dancing, and knot-tying can tell us about why we speak. Seed Mag. http://www.seedmagazine.com.
Uriagereka, J., 2008. Syntactic Anchors. Cambridge University Press, Cambridge.
Wexler, K., 1999. Maturation and growth of grammar. In: Ritchie, W.C., Bhatia, T.K. (Eds.), Handbook of Language Acquisition. Academic Press, San Diego, CA, pp. 55–109.
Yang, C., 2002. Knowledge and Learning in Natural Language. Oxford University Press, Oxford.
Yang, C., 2004. Toward a theory of language growth. In: Jenkins, L. (Ed.), Variation and Universals in Biolinguistics. Elsevier, Amsterdam, pp. 37–56.
Yang, C., 2005. On productivity. Linguist. Var. Yearb. 5, 265–302.
Yang, C., 2010. Three factors in language variation. Lingua 120, 1160–1177.
