Embedding (im)plausible clauses in propositional attitude contexts: Modulatory effects on the N400 and late components


Journal of Neurolinguistics 53 (2020) 100877


Journal of Neurolinguistics journal homepage: www.elsevier.com/locate/jneuroling


Lia Călinescu1, Anna Giskes1, Mila Vulchanova, Giosuè Baggio∗ Language Acquisition and Language Processing Lab, Department of Language and Literature, Norwegian University of Science and Technology, Trondheim, Norway

ARTICLE INFO

Keywords: Semantics, Language processing, Propositional attitudes, Intensionality, N400, EEG, ERP, Norwegian

ABSTRACT

How do comprehenders track dependencies between the plausibility of embedded clauses and of the sentences in which they occur? We investigated processing of Norwegian propositional attitude sentences with plausible or implausible complement clauses (‘Magnus {knows/believes/dreams/doubts/imagines} that mosquitos live off {blood/vodka}’). Using ERPs, we tested the hypothesis that the amplitude of the N400 component is sensitive to the plausibility of the attitude sentence as a whole, as opposed to that of the complement clause. If the hypothesis is correct, the N400 should be larger for implausible clauses (vodka > blood) under know and believe, it should be canceled (vodka ~ blood) under dream, and it should be reversed (vodka < blood) under doubt and imagine. These predictions were not borne out. We found N400 effects for all attitude sentences, tracking the plausibility of complement clauses (vodka > blood), except for imagine (‘innbille’ in Norwegian has a counterfactual sense), where the effect was suppressed, but not reversed. Moreover, we observed modulations of late ERP components for implausible clauses (vodka > blood) under doubt and imagine, i.e., when the plausibility of the attitude sentence requires an implausible complement clause. We propose that (a) the N400, in general, reflects processing of semantic relations in phrasal and clausal contexts, and it is not necessarily sensitive to sentence plausibility, and (b) later components index integration of information from embedded clauses into a sentence model.

∗ Corresponding author. Department of Language and Literature, Norwegian University of Science and Technology, NO-7491, Trondheim, Norway. E-mail address: [email protected] (G. Baggio).
1 Equal contributions; listed in alphabetical order.

https://doi.org/10.1016/j.jneuroling.2019.100877
Received 14 April 2019; Received in revised form 15 September 2019; Accepted 22 October 2019; Available online 03 November 2019
0911-6044/ © 2019 Elsevier Ltd. All rights reserved.

1. Introduction

Sentences about what people know, believe, doubt, expect etc. are common in most forms of discourse, from ordinary conversation and the news to literary and scientific texts. In linguistics and philosophy, such statements are known as propositional attitude reports: sentences of the form ‘X V that p’ (e.g., ‘Mary doubts that Pluto is a true planet’), where X is the subject of the attitude verb V, and p is a complement clause expressing the attitude's object. In particular, philosophers have focused on questions concerning the nature of the objects of the propositional attitudes, i.e., whether those are propositions, sentences, sets of possible worlds, structured mental representations etc. (Asher, 1986; Barwise & Perry, 1981; Carnap, 1947; Fodor, 1978; Harman, 1972; Hintikka, 1969; Lewis, 1986; Stalnaker, 1984). Semanticists have examined how particular attitude verbs induce ‘opaque contexts’, e.g., why ‘X believes p’ does not entail ‘X believes q’, for arbitrary equivalent p and q (Partee, 1979; Jackendoff, 1983, 1985). Syntacticians have investigated the varieties and formal properties of clausal complements (den Dikken, Larson, & Ludlow, 2016), partly in relation to semantics (Portner, 1997; White, Hacquard, & Lidz, 2018).

However, despite the relative prevalence of attitude reports in human discourse, and the rich theoretical understanding of their logico-linguistic properties, little is known about the on-line processing of these and related constructions (Delogu, Vespignani, & Sanford, 2010; Hackl, Koster-Moeller, & Gottstein, 2009). Our goal is to contribute to filling this gap by reporting the first study on propositional attitudes using event-related potentials (ERPs). Specifically, we address a key issue raised by intensional verbs (e.g., believe, doubt, dream, imagine etc.): the relations between the plausibility of the matrix clause (i.e., the attitude report: ‘X V that p’) and the plausibility of the complement clause (p). In line with much current work in psycholinguistics, we define plausibility as the quality of propositions, sentences, and clauses of seeming probable, that is, relatively likely to be true (Gibson & Pearlmutter, 1998; Jurafsky, 2003; Padó, Crocker, & Keller, 2009; see below for further refinements). Conversely, implausibility is the quality of seeming improbable, i.e., relatively unlikely to be true. Plausibility and implausibility depend largely on knowledge of the world, not on knowledge of language (Pylkkänen, Llinás, & McElree, 2004).

Every intensional verb raises expectations concerning the plausibility (or truth) of the proposition(s) expressed by the complement clause (see further for a linguistically-motivated analysis). In normal circumstances, some attitude reports may be plausible to the extent that their complements are plausible. Here, belief is a case in point. Stalnaker (1984) analyzed belief as an ‘attitude of acceptance’, i.e., one which is correct whenever its complement is true.
Other attitudes show, instead, the opposite relation, such that a report is plausible to the extent that its object is implausible. Doubt is an instance of ‘attitude of rejection’, one which is correct whenever its complement is false. This symmetry between belief and doubt is the main theoretical premise of our study. It has been widely discussed in both philosophy and linguistics. For example, Asher (1987) has observed that “The mental act of doubting the proposition that ϕ consists in, roughly, admitting as real doxastic possibilities only those situations that do not make ϕ true or that make ϕ highly unlikely or implausible. Thus, a doubt is something like an ‘implicit’ belief to the contrary” (p. 161).

The present focus on plausibility allows us to use the N400 in ERPs as the main dependent measure in our experiment, and to formulate hypotheses on processing of attitude reports in terms of modulations of the amplitude of the N400 component (Kutas & Federmeier, 2011; Kutas & Hillyard, 1980). The N400 is sensitive to the predictability and the plausibility of words in context (Nieuwland et al., 2019; for a supporting theory, see Baggio & Hagoort, 2011; Baggio, 2018). Here, we address the question whether the N400 is modulated by the plausibility of sentences or of complement clauses. Propositional attitude reports allow us to tease apart these two scenarios. Consider plausible (a) and implausible clauses (b), embedded under believe (1) and doubt (2):

(1) a. Magnus believes that mosquitos live off blood.
    b. Magnus believes that mosquitos live off vodka.
(2) a. Magnus doubts that mosquitos live off blood.
    b. Magnus doubts that mosquitos live off vodka.

If the N400 amplitude is modulated by the plausibility of complement clauses, then we should expect a larger N400 to ‘vodka’ than to ‘blood’ in both (1) and (2).
This prediction also follows from the view that the N400 reflects semantic relations stored in declarative memory, if we add the crucial proviso that these relations should not be modulated (e.g., upon activation, retrieval etc.) by the wider sentence context: because ‘vodka’, unlike ‘blood’, does not belong to the semantic network activated by ‘mosquito’, a larger N400 should be observed in response to ‘vodka’ in (b), in both (1) and (2). If, instead, the N400 is sensitive to the plausibility of sentences (beyond clausal boundaries), then we should expect this plausibility to contribute to a larger N400 to ‘vodka’ in comparison to ‘blood’ in (1), and a reversal of the effect in (2), such that ‘blood’ would elicit a larger N400 than ‘vodka’: that is because, in general, it is more plausible (i.e., more likely to be true) that someone doubts an implausible proposition than a plausible one; therefore, a larger N400 is expected at the word that makes the embedding sentence implausible under doubt, i.e., ‘blood’.

In what follows, we first provide the minimal linguistic background for our ERP experiment (sections 1.1-1.2). Next, we present the hypotheses tested in our experiment (section 1.3). Given the absence of experiments on attitude reports, below we focus largely on linguistic theory. Connections between our study and previous research on semantic processing and language-related ERP components will be considered in the Discussion (section 4).

1.1. Linguistic background I: Declerck's factuality scale

How can we understand the intuitive symmetry between believe and doubt in the broader framework of a theory of attitude verbs? Declerck (2011) suggests a way to approach this problem. Given some proposition p, a modalizer (e.g., a modal auxiliary, a conditional clause, an intensional verb etc.), when combined with p, assigns a certain factuality value to p.
Factuality values may be ordered in a scale, ranging from factual (‘It is the case that p’, ‘It is not the case that p’, ‘X knows that p’ etc.) to counterfactual (‘It should not have been the case that p’, ‘If p had not been the case, then q would have been the case’, ‘X fantasizes that p’ etc.), with various values in between (e.g., probability, possibility, neutrality, not-yet-factuality, improbability, impossibility etc.). Declerck then shows that ‘world-evoking verbs’, which also include propositional attitude verbs, may be arranged in a similar factuality scale. Factive verbs (e.g., know) evoke a world which is “automatically interpreted as being the factual world” (p. 41), while attitude verbs (e.g., believe, doubt, dream, imagine etc.) create an “intensional world which may or may not coincide with the factual world (…) [and] which is related to the factual world in terms of a factuality value”. Declerck argues that believe has the factuality value ‘possibly factual’, and some uses of imagine “can imply counterfactuality, e.g., in ‘He imagined he was Napoleon’”. The semantics of an intensional verb would therefore specify the “likelihood of compatibility” between the complement clause (the proposition it expresses) and the factual world.

Table 1
Experimental stimuli in the original Bokmål Norwegian and English translations. The form of stimulus sentences under each intensional verb is also shown (left column) along with plausibility values for the embedded propositions (e.g., ‘mosquitos live off blood/vodka’).

Intensional verb | Plausibility of p | Example
(a) ‘X knows that p’ (vite) | Plausible: p+ | Magnus vet at mygg lever av blod. / Magnus knows that mosquitos live off blood.
(a) ‘X knows that p’ (vite) | Implausible: p− | Magnus vet at mygg lever av vodka. / Magnus knows that mosquitos live off vodka.
(b) ‘X believes that p’ (tro) | Plausible: p+ | Magnus tror at mygg lever av blod. / Magnus believes that mosquitos live off blood.
(b) ‘X believes that p’ (tro) | Implausible: p− | Magnus tror at mygg lever av vodka. / Magnus believes that mosquitos live off vodka.
(c) ‘X dreams that p’ (drømme) | Plausible: p+ | Magnus drømmer at mygg lever av blod. / Magnus dreams that mosquitos live off blood.
(c) ‘X dreams that p’ (drømme) | Implausible: p− | Magnus drømmer at mygg lever av vodka. / Magnus dreams that mosquitos live off vodka.
(d) ‘X doubts that p’ (tvile) | Plausible: p+ | Magnus tviler på at mygg lever av blod. / Magnus doubts that mosquitos live off blood.
(d) ‘X doubts that p’ (tvile) | Implausible: p− | Magnus tviler på at mygg lever av vodka. / Magnus doubts that mosquitos live off vodka.
(e) ‘X imagines that p’ (innbille seg) | Plausible: p+ | Magnus innbiller seg at mygg lever av blod. / Magnus imagines that mosquitos live off blood.
(e) ‘X imagines that p’ (innbille seg) | Implausible: p− | Magnus innbiller seg at mygg lever av vodka. / Magnus imagines that mosquitos live off vodka.

For the present purposes, we focus on five verbs: the factive know plus four modalizers, which may be arranged in a partial factuality scale:

(3) know > believe > dream > doubt > imagine

The non-modal factive know and the “normally counterfactual” (Declerck) uses of imagine are the poles of the scale; the symmetric believe and doubt are two intermediate points on each side of the scale; and dream is the middle point, as it can occur with both plausible and implausible propositions. One may use the factuality scale in (3) to specify plausibility conditions for attitude reports involving these five verbs, as a function of the plausibility of complement clauses: the plausibility of an attitude report is seen to depend on the plausibility of the embedded clause and on the specific intensional verb that is used, as explained below (see also Tables 1-2). Note that the semantics of attitude reports does not reduce to plausibility conditions. These formalize just one aspect of meaning which, given the relations between plausibility and the N400, is likely to affect on-line processing. Other aspects of the semantics of attitude reports are captured by specifications of truth conditions (see Heim & Kratzer, 1998), assertibility conditions (Gauker, 2003), and other theoretical (e.g., lexical) constructs.

Here, we define plausibility as likelihood of truth in the actual world, applying to linguistic expressions of propositions (e.g., sentences or clauses). This notion encompasses psycholinguistic accounts of plausibility as the degree of semantic (or thematic) fit between a verbal predicate and its arguments, given the sense of the verb and knowledge of the world (Gibson & Pearlmutter, 1998; Padó et al., 2009). For example, the complement clauses in (1a) and (2a), ‘Mosquitos live off blood’, are plausible (we indicate this as p+), while those in (1b) and (2b), ‘Mosquitos live off vodka’, are implausible (p−).
In general, plausibility may be viewed as a continuous variable. However, we are interested in clear cases of plausible and implausible complement clauses, at least relatively, i.e., in comparison to one another. Given this simplification, one could specify the plausibility of an attitude report (S+ or S−) as a function of the plausibility of the complement clause (p+ or p−). Verbs on the left-hand side of the factuality scale (3) present a direct dependency between plausibility values for S and p. In the absence of specific knowledge about X, and about the truth of S and p in the actual world, the reports ‘X knows that p’ and ‘X believes that p’ are plausible (i.e., likely to be true), if p is plausible (S+ for p+). Instead, verbs on the right-hand side of the factuality scale (3) present an inverse dependency. ‘X doubts that p’ and ‘X imagines that p’ (for counterfactual uses of imagine) are plausible, if p is implausible (S+ for p−). Finally, ‘X dreams that p’ is plausible for a range of plausibility values of p. To distinguish between the factive know and the epistemic modalizer believe one needs some additional specifications, e.g., of truth conditions (i.e., ‘X knows that p’ is true iff p, which does not apply to believe) or presuppositions (‘X knows that p’ presupposes p, which is not so for believe), and likewise for distinguishing counterfactual uses of imagine and typical uses of the epistemic modalizer doubt. The plausibility conditions just stated capture an important aspect of the semantics of attitude reports, which may affect how comprehenders perceive these sentences on-line.

Table 2
Summary of theory and hypotheses concerning modulations of N400 amplitudes.

A.
Sentence types | Plausibility conditions
(a) Vite, S: ‘X knows that p’ | S+ for p+ (p+: e.g., ‘mosquitos live off blood’)
(b) Tro, S: ‘X believes that p’ | S+ for p+ (p+: e.g., ‘mosquitos live off blood’)
(c) Drømme, S: ‘X dreams that p’ | S+ for p+/− (no single example)
(d) Tvile, S: ‘X doubts that p’ | S+ for p− (p−: e.g., ‘mosquitos live off vodka’)
(e) Innbille seg, S: ‘X imagines that p’ | S+ for p− (p−: e.g., ‘mosquitos live off vodka’)

B.
Sentence types | Hypothesis 1 (sentential) | Hypothesis 2 (clausal)
(a) Vite, S: ‘X knows that p’ | N400(p−) > N400(p+) | N400(p−) > N400(p+)
(b) Tro, S: ‘X believes that p’ | N400(p−) > N400(p+) | N400(p−) > N400(p+)
(c) Drømme, S: ‘X dreams that p’ | N400(p−) ~ N400(p+) | N400(p−) > N400(p+)
(d) Tvile, S: ‘X doubts that p’ | N400(p−) < N400(p+) | N400(p−) > N400(p+)
(e) Innbille seg, S: ‘X imagines that p’ | N400(p−) < N400(p+) | N400(p−) > N400(p+)

1.2. Linguistic background II: Norwegian intensional verbs

Declerck's factuality scale and its consequences for the plausibility conditions of attitude reports hold cross-linguistically. The present study was conducted in Norwegian (Bokmål), which has attitude verbs of knowing (vite), believing (tro), dreaming (drømme), doubting (tvile), and, importantly, a reflexive verb of imagination (innbille seg) which may also be used counterfactually to refer to incorrect opinions or judgments, illusions or delusions etc.
The Norwegian on-line dictionaries Bokmålsordboka/Nynorskordboka contain a single entry for innbille seg, giving its (primary) counterfactual meaning only: i.e., “to have a false idea” about something, or to “believe (something which is not actually the case)”. The NAOB (Det Norske Akademis ordbok) gives both senses of the verb: the counterfactual and the possibly factual (i.e., to presume or imagine). To our knowledge, there is no empirical research on the semantics of attitude verbs in Norwegian, including innbille seg.

One way to assess these dictionary entries is by using corpus data. Here, we used the NoWaC corpus (Norwegian Web as Corpus; Guevara, 2010). First, given the morphosyntactic structure of the sentences used in our study (see above and Table 1), we searched for third-person forms only (lemma innbille + seg). This gave 989 hits. A random sample of 100 hits contained only 3 possibly factual uses, with the remaining hits being quite clearly counterfactual, taking into account a context of 30 words preceding and following each intensional verb. Second, to test whether this holds across all reflexive uses of innbille, we searched for combinations of innbille with reflexive pronouns (i.e., meg, deg, seg, oss, and dere), excluding all transitive occurrences of innbille (to make someone believe something which is not grounded in reality, according to NAOB). This search yielded 2727 hits, from which 3 transitive-use tokens were manually excluded. A random sample of 100 hits contained only 17 instances in which uses of innbille + a reflexive pronoun were possibly factual. Interestingly, of those 17, 15 were first person singular forms, out of a total of 37 first person singular occurrences in the sample. In sum, the majority of occurrences of the reflexive innbille (seg) in the NoWaC corpus are counterfactual, and most other cases are in first person singular.

1.3. The N400 and plausibility: alternative processing hypotheses

We constructed sentences in Norwegian Bokmål, using the five attitude verbs (vite, tro, drømme, tvile, innbille seg) from the factuality scale (3), with clausal complements expressing plausible or implausible propositions (Table 1). This 5 × 2 design allowed us to test two alternative hypotheses: the amplitude of the N400 component evoked by critical nouns is sensitive either to the plausibility of the embedding sentence (hypothesis 1; or sentential hypothesis) or to that of the embedded complement clause (hypothesis 2; or clausal hypothesis).

1.3.1. Hypothesis 1: the N400 is sensitive to the plausibility of S

If the amplitude of the N400 component in ERPs is sensitive to the plausibility of the matrix sentence (S), it should reflect the plausibility conditions specified above (Table 2A): the N400 component triggered by the noun that renders the entire sentence implausible should be larger than the N400 component evoked by the noun that renders the sentence plausible. For example, under the verbs vite (know) and tro (believe), the noun vodka should produce a larger N400 component than blod (blood; see Table 1); the N400 effect should be abolished under drømme (dream); and it should be reversed for tvile (doubt) and innbille seg (imagine), with a larger N400 component in response to blod than to vodka (see Table 2B). These predictions are in line with current knowledge of factors modulating the N400 amplitude, including semantic relatedness, or coherence, between the eliciting word and the given supra-clausal context, like a sentence or a discourse (Van Berkum, Hagoort, & Brown, 1999; 2003; Nieuwland & Van Berkum, 2006; Hald, Steenbeek-Planting, & Hagoort, 2007; Otten & Van Berkum, 2007; Baggio, Van Lambalgen, & Hagoort, 2008; Baggio, 2018).

1.3.2. Hypothesis 2: the N400 is sensitive to the plausibility of p

The alternative hypothesis is that the N400 is sensitive only to the plausibility of the complement clause (p), whereas relations between the plausibility of S and p would be reflected in modulations of other ERPs, for example post-N400 effects (DeLong, Quante, & Kutas, 2014; Kuperberg & Wlotko, 2018), if they are manifested in ERPs at all. Under this hypothesis, the prediction is uniform for the different intensional verbs: the N400 component triggered by the noun that renders the clause implausible is larger than the N400 elicited by the noun that makes the clause plausible, regardless of the embedding context. We should therefore see a larger N400 for vodka than for blod under all intensional verbs (Table 2B).
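The two sets of predictions can be made concrete with a small sketch. The Python below is only an illustration of the logic summarized in Table 2, not part of the study's materials: the names `S_PLUS_FOR`, `sentence_is_plausible`, `predict_h1` and `predict_h2` are hypothetical, and innbille seg is assumed to be used counterfactually, as in the plausibility conditions above.

```python
# Illustrative encoding of Table 2; all names are assumptions made for this sketch.
VERBS = ["vite", "tro", "drømme", "tvile", "innbille seg"]

# Table 2A: values of p for which the attitude report S is plausible (S+).
S_PLUS_FOR = {
    "vite": {"p+"},
    "tro": {"p+"},
    "drømme": {"p+", "p-"},
    "tvile": {"p-"},
    "innbille seg": {"p-"},  # counterfactual use assumed
}


def sentence_is_plausible(verb, p):
    """Return True if 'X V that p' counts as plausible (S+) under Table 2A."""
    return p in S_PLUS_FOR[verb]


def predict_h1(verb):
    """Sentential hypothesis: the larger N400 goes to the noun that makes S implausible."""
    plausible_ok = sentence_is_plausible(verb, "p+")
    implausible_ok = sentence_is_plausible(verb, "p-")
    if plausible_ok and not implausible_ok:
        return "N400(p-) > N400(p+)"
    if implausible_ok and not plausible_ok:
        return "N400(p-) < N400(p+)"
    return "N400(p-) ~ N400(p+)"


def predict_h2(verb):
    """Clausal hypothesis: the implausible noun always elicits the larger N400."""
    return "N400(p-) > N400(p+)"


for verb in VERBS:
    print(f"{verb:13s} H1: {predict_h1(verb)}   H2: {predict_h2(verb)}")
```

Under this encoding the two hypotheses agree for vite and tro and diverge for drømme, tvile and innbille seg, which is why those three verbs carry the weight of the comparison.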


2. Methods

2.1. Experimental design

In order to test whether the amplitude of the N400 component is sensitive to the plausibility of the embedding sentences (Hypothesis 1) or the complement clauses (Hypothesis 2), we used a two-factor 5 × 2 design, with the factors Verb (sentence type; with 5 levels for the 5 Norwegian verbs vite, tro, drømme, tvile, and innbille seg) and Plausibility of the complement clause (2 levels: plausible or implausible; for details, see 2.3 below). The dependent variables of interest were the mean amplitudes of the N400 and post-N400 components, over trials and within participants. We used an event-related mixed-blocks design, where sentences derived from any of the 5 × 2 combinations of the factors Plausibility and Verb could appear in any block, in any order. The task for participants was to read each stimulus sentence silently for comprehension. No overt response was required. We expected that, if the amplitude of the N400 varies with the (im)plausibility of sentences (Hypothesis 1), the factors Verb and Plausibility would interact, as described in Table 2B. If instead the amplitude of the N400 varies only with the (im)plausibility of the complement clause (Hypothesis 2), but is not sensitive to the plausibility of the sentence (matrix clause), a main effect of Plausibility is expected, but no interaction with Verb (Table 2B).

2.2. Stimuli

We constructed 300 Norwegian (Bokmål) sentence pairs using attitude verbs for knowing (vite), believing (tro), dreaming (drømme), doubting (tvile), and counterfactual imagining (innbille seg). Sixty sentence pairs for each attitude verb were constructed. Each pair of sentences comprised an implausible and a plausible version, differing only with respect to the critical noun (e.g., a direct object noun) in the complement clause (Table 1; see Supplementary Material).
In all cases, plausible sentences expressed common or stereotypical facts (e.g., cats chase mice), while implausible sentences contradicted those known facts (e.g., cats chase doctors). Prior to conducting systematic semantic plausibility and cloze probability tests (2.3 and 2.4), we had 5 different informants (native speakers of Norwegian, readers of Bokmål, and naive with respect to the aims of the experiment) independently assess all our stimuli for grammaticality and typicality: they confirmed that all items are grammatical, and that sentences in the plausible condition are common knowledge to Norwegians. These informal judgments, and the experimenters’ own intuitions, are moreover confirmed by the plausibility and cloze probability data presented below. In all cases, the grammatical subject of the main sentence was a person's proper name (e.g., ‘Magnus’), and the subject of the complement clause was a common noun (e.g., ‘mosquitos’). In 90% of the sentences, the critical nouns were direct object nouns. To introduce some structural variation, in 10% of all sentences the critical nouns were part of a modifier phrase (such as a PP, e.g., as in ‘X {knows/...} that boxers fight in the {ring/cockpit}’). Neither the proper names in the subject position in the matrix sentence (‘X’), nor the complement clauses (‘that p’), nor critical nouns, were ever repeated across multiple items. Phonological and orthographic similarities between each critical word and the word immediately preceding it, as well as between critical words in each pair, were avoided. Critical nouns did not exceed 12 characters in length. The mean length of the plausible nouns was 6.5 letters (SD = 2.1), and the mean length of the implausible nouns was 6.8 letters (SD = 2.1). Critical nouns were matched in mean lemma frequency in the stimulus set across conditions. Frequency was assessed by using the NoWaC corpus for Bokmål Norwegian (Guevara, 2010).
To correct misclassifications of lemmata in the NoWaC search results, lemma frequencies were adjusted based on data from random samples of 100 results for each lemma. The data were manually checked and corrected by excluding homonymic forms and other misclassified occurrences. After this adjustment, lemma frequencies across the stimulus set were matched on average: 31129.18 pmw (SD = 63147.94) for plausible nouns; 27079.39 (SD = 58594.97) for implausible nouns (W = 43633, p = 0.5198). Some variation in sentence length was introduced via continuations of 2–6 words after the critical noun in 20% of the sentences, and in the remaining 80% the critical noun was the sentence-ending word. These continuations were neutral with respect to the (im)plausibility of the complement clauses. We omitted full stops at the end of each sentence to avoid eliciting possible sentence-final and wrap-up ERP effects (Stowe et al., 2018). We created 10 randomized item lists. Each of the 300 sentence pairs appeared only in one condition per list (either plausible or implausible, under a single propositional attitude verb), and each participant read 30 sentences in each of the 10 (5 × 2) combinations of attitude verbs (5) and plausible or implausible (2) embedded clauses. In addition, 250 fillers with varying form and content were included in the set.

2.3. Semantic plausibility

To assess preferences for semantically plausible or implausible clauses under different attitude verbs, we conducted a two-alternative forced-choice (2AFC) sentence completion experiment on the on-line platform Ibex Farm. We drew a random sample of 50 sentence pairs from the final set of items to be used in the ERP study. This resulted in 5 lists, each containing all 50 embedded clauses under different attitude verbs, with 10 sentences per verb.
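One way to obtain 5 lists with these properties is a Latin-square rotation of verbs over items. The sketch below is an assumption about how such lists could be built, since the text does not state the assignment procedure; the function name `rotate_items_over_lists` and the integer item indices are hypothetical.

```python
from collections import Counter

VERBS = ["vite", "tro", "drømme", "tvile", "innbille seg"]


def rotate_items_over_lists(n_items, conditions):
    """Build one list per condition; in list k, item i gets condition (i + k) mod n."""
    n_cond = len(conditions)
    return [
        [(item, conditions[(item + shift) % n_cond]) for item in range(n_items)]
        for shift in range(n_cond)
    ]


# 50 embedded clauses rotated over the 5 attitude verbs.
lists = rotate_items_over_lists(50, VERBS)
per_verb = Counter(verb for _, verb in lists[0])
print(len(lists), dict(per_verb))
```

With 50 items and 5 verbs, every list contains each clause once and each verb 10 times, and every clause occurs under each verb in exactly one list. The same rotation idea would extend to the 10 (5 × 2) cells of the ERP lists described in 2.2, again as an assumption rather than a description of the authors' procedure.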
The sentences were presented as truncated before the critical noun, with both the implausible and plausible critical nouns following the sentence: ‘Magnus {vet/tror/...} at mygg lever av’, followed by ‘{blod/vodka}’. The order of appearance (top vs bottom) of plausible and implausible nouns on the screen was randomized over trials. Thirty-seven L1 speakers of Norwegian, whose primary written language was Bokmål (mean age 24.27, age range 19–34), were instructed to choose, of the two critical nouns, the one that would best complete the sentence. For all items, the attitude verb was presented in bold typeface: the instructions stated that participants should take the attitude verb into account when answering. Proportions of plausible responses (reported in Table 3; for, e.g., blod in the examples above) are highest with know (vite), and then decrease gradually under believe (tro), dream (drømme), and doubt (tvile): this pattern is entirely consistent with Declerck's factuality scale.

Table 3
Summary of results of the semantic plausibility test and of the cloze probability tests. The values shown are mean proportions or probabilities for the plausible nouns (e.g., ‘blood’).

Verb | Semantic plausibility, Mean (SD) | Cloze probability, Mean (SD)
(a) Vite | 0.986 (0.116) | 0.353 (0.276)
(b) Tro | 0.897 (0.304) | 0.278 (0.233)
(c) Drømme | 0.451 (0.499) | 0.232 (0.212)
(d) Tvile | 0.308 (0.462) | 0.194 (0.193)
(e) Innbille seg | 0.619 (0.486) | 0.27 (0.24)

The proportion of plausible nouns chosen for imagine (innbille seg) is higher than we had originally expected (see Table 3), given Declerck's scale. However, it is compatible with the notion that innbille seg admits factive and counterfactual meanings, if not exactly with the idea that the latter meaning is the primary or the default one, as suggested by the Bokmålsordboka entry (i.e., “ha en falsk forestilling om”, or ‘to have a false idea about something’). Crucially, our data present a dichotomous distribution. Within participants, completions for innbille seg are consistent with only one of the two meanings: out of 37 participants, 9 exhibit a pattern compatible with the counterfactual meaning (i.e., the proportion of plausible nouns selected is low, between 0 and 0.1), whereas 12 choose completions compatible with the factual meaning (i.e., the proportion of plausible nouns is very high, between 0.9 and 1). It should be noted that participant responses in an off-line sentence completion task (or any other production tasks) may not be indicative of how participants would understand or process the same sentences on-line: off-line tasks may engage slower or more deliberate processes that are not necessarily recruited in on-line processing. Nonetheless, these results invite some caution, in particular in relation to on-line EEG/ERP data: it cannot simply be assumed that participants compute counterfactual models for each and every sentence under innbille seg, at least not in the same sense in which one would assume that factive models are computed for all plausible vite (know) sentences.

2.4. Cloze probability

In addition to the sentence completion test, we conducted a cloze probability experiment on the on-line platform Ibex Farm.
Each participant (N = 472) saw a random sample of 75 experimental sentences, truncated immediately before the critical noun. Participants were asked to complete each sentence with the word that first comes to mind as a probable continuation. The test in this form took 10–15 min to complete. We collected 15 responses per participant per condition, and between 12 and 37 responses per item. The distribution of the cloze probability data reveals the expected tendencies, given the meanings of attitude verbs, and mirrors the results of the sentence completion test. The cloze probabilities of plausible nouns (blod) are highest with know (vite), and lower under believe (tro), dream (drømme), doubt (tvile), and imagine (innbille seg), in agreement with Declerck's factuality scale (Table 3). Implausible nouns (vodka) had cloze probability close to 0 with all verbs. These results suffice to rule out the possibility that, if the pattern predicted by Hypothesis 1 is found (Table 2B), the N400 is driven by the cloze probability of critical nouns or by semantic preferences for specific nouns (as tested in the sentence completion study). Hypothesis 2 would likewise be more consistent with a rather different pattern of plausibility and cloze probability data, where those measures show uniform values across the different propositional attitude sentences.

2.5. Participants

We analyzed ERP data from 23 participants (10 female, mean age 26 years, age range 19–36; mean education 20 years), who met our inclusion criteria, with a minimum of 20 out of 30 artifact-free trials (see 2.8 for additional details) per condition on average. Eleven additional participants were excluded due to few artifact-free trials on average (between 3 and 19, out of 30 per condition), and one was discarded for failing to attend to the stimulus sentences. Participants in the EEG experiment did not take part in the sentence completion test or in the cloze probability test (the three samples were disjoint).
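The trial-count criterion can be written down as a simple check. The sketch below assumes that the criterion applies to the mean number of artifact-free trials across the 10 conditions, which is one reading of "per condition on average"; the function names and the example counts are hypothetical and are not data from the study.

```python
MAX_TRIALS = 30       # trials presented per condition
MIN_MEAN_CLEAN = 20   # minimum mean number of artifact-free trials per condition

VERBS = ["vite", "tro", "drømme", "tvile", "innbille seg"]
CONDITIONS = [(verb, p) for verb in VERBS for p in ("p+", "p-")]


def mean_clean_trials(clean_counts):
    """Mean number of artifact-free trials across conditions."""
    return sum(clean_counts.values()) / len(clean_counts)


def meets_inclusion_criterion(clean_counts, min_mean=MIN_MEAN_CLEAN):
    """True if the participant keeps enough artifact-free trials on average."""
    return mean_clean_trials(clean_counts) >= min_mean


# Hypothetical participants, for illustration only.
retained = {cond: 24 for cond in CONDITIONS}
excluded = {cond: 15 for cond in CONDITIONS}
print(meets_inclusion_criterion(retained), meets_inclusion_criterion(excluded))
```

A stricter variant would require at least 20 clean trials in every single condition; the text does not say which version was applied.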
Participants were native speakers and readers of Bokmål Norwegian, and none of them had acquired any additional languages before the age of 5. All participants were right-handed, with no history of neurological disorders or trauma, and all had normal or corrected-to-normal visual acuity. Participants gave their informed consent in writing before taking part in the study. The research was approved by the Norwegian Centre for Research Data (NSD; project code 56768). All data collected were handled according to NSD recommendations.

2.6. Procedure

Participants were seated in front of a screen (viewing distance ~90–110 cm) in a sound-attenuated and electrically shielded EEG booth. The sentences were presented on the screen word-by-word (600 ms SOA, 300 ms word duration; white on black background). In between sentences, a fixation cross appeared in the middle of the screen (duration 2000 ms). Participants were instructed to read each sentence carefully for comprehension, and to avoid eye blinks or other movements, except when the fixation cross was
presented. Participants were informed that they would be asked a few questions about the sentences at the end of each session. This was implemented as an additional incentive for participants to remain attentive and focused on the sentence content, but was followed up only informally at the end of the session (data were not collected). Each session was divided into blocks of 15 sentences each (experimental and fillers). Upon completion of a block, the experiment was paused. The duration of each break was determined freely by participants. One EEG session lasted between 1:15 and 1:30 h on average, including breaks. 2.7. Data acquisition EEG signals were recorded from 32 active electrodes (Fp1, Fp2, F7, F3, Fz, F4, F8, FC5, FC1, FC2, FC6, T7, C3, Cz, C4, T8, TP9, CP5, CP1, CP2, CP6, TP10, P7, P3, Pz, P4, P8, PO9, O1, Oz, O2, and PO10), using the actiCAP system by Brain Products GmbH. The implicit reference was FCz: the default choice in actiCAP systems. A linked mastoid reference was computed off-line, by re-referencing EEG channels to the average of TP9 and TP10. Electrode impedance was kept below 5 kOhm throughout the experiment. The EEG was sampled at 1000 Hz, using a 1000 Hz high cutoff filter, and a 10 s time constant (low cutoff). 2.8. Data analysis EEG data were analyzed using FieldTrip in MATLAB (Oostenveld, Fries, Maris, & Schoffelen, 2011). 
The pre-processing pipeline was as follows: (i) re-referencing (as in 2.7); (ii) segments corresponding to critical nouns were extracted from the EEG trace, with an interval of 200 ms preceding and 800 ms following word onset; (iii) baseline correction using pre-stimulus [-200, 0] data; (iv) artifact rejection in the extracted segments was based on two FieldTrip functions: firstly, all trials where amplitude values exceeded a threshold of ± 150 μV, relative to baseline, were rejected; secondly, all trials containing eye movements were rejected by means of thresholding the z-transformed value of the preprocessed raw data from channels Fp1 and Fp2, in the 1–15 Hz range. Following artifact rejection, clean EEG segments were filtered with a digital low-pass filter at 30 Hz. ERPs were obtained for each subject by averaging over trials in each condition. Statistical analyses used an omnibus ANOVA model, with factors Verb (5 levels), Plausibility (2 levels), and Electrode (32 levels), and mean voltage values, calculated per participant across trials, in the 300–500 ms (N400) and 500–800 ms (post-N400 effects) time windows relative to the onset of critical nouns, as dependent measures. The omnibus ANOVA allows us to test for the presence or absence of a main effect of Plausibility and of a Verb × Plausibility interaction (Hypothesis 1 vs Hypothesis 2). However, testing of Hypothesis 1, as a particular pattern of N400 effects (Table 2B), requires a more fine-grained statistical approach. We therefore used, in addition, five ANOVA models, with factors Plausibility and Electrode, and mean voltage values in each condition, in the 300–500 ms and 500–800 ms time windows, as dependent variables, in each of the five propositional attitude report types, separately. 
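As an illustration only, the single-channel logic of steps (iii) and (iv) and of the mean-amplitude measures can be sketched in a few lines of Python. This is not the FieldTrip/MATLAB code used for the analysis: the function names, the toy trials, and the single-channel simplification are assumptions made for the example, and the eye-movement rejection and the 30 Hz low-pass filter are left out.

```python
from statistics import mean

SFREQ = 1000            # Hz, sampling rate reported in 2.7
EPOCH_START_MS = -200   # segments run from -200 ms to +800 ms around noun onset


def ms_to_index(t_ms):
    """Convert a latency in ms (relative to noun onset) to a sample index."""
    return int(round((t_ms - EPOCH_START_MS) * SFREQ / 1000))


def baseline_correct(epoch):
    """Subtract the mean of the [-200, 0] ms pre-stimulus interval."""
    base = mean(epoch[ms_to_index(-200):ms_to_index(0)])
    return [v - base for v in epoch]


def is_clean(epoch, threshold_uv=150.0):
    """Accept a trial only if no sample exceeds +/- threshold_uv after baselining."""
    return all(abs(v) <= threshold_uv for v in epoch)


def erp(epochs):
    """Average the baseline-corrected, artifact-free trials sample by sample."""
    kept = [e for e in map(baseline_correct, epochs) if is_clean(e)]
    if not kept:
        raise ValueError("no artifact-free trials")
    return [mean(samples) for samples in zip(*kept)], len(kept)


def window_mean(wave, start_ms, end_ms):
    """Mean voltage in a latency window, e.g. 300-500 ms for the N400."""
    return mean(wave[ms_to_index(start_ms):ms_to_index(end_ms)])


def make_trial(offset, n400, spike=0.0):
    """Hypothetical toy trial (microvolts): DC offset, a deflection at 300-500 ms,
    and an optional one-sample artifact at 600 ms."""
    return [offset + (n400 if 300 <= t < 500 else 0.0) + (spike if t == 600 else 0.0)
            for t in range(-200, 800)]


trials = [make_trial(5.0, -4.0), make_trial(-3.0, -6.0),
          make_trial(0.0, -5.0, spike=300.0)]   # the third trial should be rejected
wave, n_kept = erp(trials)
n400_mean = window_mean(wave, 300, 500)   # dependent measure, N400 window
late_mean = window_mean(wave, 500, 800)   # dependent measure, post-N400 window
```

In the toy data the third trial exceeds the ±150 μV criterion and is dropped, so the average is based on the two remaining trials; the two window means correspond to the dependent measures entered in the ANOVA models, computed here for one channel and one condition only.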
The data were also subjected to non-parametric cluster-based statistics (Maris, 2004; Maris & Oostenveld, 2007), using mean voltage values for plausible and implausible critical nouns, in the 300–500 ms and 500–800 ms time frames, in each of the five report types, separately. The output is a set (possibly empty, if no effects are found) of spatial clusters (adjacent channels) in which the two conditions (implausible vs plausible) differ in the chosen time windows, giving the sum of t-statistics in the cluster (Tsum), Monte Carlo estimates of p-values, and cluster size (S) in number of channels. 3. Results Event-related potentials from all conditions (Figs. 1-2) show characteristic sequences of N100, P200, N400, and later (positive or negative) components, typical of ERP responses time locked to visually presented words. Inspection of ERP waveforms reveals an increase of the N400's amplitude for implausible relative to plausible critical nouns across sentence types, as well as differences in the amplitude of later ERP components (Fig. 1). We describe these N400 and post-N400 effects below. 3.1. N400 Implausible critical nouns trigger a larger N400 component than do plausible nouns (Figs. 1-2), apparently similarly across sentence types. The omnibus ANOVA indeed reveals a main effect of Plausibility in the N400 time frame, and no Verb × Plausibility interaction (Table 4). A separate analysis of the effects of implausible nouns under each intensional verb suggests that the amplitude of the N400 component is increased for know, believe, dream, and doubt, but not for imagine, where the effect is abolished. ANOVA statistics for each sentence type separately reveal main effects of Plausibility in the N400 time interval, for know, believe, dream, and doubt, but not for imagine (Table 5). Cluster-based statistics support this result. 
A single negative cluster, and no positive clusters, is found for implausible vs plausible nouns with know (Tsum = −46.41, p = 0.001, S = 16), believe (Tsum = −50.14, p = 0.001, S = 17), dream (Tsum = −31.89, p = 0.006, S = 12), and doubt (Tsum = −31.89, p < 0.001, S = 21) in the 300–500 ms time frame, whereas no clusters are found for imagine. All negative clusters involve central and posterior sites (channels C3, Cz, C4, CP1, CP6, P3, P4, O2 are included in all four clusters), consistent with the distribution of the N400 effect (Fig. 2).

Fig. 1. Grand average (N = 23) ERP waveforms from 3 recording sites (Pz, Cz, F4; columns), time locked to the onset (0 ms) of plausible vs implausible critical nouns under know (vite), believe (tro), dream (drømme), doubt (tvile), and imagine (innbille seg; rows). Asterisk marks (*) indicate statistical effects at 300–500 ms (N400) or at 500–800 ms (late components).

3.2. Post-N400 components

In some sentence types, implausible critical nouns modulate the amplitude of a late ERP component, immediately following the N400. The omnibus ANOVA does not reveal any effects of Plausibility, or Verb × Plausibility interactions, in the 500–800 ms interval
(Table 4). ANOVA statistics for each sentence type, in the 500–800 ms window, show clearer Plausibility × Electrode interactions with doubt and imagine (Figs. 1-2; Table 5). This suggests that the amplitude of a post-N400 wave may be modulated differently across channels by implausible nouns under the intensional verbs doubt and imagine. Cluster-based statistics produce no negative or positive clusters in the 500–800 ms window for know, believe, or dream. The only clusters identified were for implausible vs plausible nouns under doubt (Tsum = −11.57, p = 0.018, S = 4) and imagine only (Tsum = −3.97, p = 0.15, S = 2). These clusters have negative Tsum sign and include only anterior electrodes (i.e., Fp1, Fp2, F4 and F8 for doubt; Fp1 and Fp2 for imagine).
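For readers unfamiliar with the quantities reported for each cluster (Tsum, p, S), the following Python sketch shows one way a spatial cluster-mass permutation test on window-averaged data can be computed. It is a simplified illustration under stated assumptions, not the FieldTrip routine used here: the neighbourhood map, the threshold `t_crit`, the sign-flipping scheme for the paired design, and the toy values are all hypothetical choices made for the example.

```python
import random
from math import copysign, inf, sqrt
from statistics import mean, stdev


def paired_t(diffs):
    """t-statistic of per-participant differences (implausible - plausible)."""
    m, sd = mean(diffs), stdev(diffs)
    if sd == 0:
        return 0.0 if m == 0 else copysign(inf, m)
    return m / (sd / sqrt(len(diffs)))


def find_clusters(tvals, neighbours, t_crit):
    """Group adjacent channels whose t-values pass t_crit with the same sign.
    Returns a list of (Tsum, sorted channel names)."""
    found, seen = [], set()
    for ch, t in tvals.items():
        if ch in seen or abs(t) < t_crit:
            continue
        seen.add(ch)
        stack, members = [ch], []
        while stack:
            cur = stack.pop()
            members.append(cur)
            for nb in neighbours.get(cur, ()):
                if (nb in tvals and nb not in seen and abs(tvals[nb]) >= t_crit
                        and (tvals[nb] > 0) == (t > 0)):
                    seen.add(nb)
                    stack.append(nb)
        found.append((sum(tvals[m] for m in members), sorted(members)))
    return found


def cluster_test(diffs, neighbours, t_crit, n_perm=1000, seed=0):
    """diffs maps channel -> list of per-participant mean-voltage differences.
    Returns (Tsum, channels, S, Monte Carlo p) for each observed cluster."""
    def analyse(d):
        cl = find_clusters({ch: paired_t(v) for ch, v in d.items()}, neighbours, t_crit)
        return cl, max((abs(s) for s, _ in cl), default=0.0)

    observed, _ = analyse(diffs)
    rng = random.Random(seed)
    n_subj = len(next(iter(diffs.values())))
    null = []
    for _ in range(n_perm):
        # Swapping condition labels within a participant flips the sign of that
        # participant's difference on every channel at once.
        flips = [rng.choice((-1, 1)) for _ in range(n_subj)]
        permuted = {ch: [f * x for f, x in zip(flips, v)] for ch, v in diffs.items()}
        null.append(analyse(permuted)[1])
    return [(tsum, chans, len(chans),
             (1 + sum(m >= abs(tsum) for m in null)) / (1 + n_perm))
            for tsum, chans in observed]


# Hypothetical toy data: 8 participants, 3 channels (values in microvolts).
toy_diffs = {
    "Cz":  [-3.0, -2.5, -3.5, -2.0, -4.0, -3.0, -2.8, -3.2],
    "Pz":  [-2.0, -1.5, -2.5, -1.0, -3.0, -2.0, -1.8, -2.2],
    "Fp1": [0.5, -0.5, 0.4, -0.4, 0.3, -0.3, 0.2, -0.2],
}
toy_neighbours = {"Cz": ["Pz"], "Pz": ["Cz"], "Fp1": []}
results = cluster_test(toy_diffs, toy_neighbours, t_crit=2.365)
```

In the toy data, Cz and Pz form a single negative cluster of size S = 2, while Fp1 does not pass the threshold. The p-value is the proportion of permutations whose largest cluster mass is at least as large as the observed one.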


Fig. 2. Grand-average (N = 23) topographies showing mean amplitude differences between ERPs evoked by the critical nouns in implausible vs plausible clauses, in the specified windows, relative to the onset of nouns at 0 ms. Asterisk marks (*) indicate statistical effects at 300–500 ms (N400) or at 500–800 ms (late components).

Table 4. Results of omnibus ANOVA statistics.

Effect | 300–500 ms | 500–800 ms
Verb | F(4,88) = 0.766, p = 0.55 | F(4,88) = 0.631, p = 0.642
Plausibility | F(1,22) = 20.57, p < 0.001 | F(1,22) = 0.961, p = 0.338
Electrode | F(32,704) = 8.318, p < 0.001 | F(32,704) = 19.51, p < 0.001
Verb × Plausibility | F(4,88) = 1.534, p = 0.199 | F(4,88) = 1.412, p = 0.236
Verb × Electrode | F(128,2816) = 1.193, p = 0.0725 | F(128,2816) = 1.138, p = 0.142
Plausibility × Electrode | F(32,704) = 4.979, p < 0.001 | F(32,704) = 5.378, p < 0.001
Verb × Plausibility × Electrode | F(128,2816) = 1.178, p = 0.0885 | F(128,2816) = 1.152, p = 0.122


Table 5. Results of ANOVA statistics for each sentence type.

Effect | 300–500 ms | 500–800 ms

(a) ‘X knows that p’ (vite)
Plausibility | F(1,22) = 9.64, p = 0.005 | F(1,22) = 0.143, p = 0.709
Electrode | F(32,704) = 3.488, p < 0.001 | F(32,704) = 9.523, p < 0.001
Plausibility × Electrode | F(32,704) = 1.408, p = 0.069 | F(32,704) = 1.438, p = 0.057

(b) ‘X believes that p’ (tro)
Plausibility | F(1,22) = 7.531, p = 0.012 | F(1,22) = 2.902, p = 0.103
Electrode | F(32,704) = 6.656, p < 0.001 | F(32,704) = 6.179, p < 0.001
Plausibility × Electrode | F(32,704) = 3.543, p < 0.001 | F(32,704) = 0.887, p = 0.649

(c) ‘X dreams that p’ (drømme)
Plausibility | F(1,22) = 7.464, p = 0.012 | F(1,22) = 0.009, p = 0.925
Electrode | F(32,704) = 8.625, p < 0.001 | F(32,704) = 4.391, p < 0.001
Plausibility × Electrode | F(32,704) = 1.123, p = 0.295 | F(32,704) = 1.569, p = 0.025

(d) ‘X doubts that p’ (tvile)
Plausibility | F(1,22) = 20.17, p < 0.001 | F(1,22) = 1.692, p = 0.207
Electrode | F(32,704) = 5.88, p < 0.001 | F(32,704) = 11.32, p < 0.001
Plausibility × Electrode | F(32,704) = 2.074, p < 0.001 | F(32,704) = 2.493, p < 0.001

(e) ‘X imagines that p’ (innbille seg)
Plausibility | F(1,22) = 0.296, p = 0.592 | F(1,22) = 0.54, p = 0.47
Electrode | F(32,704) = 3.021, p < 0.001 | F(32,704) = 7.117, p < 0.001
Plausibility × Electrode | F(32,704) = 1.691, p = 0.011 | F(32,704) = 3.1, p < 0.001

4. Discussion

We set out to test whether the amplitude of the N400 component is sensitive to the plausibility of sentences (or matrix clauses; Hypothesis 1) or to that of complement clauses (Hypothesis 2), in the context of propositional attitudes. Our ERP results do not support Hypothesis 1, and are largely consistent with Hypothesis 2. In addition, while we did not formulate any specific hypotheses concerning modulations of the amplitude of post-N400 components in ERPs, we found indications of increased amplitudes of late components in response to implausible nouns under doubt and imagine. Therefore, only attitude verbs on the lower end, or right-hand side, of Declerck's factuality scale give rise to modulations of post-N400 components. Below, we discuss these two findings in light of our theoretical premises and of other results in the literature.

4.1. The N400: plausibility, predictability, and pragmatics

Our results do not directly contradict previously published research. However, earlier ERP experiments on logical form, event structure, and discourse (for a review, see Baggio, 2018) would seem to lend greater support to Hypothesis 1 than to Hypothesis 2 (Van Berkum et al., 1999; 2003; Nieuwland & Van Berkum, 2006; Hald et al., 2007; Otten & Van Berkum, 2007; Baggio et al., 2008; Urbach & Kutas, 2010; Urbach et al., 2015; Kulakova & Nieuwland, 2016). Our ERP experiment was not designed to address the long-standing issue of the functional interpretation of the N400, i.e., whether it reflects lexical semantic access (retrieval, activation) vs integration vs probabilistic prediction (for discussions, see Baggio & Hagoort, 2011; Kutas & Federmeier, 2011; Nieuwland et al., 2018; Baggio, 2018). These views seem largely orthogonal to the hypothesis tested here: lexical semantic access, prediction, preactivation, integration etc. may all be influenced by information available either within a clause or across clausal boundaries.
The finding that supraclausal information, provided by attitude verbs, does not modulate N400 amplitudes may then be relevant for all these different accounts of the N400. If the N400 reflects processing (e.g., access, prediction, integration etc.) of lexical meanings into the broadest (non-)linguistic context available, then it should be modulated by the intensional context in which a complement clause is embedded. The fact that in our study this was not the case, and the apparent conflict with earlier work that ensues, requires an explanation. One possibility is that sentence meaning composition simply did not occur in our experiment. Participants were asked to read each sentence carefully, and to be prepared to answer a few questions at the end of the EEG session. Apart from this, we had no additional safeguards against shallow processing. In this context, however, we do not find this argument persuasive. The inclusion of a task often presents a dilemma: the presence or absence of effects could easily be attributed to the presence or absence of an overt task, respectively, unless the same experiment is conducted with and without a task, within or between participants. Studies that adopted this approach found N400 effects independent of the presence or absence of a task (e.g., see Baggio, 2012), which only affected the latency of the N400 onset and related temporal properties. Additionally, in Nieuwland and Van Berkum (2006), for example, no overt task was imposed. The authors reported that the N400 component is sensitive to discourse relations, beyond the local phrasal or clausal context. The absence of a task per se would not explain why, in our study, the N400 was, instead, responsive to the clausal context only. The presence of clear N400 effects, largely as entailed by Hypothesis 2, and of later ERP effects is indirect evidence that processing occurred. 
A second possibility is that what drives N400 amplitudes in our study is not plausibility as such, but rather the relative predictability or the expectancy of critical nouns in context. However, this view would require a parallel set of hypotheses to 1 and 2,
since semantic predictability, just like plausibility, may be determined either at the clausal or at the sentential level. With verbs on the left-hand side of Declerck's factuality scale (i.e., under know and believe), those predictability values should coincide: blood is more predictable than vodka in the context of both clause and sentence. Under verbs on the right-hand side of the scale, instead, those two values should come apart. Our ERP findings seem compatible with the view that what drives predictability (if that is the driving factor) are semantic relations in the complement clause, and not in the broader sentence context, else a reversal of the N400 for doubt and imagine should be found. Our sentence completion test shows that participants tend to select the nouns that fit best the (semantic) requirements of sentences. Our data indicate that cloze probabilities of plausible nouns decrease as one moves rightward in Declerck's factuality scale. This licenses the expectation that on-line measures of semantic processing, whether based on predictability or plausibility, exhibit the same pattern. In this study, this was not the case for the N400 component. Therefore, one should frame four different hypotheses concerning the clausal vs supra-clausal (sentence- or discourse-level) N400 effects of predictability or plausibility, and one should test them with both off-line and on-line data. Our N400 data are consistent with clausal effects of either factor. Alternatively, one may view both plausibility (internal estimates of the likelihood of the truth of propositions) and predictability (internal estimates of the likelihood of specific words occurring in a given context) as determined by the strength of (semantic) relations between stored lexical representations. In general, and as a default, the strength of semantic relations between words appearing in close proximity in a sequence of stimuli (a clause, a sentence etc.) 
leads to a proportional reduction of the amplitude of the N400 component at the eliciting word (Baggio & Hagoort, 2011; Kutas & Federmeier, 2000, 2011). In examples (1)–(2) above, blood is more strongly related to mosquitos than is vodka, hence the smaller N400 elicited by blood compared to vodka. This also makes blood a more predictable continuation than vodka, and sentences with blood as a continuation more plausible than sentences continuing with vodka. The strength of semantic relations between lexical representations (i.e., blood or vodka relative to mosquitos) determines both system-internal probabilities of specific words occurring, given a sentence fragment (blood > vodka), and the system-internal probabilities of specific sentences or propositions being held true (‘Mosquitos live off blood’ > ‘vodka’). The longterm structure of semantic memory shapes, in general and as a default, the ‘subjective’ probabilities that the system assigns to the occurrence of specific words (predictability) and to the truth of sentences or clauses (plausibility). Multiple additional factors may modulate the N400 amplitude, possibly overriding some of the default effects of semantic relations. Consider for example the ERP study on quantifiers by Urbach and Kutas (2010). They compared sentences of the following types: (4) a. Most farmers grow crops. b. Most farmers grow worms. (5) a. Few farmers grow crops. b. Few farmers grow worms. Off-line ratings showed that (4a) is judged as more plausible than (4b), while (5b) is judged as more plausible than (5a). This crossover interaction pattern is similar to our sentence completion results, where participants completed a believe sentence more often with a noun that renders the complement clause plausible (e.g., blood), whereas they completed a sentence under doubt more often with a noun rendering the complement clause implausible (e.g., vodka). 
Their ERP results, however, did not show the corresponding interactions: the N400 effect in the worms/crops comparison was reduced under few compared to most, but it did not reverse. This is consistent with our own ERP data: with the reduction, without reversal, of the N400 under the counterfactual innbille (imagine). The key difference between our study and Urbach and Kutas (2010) is that (4)–(5) do not contain an embedded clause. So, in their study, the subject of the sentence (or matrix clause) is inside the scope of the quantifier, whereas in our study, only the complement clause is inside the scope of the modalizer. In both cases, the N400 is driven mainly by stored semantic relations between words in the given ‘local’ linguistic environment (farmers and crops, vs worms; mosquitos and blood, vs vodka) and not by the ‘global’ plausibility of sentences. Operators outside the ‘local’ context (e.g., quantifiers, attitude verbs) may have a moderating effect (e.g., few in Urbach & Kutas, and imagine in our study), but not a direction-changing one. This raises the question under what conditions exactly would the N400 reverse and reflect the plausibility and coherence of sentences or discourse, as opposed to more local semantic relations (and their effects on lexical-semantic predictability, or on the plausibility of propositions encoded in local syntactic environments, e.g., clauses). In operational terms, the question is under what conditions would the N400's amplitude pattern reflect off-line judgments, e.g., semantic selection preferences (our study) or semantic plausibility judgments (Urbach & Kutas, 2010). 
An inspection of the stimuli used in studies that succeeded in producing such reversal effects (e.g., Nieuwland & Van Berkum, 2006) may suggest that a single token of a logical or a semantic operator (e.g., a quantifier or an intensional verb) does not suffice: a rich discourse context is needed, in which the reversal effect is prepared by multiple occurrences of lexical items, phrasal constructions etc. This would make the N400 sensitive to the amount of relevant information (and to a lesser extent to its proximity to the eliciting word), and not to the presence or absence of lexical or syntactic triggers that, by virtue of their meaning, can reverse plausibility or expectancy patterns.

A closing comment on pragmatics. Some of our stimulus sentences, like sentences from other studies, using similar experimental designs, e.g., (4a) by Urbach and Kutas (2010), are apparent violations of (at least) Grice's Maxim of Quantity. The cooperative principle requires that contributions be informative, but to say that ‘X knows that p’, or that ‘X believes that p’, when p is plausible and commonly known, is to fail to provide new information to the hearer. The same could be said about ‘X doubts that p’, which is underinformative when p is implausible or, in fact, commonly known to be false. It is possible that more felicitous uses of attitude reports, where all sentences are equally informative, would lead to different patterns of ERP effects (e.g., see Fischler, Bloom, Childers, Roucos, & Perry, 1983 vs Nieuwland & Kuperberg, 2008 on negation). Hypothesis 1 could eventually be borne out in those conditions. However, pragmatic (in)felicity does not explain the current N400 patterns: pragmatically, there is nothing in common
between sentences with implausible clauses under different attitude verbs (Table 1), in that no conversational maxim or principle is violated by all of them; so it is not clear what sort of pragmatic account would predict just the observed effects. 4.2. Post-N400 effects: plausibility, presupposition, and discourse We did not formulate explicit predictions concerning modulations of late ERP components, and therefore we cannot provide a principled explanation of the observed effects. Of interest, other studies reported late ERP effects following N400 effects. In several cases, these effects are late positive ERPs, such as the P600 and the late positive complex (LPC). Here, we found negative ERP effects at anterior recording sites, consistent with the temporal profile and the scalp distribution of sustained anterior negativities (SAN). That said, those studies are relevant in the present context, too: the LPC and SAN are both prima facie compatible with source dipoles with anterior negative and posterior positive maxima, i.e., the topographical distributions of the SAN and LPC, respectively. In an ERP study, Federmeier, Wlotko, De Ochoa-Dewald, and Kutas (2007) presented participants with strongly constraining sentences, as in (6), and weakly constraining sentences (7) with either expected endings (a) or unexpected endings (b): (6) a. The children went outside to play. b. The children went outside to look. (7) a. Joy was too frightened to move. b. Joy was too frightened to look. They found an N400 effect modulated by expectancy (unexpected > expected), but not by constraint, and a later frontal positive effect, larger for unexpected words in strongly constraining sentences (6b) relative to unexpected endings in weakly constraining sentences (7b), as well as relative to expected endings in strongly constraining contexts (6a). 
This is evidence that context effects on word processing unfold over multiple processing stages, which may not all be manifested by the N400 component and effect. The interaction of expectancy and constraint suggests that the late positivity would not reflect processing of unexpected words as such, or the effects of sentence constraint, but rather the need to override, suppress, or recover from a definite prediction of a word or a concept that was not observed. Van Petten and Luka (2012) discuss reports of modulations of the N400, LPCs, or both. They propose that whereas the N400 reflects the benefits of confirmed predictions, some of the late positive effects may indicate the costs of disconfirmed predictions. Our SAN findings do not fit this broad characterization, nor the specific proposal by Federmeier et al. (2007). The SAN in our study may not reflect the costs of recovering from an incorrect prediction. If prediction is clause-based, then we should see a larger SAN after every implausible noun (vodka), but we did not. If prediction is sentence-based, then we should see a larger SAN after implausible nouns (vodka) under know and believe, and after plausible nouns (blood) under doubt and imagine, but again we did not. Furthermore, the SAN may not reflect the conflict in plausibility values for the complement clause as required by the sentence vs as exhibited by the embedded clause: one should again observe an SAN for implausible nouns under know and believe, and for plausible nouns under doubt and imagine. Finally, presupposition failure offers an interesting perspective on post-N400 effects in some studies (e.g., Masia, Canal, Ricci, Vallauri, & Bambini, 2017; Domaneschi, Canal, Masia, Vallauri, & Bambini, 2018; Shetreet, Alexander, Romoli, Chierchia, & Kuperberg, 2019), but does not provide much leverage in this case. Shetreet et al. (2019) compared factive verbs (e.g., know, realize) to non-factive intensional verbs (e.g., believe, think). 
They found a larger late posterior positive ERP for words that were incompatible with the presupposition conveyed by the complements of the factive verbs, but not with non-factive verbs. In our study, the presupposition triggered by know (that p is the case) fails when p is implausible. If the SAN is an index of violated or failed presuppositions, it should be triggered only in the incongruent condition with know. This was not the case. The fact that a SAN was observed for implausible nouns (vodka) under doubt and imagine raises the possibility that it manifests the costs of integrating implausible meanings into a sentence model, precisely when an implausible proposition is required by relevant sentential constraints. When one is not required, in attitude reports under know, believe, and dream, the information from the implausible clause is not integrated into the sentence model, and only a larger N400 component is observed. This conjecture should not be taken as an explanation of our data, given the lack of prior hypotheses or predictions for post-N400 effects. However, it is consistent with ERP studies reporting SAN effects in response to stimuli requiring additional computations at the level of the sentence or discourse model (Baggio, 2018; Baggio et al., 2008; Bambini, Canal, Resta, & Grimaldi, 2019; Paczynski, Jackendoff, & Kuperberg, 2014; Wittenberg, Paczynski, Wiese, Jackendoff, & Kuperberg, 2014). 5. Conclusion In this study, we investigated processing of Norwegian intensional sentences with plausible or implausible complement clauses. We tested the hypotheses that the amplitude of the N400 is modulated either by (1) the plausibility of the attitude report as a whole (the matrix clause) vs (2) the plausibility of the object of the propositional attitude (the complement clause). We found N400 effects tracking the plausibility of complement clauses, under all intensional verbs except innbille seg (imagine), where the N400 effect was abolished. 
We also observed modulations of anterior negative waves for implausible clauses with doubt and imagine, i.e., when the plausibility of the sentence requires an implausible complement clause. Our data suggest that the N400 component reflects the processing of semantic relations in ‘local’ linguistic environments (phrases or clauses, or non-syntactically specified local
environments). These results raise the question under what conditions the N400 becomes sensitive to ‘global’ meaning, such as semantic relations, plausibility, and coherence, at the level of the sentence or discourse. Further research on N400 modulations in embedded vs matrix syntactic structures, as attempted here, and in ‘local’ vs ‘global’ semantic environments, is needed to address this question.

Author contributions

GB conceived the study. LC, AG, and GB designed the study. LC and AG constructed the stimuli and collected the data. GB, LC, and AG analyzed the data. LC, AG, MV, and GB interpreted the results and wrote the paper.

Acknowledgements

The present research was funded by the Research Council of Norway under grant 251219 to GB.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.jneuroling.2019.100877.

References

Asher, N. (1986). Belief in discourse representation theory. Journal of Philosophical Logic, 15(2), 127–189.
Asher, N. (1987). A typology for attitude verbs and their anaphoric properties. Linguistics and Philosophy, 10(2), 125–197.
Baggio, G. (2012). Selective alignment of brain responses by task demands during semantic processing. Neuropsychologia, 50(5), 655–665.
Baggio, G. (2018). Meaning in the Brain. Cambridge, MA: MIT Press.
Baggio, G., & Hagoort, P. (2011). The balance between memory and unification in semantics: A dynamic account of the N400. Language & Cognitive Processes, 26(9), 1338–1367.
Baggio, G., Van Lambalgen, M., & Hagoort, P. (2008). Computing and recomputing discourse models: An ERP study. Journal of Memory and Language, 59(1), 36–53.
Bambini, V., Canal, P., Resta, D., & Grimaldi, M. (2019). Time course and neurophysiological underpinnings of metaphor in literary context. Discourse Processes, 56(1), 77–97.
Barwise, J., & Perry, J. (1981). Situations and attitudes. The Journal of Philosophy, 78(11), 668–691.
Carnap, R. (1947). Meaning and necessity: A study in semantics and modal logic (2nd ed.). Chicago: The University of Chicago Press [1958].
Declerck, R. (2011). The definition of modality. In A. Patard, & F. Brisard (Eds.). Cognitive approaches to tense, aspect and epistemic modality (pp. 21–44). Amsterdam: John Benjamins.
Delogu, F., Vespignani, F., & Sanford, A. J. (2010). Effects of intensionality on sentence and discourse processing: Evidence from eye-movements. Journal of Memory and Language, 62(4), 352–379.
DeLong, K. A., Quante, L., & Kutas, M. (2014). Predictability, plausibility, and two late ERP positivities during written sentence comprehension. Neuropsychologia, 61, 150–162.
den Dikken, M., Larson, R., & Ludlow, P. (2016). Intensional transitive verbs and abstract clausal complementation. In A. Grzankowski, & M. Montague (Eds.). Nonpropositional intentionality. Oxford: Oxford University Press.
Domaneschi, F., Canal, P., Masia, V., Vallauri, E. L., & Bambini, V. (2018). N400 and P600 modulation in presupposition accommodation: The effect of different trigger types. Journal of Neurolinguistics, 45, 13–35.
Federmeier, K. D., Wlotko, E., De Ochoa-Dewald, E., & Kutas, M. (2007). Multiple effects of sentential constraint on word processing. Brain Research, 1146, 75–84.
Fischler, I., Bloom, P. A., Childers, D. G., Roucos, S. E., & Perry, N. W., Jr. (1983). Brain potentials related to stages of sentence verification. Psychophysiology, 20(4), 400–409.
Fodor, J. A. (1978). Propositional attitudes. The Monist, 501–523.
Gauker, C. (2003). Words without meaning. Cambridge, MA: MIT Press.
Gibson, E., & Pearlmutter, N. J. (1998). Constraints on sentence comprehension. Trends in Cognitive Sciences, 2(7), 262–268.
Guevara, E. (2010, June). NoWaC: A large web-based corpus for Norwegian. Proceedings of the NAACL HLT 2010 sixth Web as corpus workshop (pp. 1–7). Association for Computational Linguistics.
Hackl, M., Koster-Moeller, J., & Gottstein, A. (2009). Processing opacity. Proceedings of Sinn und Bedeutung, 13, 171.
Hald, L. A., Steenbeek-Planting, E. G., & Hagoort, P. (2007). The interaction of discourse context and world knowledge in online sentence comprehension. Evidence from the N400. Brain Research, 1146, 210–218.
Harman, G. (1972). Logical form. Foundations of Language, 9, 38–65.
Heim, I., & Kratzer, A. (1998). Semantics in generative grammar. Oxford: Blackwell.
Hintikka, J. (1969). Semantics for propositional attitudes. Models for modalities (pp. 87–111). Dordrecht: Springer.
Jackendoff, R. (1983). Semantics and cognition. Cambridge, MA: MIT Press.
Jackendoff, R. (1985). Believing and intending: Two sides of the same coin. Linguistic Inquiry, 16(3), 445–460.
Jurafsky, D. (2003). Probabilistic modeling in psycholinguistics: Linguistic comprehension and production. Probabilistic Linguistics, 21.
Kulakova, E., & Nieuwland, M. S. (2016). Understanding counterfactuality: A review of experimental evidence for the dual meaning of counterfactuals. Language and Linguistics Compass, 10(2), 49–65.
Kuperberg, G., & Wlotko, E. (2018). A Tale of Two Positivities (and the N400): Distinct neural signatures are evoked by confirmed and violated predictions at different levels of representation. bioRxiv, 404780.
Kutas, M., & Federmeier, K. D. (2000). Electrophysiology reveals semantic memory use in language comprehension. Trends in Cognitive Sciences, 4(12), 463–470.
Kutas, M., & Federmeier, K. D. (2011). Thirty years and counting: Finding meaning in the N400 component of the event-related brain potential (ERP). Annual Review of Psychology, 62, 621–647.
Kutas, M., & Hillyard, S. A. (1980). Reading senseless sentences: Brain potentials reflect semantic incongruity. Science, 207(4427), 203–205.
Lewis, D. (1986). On the plurality of worlds. Oxford: Basil Blackwell.
Maris, E. (2004). Randomization tests for ERP topographies and whole spatiotemporal data matrices. Psychophysiology, 41, 142–151.
Maris, E., & Oostenveld, R. (2007). Nonparametric statistical testing of EEG and MEG data. Journal of Neuroscience Methods, 164, 177–190.
Masia, V., Canal, P., Ricci, I., Vallauri, E. L., & Bambini, V. (2017). Presupposition of new information as a pragmatic garden path: Evidence from event-related brain potentials. Journal of Neurolinguistics, 42, 31–48.
Nieuwland, M. S., Barr, D. J., ... Von Grebmer Zu Wolfsthurn, S. (2019). Dissociable effects of prediction and integration during language comprehension: Evidence from a large-scale study using brain potentials. Philosophical Transactions of the Royal Society B forthcoming.
Nieuwland, M. S., & Kuperberg, G. R. (2008). When the truth is not too hard to handle: An event-related potential study on the pragmatics of negation. Psychological Science, 19(12), 1213–1218.
Nieuwland, M. S., Politzer-Ahles, S., Heyselaar, E., Segaert, K., Darley, E., Kazanina, N., et al. (2018). Large-scale replication study reveals a limit on probabilistic prediction in language comprehension. ELife, 7, e33468.
Nieuwland, M. S., & Van Berkum, J. J. (2006). When peanuts fall in love: N400 evidence for the power of discourse. Journal of Cognitive Neuroscience, 18(7), 1098–1111.
Oostenveld, R., Fries, P., Maris, E., & Schoffelen, J. M. (2011). FieldTrip: Open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data. Computational Intelligence and Neuroscience, 2011, 1.
Otten, M., & Van Berkum, J. J. (2007). What makes a discourse constraining? Comparing the effects of discourse message and scenario fit on the discourse-dependent N400 effect. Brain Research, 1153, 166–177.
Paczynski, M., Jackendoff, R., & Kuperberg, G. (2014). When events change their nature: The neurocognitive mechanisms underlying aspectual coercion. Journal of Cognitive Neuroscience, 26(9), 1905–1917.
Padó, U., Crocker, M. W., & Keller, F. (2009). A probabilistic model of semantic plausibility in sentence processing. Cognitive Science, 33(5), 794–838.
Partee, B. (1979). Semantics—mathematics or psychology? Semantics from different points of view (pp. 1–14). Berlin, Heidelberg: Springer.
Portner, P. (1997). The semantics of mood, complementation, and conversational force. Natural Language Semantics, 5(2), 167–212.
Pylkkänen, L., Llinás, R., & McElree, B. (2004). Distinct effects of semantic plausibility and semantic composition in MEG. Biomag 2004: Proceedings of the 14th international conference on biomagnetism. Boston, USA: Biomag.
Shetreet, E., Alexander, E. J., Romoli, J., Chierchia, G., & Kuperberg, G. (2019). What we know about knowing: Presuppositions generated by factive verbs influence downstream neural processing. Cognition, 184, 96–106.
Stalnaker, R. (1984). Inquiry. Cambridge: MIT Press.
Stowe, L. A., Kaan, E., Sabourin, L., & Taylor, R. C. (2018). The sentence wrap-up dogma. Cognition, 176, 232–247.
Urbach, T. P., DeLong, K. A., & Kutas, M. (2015). Quantifiers are incrementally interpreted in context, more than less. Journal of Memory and Language, 83, 79–96.
Urbach, T. P., & Kutas, M. (2010). Quantifiers more or less quantify on-line: ERP evidence for partial incremental interpretation. Journal of Memory and Language, 63(2), 158–179.
Van Berkum, J. J. V., Hagoort, P., & Brown, C. M. (1999). Semantic integration in sentences and discourse: Evidence from the N400. Journal of Cognitive Neuroscience, 11(6), 657–671.
Van Berkum, J. J., Zwitserlood, P., Hagoort, P., & Brown, C. M. (2003). When and how do listeners relate a sentence to the wider discourse? Evidence from the N400 effect. Cognitive Brain Research, 17(3), 701–718.
Van Petten, C., & Luka, B. J. (2012). Prediction during language comprehension: Benefits, costs, and ERP components. International Journal of Psychophysiology, 83, 176–190.
White, A. S., Hacquard, V., & Lidz, J. (2018). Semantic information and the syntax of propositional attitude verbs. Cognitive Science, 42(2), 416–456.
Wittenberg, E., Paczynski, M., Wiese, H., Jackendoff, R., & Kuperberg, G. (2014). The difference between “giving a rose” and “giving a kiss”: Sustained neural activity to the light verb construction. Journal of Memory and Language, 73, 31–42.