Scene incongruity and attention




Consciousness and Cognition 48 (2017) 87–103

Contents lists available at ScienceDirect

Consciousness and Cognition journal homepage: www.elsevier.com/locate/concog

Arien Mack ⇑, Jason Clarke, Muge Erol, John Bert
The New School for Social Research, 80 Fifth Avenue, 7th Floor, New York, NY 10011, United States

Article info

Article history:
Received 18 July 2016
Accepted 21 October 2016

Keywords: Attention; Scene incongruity; Inattention; Change detection; Iconic memory

Abstract

Does scene incongruity (a mismatch between scene gist and a semantically incongruent object) capture attention and lead to conscious perception? We explored this question using 4 different procedures: Inattention (Experiment 1), Scene description (Experiment 2), Change detection (Experiment 3), and Iconic Memory (Experiment 4). We found no differences between scene incongruity and scene congruity in Experiments 1, 2, and 4, although in Experiment 3 change detection was faster for scenes containing an incongruent object. We offer an explanation for why the change detection results differ from the results of the other three experiments. In all four experiments, participants invariably failed to report the incongruity and routinely mis-described it by normalizing the incongruent object. None of the results supports the claim that semantic incongruity within a scene invariably captures attention, and together they provide strong evidence of the dominant role of scene gist in determining what is perceived.

© 2016 Elsevier Inc. All rights reserved.

1. Introduction It has long been recognized that incongruity within a scene has a unique role in perception (Bruner & Postman, 1949). In this early and classic paper, the authors examine how the violation of perceptual expectancies affects perception. ‘‘The principal concern of this paper”, they write, ‘‘is with the perceptual events which occur when perceptual expectancies fail of confirmation – the problem of incongruity. Incongruity represents a crucial problem for a theory of perception because, by its very nature, its perception represents a violation of expectation. An unexpected concatenation of events, a conspicuous mismatching, an unlikely pairing of cause and effect – all of these have in common a violation of normal expectancy.” (p. 208). Since then congruity and incongruity effects on perception have been studied by a host of other researchers (e.g. Biederman, 1972; Biederman, Mezzanotte, & Rabinowitz, 1982; Davenport & Potter, 2004; Greene, Botros, Beck, & Fei-Fei, 2015; Henderson & Hollingworth, 1998; Hollingworth & Henderson, 2000; LaPointe, Lupianez, & Milliken, 2013; Palmer, 1975). Some general findings of this research are that objects consistent with their contexts (i.e. with scene gist) are identified more correctly and faster than objects that are inconsistent (Biederman et al., 1982; Davenport & Potter, 2004; Palmer, 1975; Rémy, Vayssière, Pins, Boucart, & Fabre-Thorpe, 2014). In contrast, changes to inconsistent objects are detected faster in change detection tasks (Hollingworth & Henderson, 2000), although LaPointe et al. (2013) found this latter to be true only when change detection requires localization of the change. When change detection requires identification of the change, the outcome reverses and identification of consistent objects occurs faster (LaPointe et al., 2013). 
⇑ Corresponding author. E-mail address: [email protected] (A. Mack).
http://dx.doi.org/10.1016/j.concog.2016.10.010 1053-8100/© 2016 Elsevier Inc. All rights reserved.

Interest in the question of incongruity has re-emerged more recently in the service of trying to understand the degree to which information integration occurs outside of consciousness. On one view, information integration requires conscious processing (e.g. Baars, 2002; Treisman, 2003), so that if incongruity were picked up outside of consciousness it would argue against this view and would be strong evidence of some kind of unconscious input integration. But whether or not incongruity is picked up without awareness and defies unconscious integration because it is inconsistent with expectancies (and, therefore, leads to the capture of attention and conscious processing), there is now considerable evidence that some information integration does occur outside of awareness (see, for example, Dehaene (2014) on ensemble processing outside of awareness). The bulk of the recent research on the question of whether a semantically incongruent scene (e.g. a snowman on a beach in contrast to a sand castle) is picked up outside of awareness and captures attention has been done by Mudrik and colleagues, who have given the question a positive answer (Mudrik, Deouell, & Lamy, 2011; Mudrik, Faivre, & Koch, 2014; Mudrik & Koch, 2013), while a paper by Moors, Boelens, van Overwalle, and Wagemans (2016) suggests a negative one.
Mudrik and colleagues have looked at incongruity and perceptual processing in several ways. In one set of experiments (Mudrik & Koch, 2013), they explored subliminal priming by incongruent scenes and showed that when a scene depicting an incongruent action, e.g. a woman baking a chessboard rather than a pan of cookies, is presented subliminally, it slows subsequent responses to a liminal scene depicting either an incongruent or congruent action. This is taken by the authors as evidence that incongruity is picked up outside of awareness, and, because it is a violation of prior associations or expectations, requires additional processing, which is why it slows subsequent responding. “Arguably, when integration involves previously learned associations, acquired during past conscious experiences, it can be unconsciously performed...
Yet when the scene involves objects that were not previously integrated during conscious perception, (i.e. incongruent scenes), integration fails. This failure may lead to the allocation of additional attentional resources and may thereby hinder subsequent performance.” (Mudrik & Koch, 2013, p. 9). On this view incongruity mandates attentional engagement and conscious processing.
In another set of papers Mudrik and colleagues (Mudrik, Deouell et al., 2011) looked at the influence of scenes depicting an incongruent action on binocular rivalry using Continuous Flash Suppression (CFS). They report that incongruent scenes escaped perceptual suppression faster than comparable congruent scenes, which they think may be because the incongruity, unlike congruity, cannot be resolved without consciousness. In still another paper also entailing binocular rivalry, Mudrik and colleagues report that when one eye views a congruent image and the other an incongruent version of the same image, e.g., a man drinking from a glass versus a man ‘drinking’ from a hairbrush, the incongruent image persists in consciousness longer, that is, it dominates the congruent one. This is again taken as evidence that the incongruity requires more attention and thus takes longer to be resolved (Mudrik, Deouell et al., 2011). In another experiment Mudrik and colleagues (Mudrik, Lamy, & Deouell, 2010) found event related potential (ERP) differences in the processing of similar scenes depicting either a congruent or incongruent action such that contextual congruity affects scene processing earlier.
The finding that scenes with an incongruity break through the suppression associated with binocular rivalry and dominate for longer than comparable congruent scenes is not, however, uncontested. In fact, the exact opposite has been reported.
Like Mudrik and colleagues, Pinto, van Gaal, de Lange, Lamme, and Seth (2015) used CFS but found in contrast, that images consistent with expectation (the expectations were created at the outset of the experiment by the experimenters) entered consciousness faster than neutral or unexpected images. An even more recently reported set of experiments also provided results (Moors et al., 2016) that are at odds with those reported by Mudrik and colleagues. Their studies, which were designed as an attempt to replicate the Mudrik et al. findings, were closely modeled on the Mudrik experiments, and in fact used scenes from the Mudrik archive and the same CFS procedure. These researchers failed to find any evidence of a congruity effect. They found no evidence that scenes containing objects that were incongruent with scene gist broke through suppression any faster than the comparable congruent scenes. The first of their 3 experiments quite faithfully repeated Mudrik, Deouell et al.’s (2011) procedures but included an additional condition in which the scenes were inverted. They reasoned that if there is a congruity effect, it should not be apparent in the scene inversion condition. Unlike Mudrik et al., they found that congruent scenes actually broke suppression faster than incongruent ones, although this difference did not reach significance. Also, and not unexpectedly, they found a scene inversion effect, such that upside-down scenes took longer to break through suppression than right side up ones. The remaining 2 experiments confirmed their main finding of no congruity effect. These two sets of experiments by two different groups of researchers raise serious questions about whether incongruity between an object and the gist of a scene or minimally between two objects in a scene is extracted outside of awareness and leads to the capture of attention and consequent conscious processing. This is the question addressed by the 4 experiments reported in this paper. 
Our interest in the question stems from our long term interest in the role of attention and inattention in perception (Mack & Rock, 1998) and so is not primarily concerned with the question of whether semantic information integration occurs outside awareness. Rather the question at issue is whether semantic incongruity within a scene is likely to capture attention and consequently be consciously perceived. This is explicitly argued by Mudrik and colleagues (Mudrik et al., 2014) where they write that, “The difficulty of identifying the object or integrating it with the scene then affects participants’ performance: it raises the attentional saliency of the scene causing it to emerge into awareness sooner...” (p. 7).
The experiments described in this paper used different phenomena to address the question of whether scene incongruity captures attention. In one set of experiments using an inattentional blindness (IB) procedure, we looked at whether scene incongruity, that is, a scene containing an object that is inconsistent with the gist of the scene (e.g., a woman putting a chessboard in an oven), is more likely to be perceived under conditions of inattention (and therefore reduce or defeat inattentional blindness), compared to a scene with a gist consistent object (e.g., a woman putting a tray of cookies in an oven). If incongruity is detected outside of awareness and captures attention, IB should be less for scenes containing an incongruent object. Scene gist was operationalized as a written or verbal description that came close to capturing the main sense of the scene, e.g. “A woman cooking” or “A picture of a man playing a guitar”. Because the scenes all depicted an action, gist was captured by describing the action.
In another set of experiments, we looked at whether object incongruity within a scene reduces change blindness (CB) such that when an incongruent object appears and disappears within a scene (for example, a snowman in a beach scene), the change is more likely to be picked up faster than when a comparable congruent object appears and disappears within the same scene (for example, a sandcastle in a beach scene). If incongruity captures attention, it should. As mentioned above, this question, unlike the question of incongruity and inattentional blindness, has been the subject of some prior research suggesting that when the change has only to be localized, changes to scene inconsistent objects are detected faster, but when the change must be identified, changes to consistent objects are detected more quickly (LaPointe et al., 2013). Our results will be discussed in light of these earlier findings.
Finally, we looked at whether a scene containing an incongruent object in an array of 3 other scenes with no incongruent objects interferes with the gist perception of one of the normal scenes when they are presented briefly (500 ms). This should be the case if the incongruency captures attention and thus reduces attention to the normal scenes, which is necessary to transfer the needed information into working memory (e.g. Sperling, 1960).
In these experiments we used a procedure that shares many of the features of the procedure used to investigate iconic memory, in particular we used the post-cue, a variant of the partial report procedure first described by Sperling (1960; see also Clarke & Mack, 2014). In all but the change detection experiment we used congruent and incongruent scenes from the set of Mudrik scenes, which she graciously provided. We created our own scenes for the change detection experiment because in the Mudrik scenes, all of which depict a single action, the targets (incongruent as well as congruent) are at the center of the scene and central to scene gist, and others have shown that changes to objects of central interest are detected more quickly than to objects of marginal interest (Rensink, O’Regan, & Clark, 1997). This being the case change detection would have been trivially easy. In the scenes we created, the target objects were never centrally located and our scenes did not depict a single action, like baking or throwing a basketball, but rather were of inside and outside environments, such as a beach, a forest, a dentist office or a dining room in which a single object was either consistent or inconsistent with the scene, e.g., a sandcastle on a beach or a sandcastle in a forest. Examples of stimuli used in the following four experiments can be found at https://mackperceptionlab.wordpress.com/

2. Experiment 1: Scene Incongruity and Inattention
2.1. Experiment 1A
2.1.1. Methods and materials
2.1.1.1. Participants. 20 participants were recruited from The New School and Craigslist.org (10 per condition). All participants signed consent forms, had normal or corrected-to-normal vision and were compensated for their participation.
2.1.1.2. Equipment. The experiment was run on a 1.83 GHz Intel Core Duo Mac Mini computer and was presented on a 27 in. ASUS VG278HE monitor with a refresh rate of 60 Hz and a screen resolution of 1920 × 1080. The experiment was programmed using SuperLab 5. Participants were seated in a darkened room positioned approximately 76 cm from the computer screen with their heads stabilized with a chin rest.
2.1.1.3. Stimuli
2.1.1.3.1. Images and patterns. Three pairs of color photographs of scenes, each depicting a person performing an action with an object, were used for the critical trials (taken from a database used by Mudrik & Koch, 2013; Mudrik, Deouell et al., 2011; Mudrik et al., 2014). Each pair consisted of a congruent scene and an incongruent scene that differed only in a single object. Objects in the congruent scenes were correctly related to the action being performed, while objects in the incongruent scenes were incorrectly related to the action. The three pairs of scenes were of: (a) a girl licking an ice cream cone or a girl licking a light-bulb; (b) a woman putting a cookie tray in an oven or a woman putting a chess board in an oven; and (c) a man playing a guitar or a man ‘playing’ a household broom. In all scenes, the person performing the action was at the center of the scene. Images were randomly assigned, without replacement, to serve as the scenes on the critical trials in the inattention, divided attention, and full attention conditions. Half of the participants were shown incongruent scenes on all critical trials, while the remaining participants were shown the congruent versions.
Six mosaic patterns were present on the non-critical trials consisting of pixelated versions of other images from the Mudrik scene archive (these patterns appeared as a mosaic of randomly located solid-colored squares and were not recognizable as scenes). All scenes and mosaic patterns were centered at fixation and subtended 3.8 × 4.7 deg of visual angle.
2.1.1.3.2. Crosses. A random assortment of crosses was used on every trial. The arms of each cross ranged in length from 3.0 to 4.1 deg. Differences in the length of the arms ranged from 0.6 to 1.2 deg. The center of each cross was 5.9 deg from fixation and was presented in one of four corners of a nominal square centered around fixation.
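The stimulus sizes above are given in degrees of visual angle at the reported 76 cm viewing distance. The following is a minimal sketch of how such angles map onto physical and pixel sizes; it is not the authors' procedure. The function names are hypothetical, and the pixel conversion assumes square pixels and that the nominal 27 in. diagonal equals the visible display area.

```python
import math

VIEWING_DISTANCE_CM = 76.0  # viewing distance reported in the Equipment section


def angle_to_size_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Physical extent (cm) that subtends angle_deg at distance_cm."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def size_to_angle_deg(size_cm, distance_cm=VIEWING_DISTANCE_CM):
    """Visual angle (deg) subtended by an extent of size_cm at distance_cm."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


def pixels_per_cm(diagonal_in, res_x, res_y):
    """Pixel density, assuming square pixels and a fully visible diagonal."""
    diagonal_px = math.hypot(res_x, res_y)
    return diagonal_px / (diagonal_in * 2.54)


if __name__ == "__main__":
    density = pixels_per_cm(27.0, 1920, 1080)  # monitor reported in the text
    for label, angle in [("scene width", 3.8), ("scene height", 4.7),
                         ("cross eccentricity", 5.9), ("mask side", 15.0)]:
        cm = angle_to_size_cm(angle)
        print(f"{label}: {angle} deg = {cm:.2f} cm = {cm * density:.0f} px")
```

The exact tangent formula is used rather than the small-angle approximation so that the same functions remain accurate for the larger 15 deg masks.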


2.1.1.3.3. Masks. Masks were high resolution pattern masks consisting of a mosaic of colored squares subtending 15 × 15 deg of visual angle.
2.1.1.4. Procedure. The design had 2 factors: a between-participant factor (congruency: congruent or incongruent scenes) and a within-participant factor with 3 levels (attention: inattention, divided, and full attention). The attention conditions were always administered in the same order: inattention, divided and full attention. There was one critical trial per attention condition, which was always the last trial in that condition. On all critical trials, the cross appeared with a scene centered at fixation and was followed immediately by a pattern mask. On all non-critical trials, the cross appeared with the mosaic centered at fixation and also was always followed by a pattern mask.
2.1.1.4.1. Inattention condition. Participants were instructed to simply report the longer arm of the cross. There were 4 trials, the first 3 of which were non-critical. On the critical trial, immediately following the participants’ report of the longer cross arm, participants were asked whether they had seen anything different on the screen on this last trial, and if so, what. These responses provided evidence of whether the participant had seen the scene and picked up its gist when attention was absorbed by the demanding cross task. If so, another question was whether they were aware of the incongruity.
2.1.1.4.2. Divided attention condition. Participants were instructed to report the longer arm of the cross and anything else they saw on the screen. There were four trials, the last of which was the critical trial. If participants were aware of the scene on the critical trial, they reported what they had seen following the critical trial.
2.1.1.4.3. Full attention condition. Participants were instructed to ignore the crosses and only report whatever else they saw on the screen. There were three trials, the last of which was the critical trial and contained a scene.
Participants’ reports of the scene on this trial were evidence of what they saw with full attention. At the very end of the experiment, participants were asked if they had seen anything ‘‘weird” or ‘‘odd” in the images presented on the critical trials. The experimental trials were preceded by 10 practice trials, none of which contained scene stimuli. Each noncritical trial consisted of a fixation point (1500 ms), followed by an array consisting of a cross (at one of the corners of the nominal square) (100 ms) and the mosaic pattern centered at fixation. This was followed by the mask (500 ms), which was followed by a blank screen. On the inattention and divided attention critical trials, the participant’s task was to report which arm of the cross was longer by pressing ‘V’ for ‘vertical’ and ‘H’ for ‘horizontal’ during the blank screen. On the critical trial, the mosaic pattern was replaced by a scene. On this trial, after participants had reported the longer arm of the cross, they were queried about whether they had seen anything new. In the full attention trials, participants only reported what they saw other than crosses on every trial. We predicted that if scene incongruity leads to the capture of attention, then there would be significantly reduced inattentional blindness when scenes containing some context and object violation are the target scenes compared to scenes when there is no such violation. 2.1.2. Results 2.1.2.1. Inattention condition 2.1.2.1.1. Congruent scenes (n = 10). Six participants did not report seeing anything other than the crosses (IB = 60% on the critical trial). Four out of 10 participants reported seeing a scene and described the gist of the scene. 2.1.2.1.2. Incongruent scenes (n = 10). Six participants were IB to the scenes and reported seeing nothing different on the critical trial (IB = 60%). 
Of the 4 participants who reported seeing something different, all 4 described the gist of the scene with varying degrees of accuracy, but none reported any incongruity. Two out of the 4 participants who reported seeing something different on the critical trial normalized the scene. For example, one participant shown the scene with a girl licking the light-bulb described it as ‘‘a girl eating an ice cream cone”. The other participant who was presented with the picture of a man ‘‘playing” a broom described it as ‘‘a person playing a guitar”. The other two participants correctly identified the scene as containing a person but did not mention the action nor the incongruity. For example, one person reported ‘‘an individual person was presented in the display”. 2.1.2.2. Divided attention condition 2.1.2.2.1. Congruent scenes. 8 out of 10 participants were aware of a scene and described the gist of the scene. The remaining 2 participants were blind to the scene. 2.1.2.2.2. Incongruent scenes. 7 out of 10 participants were aware of the scene and described its gist. Again none of the participants who reported seeing the scene mentioned the incongruity, and as in the inattention condition, tended to normalize the scene, e.g. they described the man ‘‘playing” the broom as a man playing a guitar. Five out of the 7 participants who were aware of the scene normalized it. The remaining two participants reported ‘‘a picture of an young man” and ‘‘an old female person” for the man ‘‘playing” the broom scene and the old lady ‘‘baking” the chess set scene, respectively. 2.1.2.3. Full attention condition 2.1.2.3.1. Congruent scenes. 9 of the 10 participants were aware of the scene and described its gist. One participant was unaware of the scene on the critical trial.

[Fig. 1: bar chart. Vertical axis: percent participants reporting scene gist (0 to 100). Horizontal axis: attention condition (Inattention, Divided Attention, Full Attention). Bars: Scene Congruent, Scene Incongruent.]

Fig. 1. Experiment 1A: Percentage of participants who reported scene gist on the critical trial in each attention condition. The inverse of the number is the percentage of participants who did not report seeing the scene, which in the inattention condition is the percentage of inattentional blindness (IB). None of the participants reported the scene incongruity or being aware of any scene incongruity.

2.1.2.3.2. Incongruent scenes. 9 out of 10 participants were aware of the scene and described its gist but none reported the incongruity (see Fig. 1). Five out of the nine participants normalized the scene (e.g. they reported seeing a picture of a man playing a guitar or a girl with an ice-cream cone). The other four reported seeing “a woman”, “a picture of a young man”, “a man working in a lab”, and “a person”.
The main question of interest was whether scenes containing an incongruent action and object captured attention leading to significantly less inattentional blindness than scenes that contained a congruent action and object. A Mann-Whitney U test revealed that the frequency of reports of scene gist in the congruent scene condition did not differ significantly from that in the incongruent scene condition in the inattention condition, U = 40.00, z = 0.872, p = 0.383, in the divided attention condition, U = 45.00, z = 0.503, p = 0.615, or in the full attention condition, U = 50.00, z = 0.000, p = 1.000. In both cases (the congruent scene and incongruent scene conditions), the level of inattentional blindness (IB) is 60%.
In order to explore the role of attention in reports of scene gist, we ran two separate Cochran’s Q non-parametric analyses of variance for the congruent scene condition and for the incongruent scene condition. A Cochran’s Q test for the congruent scene condition (inattention, divided attention, and full attention) found a marginally significant difference among the three attention conditions, χ²(2) = 6.00, p = 0.05. Subsequent pairwise comparisons indicated a significant difference between the frequencies of reports of gist in the inattention compared to the full attention conditions, χ²(1) = 5.00, p = 0.025, and between the inattention and divided attention conditions, χ²(1) = 4.00, p = 0.046, but not between the divided and full attention conditions, χ²(1) = 1.00, p = 0.317.
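For readers unfamiliar with the two nonparametric tests used in this section, the following is a minimal sketch of how each statistic can be computed for binary "reported gist / did not report gist" data. It is an illustration under stated assumptions, not the authors' analysis script. The Mann-Whitney example uses the divided attention counts given in the text (8 of 10 congruent, 7 of 10 incongruent). The per-participant pattern passed to Cochran's Q is hypothetical: the paper reports only the totals for the congruent group (4, 8 and 9 of 10), and the rows below are one arrangement consistent with those totals. Function names are also hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Return (U, z, p) using a tie-corrected normal approximation (two-sided p)."""
    x, y = list(x), list(y)
    n1, n2 = len(x), len(y)
    # U for sample x: each (x_i, y_j) pair scores 1 if x_i > y_j, 0.5 if tied.
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u = min(u_x, n1 * n2 - u_x)  # report the smaller of the two U values
    pooled = x + y
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over each group of tied values.
    tie_term = sum(t ** 3 - t for t in (pooled.count(v) for v in set(pooled)))
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # every observation identical, so no evidence either way
        return u, 0.0, 1.0
    z = (n1 * n2 / 2.0 - u) / math.sqrt(variance)
    p = math.erfc(z / math.sqrt(2.0))  # two-sided normal tail probability
    return u, z, p


def cochrans_q(rows):
    """Cochran's Q for a list of per-participant tuples of 0/1 outcomes.

    Undefined (division by zero) if every row is all zeros or all ones.
    """
    k = len(rows[0])
    col_totals = [sum(r[j] for r in rows) for j in range(k)]
    row_totals = [sum(r) for r in rows]
    n = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n * n)
    denominator = k * n - sum(t * t for t in row_totals)
    return numerator / denominator


def chi2_sf_df2(q):
    """Upper-tail chi-square probability; this closed form holds for 2 df only."""
    return math.exp(-q / 2.0)


# Divided attention condition: counts taken from the text (1 = reported gist).
divided_congruent = [1] * 8 + [0] * 2
divided_incongruent = [1] * 7 + [0] * 3

# HYPOTHETICAL per-participant rows (inattention, divided, full) whose column
# totals match the reported congruent-group counts of 4, 8 and 9 out of 10.
hypothetical_congruent = ([(1, 1, 1)] * 3 + [(0, 1, 1)] * 4
                          + [(1, 0, 1), (0, 1, 0), (0, 0, 1)])

if __name__ == "__main__":
    u, z, p = mann_whitney_u(divided_congruent, divided_incongruent)
    print(f"Mann-Whitney: U = {u:.2f}, z = {z:.3f}, p = {p:.3f}")
    q = cochrans_q(hypothetical_congruent)
    print(f"Cochran: Q(2) = {q:.2f}, p = {chi2_sf_df2(q):.3f}")
```

Because Cochran's Q depends on each participant's pattern across the three conditions and not only on the condition totals, a different arrangement with the same totals can give a different Q; the value printed here should be read as illustrative only.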
A Cochran’s Q test for the incongruent scene condition (inattention, divided, and full attention) revealed a significant difference among the three attention conditions, χ²(2) = 7.60, p = 0.02. There was a significant difference in frequency of gist reports between the inattention and full attention conditions, χ²(1) = 5.00, p = 0.025, but not between the inattention and divided attention conditions, χ²(1) = 3.00, p = 0.083, nor between the divided and full attention conditions, χ²(1) = 2.00, p = 0.157.
2.1.3. Discussion
The significant difference in the frequency of gist reports in the inattention condition and the full attention condition corroborates our earlier finding that attention is necessary for conscious perception of the gist of scenes (Mack & Clarke, 2012). Without attention, participants are functionally blind to the presence of a scene and its content. More importantly, not only did we fail to find a difference in the frequency of IB to incongruent and congruent scenes (60% IB in both conditions), we also found that none of the participants ever picked up on the incongruent action and object relationship in the scene (e.g. a lady ‘baking’ a chess set) in any of the three experimental conditions, not even in the full attention condition in which the only task was to report what was on the screen in addition to the cross.
2.2. Experiment 1B
In an attempt to understand why participants do not report scene incongruity even with full attention and with 100 ms scene presentation, we reasoned that the size of the scene might be a contributing factor. Perhaps participants can pick up the action with scenes of this size and visual angle, but not the object being used to perform the action. With this in mind, we increased the size of the scenes by 50%. Scenes in Experiment 1A subtended 3.8 × 4.7 deg of visual angle. In Experiment 1B, scenes subtended 5.7 × 7.05 deg. Everything else remained the same. Only the incongruent scene condition was run since participants had no trouble reporting the scene gist of congruent scenes, at least with full attention, and we were primarily concerned with whether, with larger scenes, participants would be able to pick up the incongruity between the action and the object used for the action.
2.2.1. Methods and materials
2.2.1.1. Participants. 10 participants were recruited from The New School and Craigslist.org. All participants reported normal or corrected-to-normal vision.
2.2.1.2. Procedure. The procedure was the same as in Experiment 1A.
2.2.2. Results
2.2.2.1. Inattention condition. Four participants were unaware that a scene had been presented on the critical trial (IB = 40%). Six out of 10 participants (60%) were aware of the scene and described it with the correct gist. None of the participants reported being aware of the incongruity. Five of the six participants who were aware of the scene normalized the scene, while the remaining participant described the girl licking a light-bulb scene as “a girl”.
2.2.2.2. Divided attention condition. Nine of the 10 participants (90%) were aware that a scene had been presented and described the correct gist. One participant was unaware of the scene. Again, none of the participants mentioned the incongruity. Eight of the 9 participants who were aware of the scene normalized the scene (e.g. a guy playing a guitar). The other participant reported seeing a scene but did not give any further information.
2.2.2.3. Full attention condition. Eight out of 10 participants (80%) were aware of the scene and described the gist. 2 participants were unaware of the scene. None of the participants mentioned the incongruity (see Fig. 2). Six participants who were aware of the scene normalized it (e.g. a lady with an instrument in her hand or a woman making cookies, for the man “playing” the broom and the woman putting the chess board in the oven scenes). The other 2 participants reported seeing “a woman bending toward a table” and “woman reaching for something” when presented with the woman putting the chessboard in the oven scene.
A Cochran’s Q analysis revealed no significant difference in frequency of reports of gist among all three attention conditions (inattention, divided, and full attention), χ²(2) = 4.66, p = 0.09. And, once again, none of the participants reported scene incongruity.
2.2.3. Discussion
In Experiment 1B, increasing the size of the scenes by a half led to less inattentional blindness (40% compared to 60% in Experiment 1A). However, even when the size of the scenes was increased by a half, none of the participants reported the incongruent object/context relationship for any of the scenes in any of the three attention conditions, even in the full attention condition, which suggests that the failure to find a difference between the incongruent and the congruent conditions in the first experiment was not due to the size of the scenes.
2.3. Experiment 1C

[Fig. 2: bar chart. Vertical axis: percent participants reporting gist (0 to 100). Horizontal axis: attention condition.]

Fig. 2. Experiment 1B: Size of the incongruent condition scenes increased by half. Percentage of participants who reported scene gist on the critical trial in each attention condition. None reported scene incongruity or being aware of any scene incongruity.

Another factor that might have played a role in participants’ failure to report scene incongruity is presentation time. Perhaps 100 ms is not long enough for participants to pick up on the incongruity in the scene. Indeed, Greene et al. (2015) reported that participants need approximately 150 ms to reliably identify a scene as being “improbable”. With that in mind, we increased the presentation time of the scenes to 200 ms. As in Experiment 1B, only incongruent scenes served as the critical stimuli. Scenes subtended the same visual angle as they did in Experiment 1A (3.8 × 4.7 deg).

[Fig. 3: bar chart. Vertical axis: percent participants reporting gist (0 to 100). Horizontal axis: attention condition.]

Fig. 3. Experiment 1C. Scene presentation time of the incongruent condition scenes increased to 200 ms (from 100 ms). Percentage of participants who reported scene gist on the critical trial in each attention condition. None reported scene incongruity or being aware of any scene incongruity.

2.3.1. Methods and materials
2.3.1.1. Participants. 10 new participants were recruited from The New School and Craigslist.org. All gave informed consent and reported normal or corrected-to-normal vision.
2.3.1.2. Procedure. The procedure was identical to Experiment 1B.
2.3.2. Results
2.3.2.1. Inattention condition. Five out of the 10 participants were unaware that a scene had been presented on the critical trial (IB = 50%). The remaining 5 participants were aware of the scene and described it with the correct gist. Again none of the participants who were aware of the scene reported that they had been aware of the scene incongruity. As in the previous experiments, participants tended to normalize the scenes (e.g. they reported the scene of the lady baking the chess set as a lady baking cookies). Four out of the five participants who reported seeing a scene normalized it (e.g. they reported “a girl’s face eating ice-cream” for the girl licking a light-bulb scene). The other participant reported seeing “a girl”, and gave no further information.
2.3.2.2. Divided attention condition. Seven out of 10 participants were aware of the scene and described the gist correctly but none of them reported the incongruity. 3 out of 10 participants were unaware of the presence of a scene. Six out of the seven participants who were aware of the scene normalized it. The other participant reported seeing “a picture of a man” for the man “playing” the broom scene.
2.3.2.3. Full attention condition. Nine out of 10 participants were aware of the scene and described the gist correctly. 1 out of 10 participants did not report seeing a scene.
None of the participants reported the incongruity (see Fig. 3). Six of the nine participants who reported seeing a scene normalized it. The remaining three participants reported seeing ‘‘a man”, ‘‘a lady”, and ‘‘an old woman and child”. A Cochran’s Q analysis revealed that there was a significant difference in the frequency of gist reports across all three attention conditions (inattention, divided, and full attention), χ²(2) = 6.0, p = 0.05. None of the participants reported scene incongruity. 2.3.3. Discussion Even with a 200 ms scene presentation time, participants failed to report the improbable aspects of the scene in any of the attention conditions. Again, this is particularly striking in the full attention condition, as their only task was to report anything that appeared along with the cross. This is inconsistent with the results of Mudrik, Deouell et al. (2011) and Mudrik et al. (2014) who found that scenes depicting an incongruent action emerged into consciousness and dominated the comparable congruent scene when the two are dichoptically presented producing binocular rivalry. 3. Experiment 2: Scene incongruity and scene descriptions and classifications Experiment 1 failed to find any evidence that scenes with incongruent action and object relationships lead to less inattentional blindness than comparable scenes. Furthermore, and especially surprising, no participant ever reported seeing the


incongruity in any of the three attention conditions (inattention, divided, or full attention) even when the scenes containing incongruity were increased in size by a half or presented for longer (200 ms compared to 100 ms). In an attempt to explore this further, we ran a second experiment which required full attention to be deployed to the scenes. We asked participants to provide a brief description of each briefly presented scene, and specifically to describe the action being performed. As the incongruity in the scene lay in the relationship between the action and the object used to perform the action, we reasoned that focusing participants’ attention on the action would afford perception of the object being used to perform the action and thus of the incongruity when it was present. 3.1. Methods and materials 3.1.1. Participants 20 new participants were recruited from The New School and Craigslist.org. All participants had normal or corrected-to-normal vision and were compensated or given research credit for their participation. 3.1.2. Equipment The experiment was run on a 1.83 GHz Intel Core Duo Mac Mini computer and was presented on a 27 in. ASUS VG278HE monitor with a refresh rate of 60 Hz and a screen resolution of 1920 × 1080. The experiment was programmed using Superlab 5. Participants were seated in a darkened room positioned approximately 76 cm from the computer screen with heads stabilized with a chin rest. 3.1.3. Stimuli We used the same scenes as in Experiment 1 along with 7 other pairs of congruent and incongruent scenes taken from Mudrik’s scene collection. Scenes subtended the same visual angle as in Experiment 1A (3.8 × 4.7 deg) and were followed by the same pattern masks used in Experiment 1. 3.1.4. Procedure The experiment had two parts. In the first part, ‘scene descriptions’, participants were told that their only task was to describe the action being performed and anything else they saw. 
After a 1000 ms fixation cross, the scene appeared at the center of the screen for either 100 ms for one group of participants (n = 10) or 200 ms for the other group (n = 10). Scenes were followed by the same masks that were used in Experiment 1. Following the mask, a textbox appeared on the screen, into which participants were instructed to write a brief description of what they had seen and, especially, to report the action being performed. After pressing a key labeled ‘‘Next”, they continued to the next trial. There were 12 trials in this part of the experiment: 6 containing incongruent scenes and 6 containing congruent scenes. In the second part of the experiment, we showed the same scenes again (100 ms and masked to one group and 200 ms and masked to the other) and the same group of participants was asked to simply report whether the scene was ‘‘weird” (by pressing the ‘w’ key) or ‘‘not weird” (by pressing the ‘n’ key). In the instructions, participants were given the following examples of a ‘‘weird” scene: ‘A woman sitting at the bottom of the ocean watching television’ and ‘a man flying in the sky on a sofa’. Twelve scenes consisting of 6 scene pairs (an incongruent or ‘‘weird” version of the scene and a congruent or ‘‘not weird” version of the scene) were shown to each participant. No participant ever saw more than one version of the scene (the ‘weird’ or the ‘not weird’ version). However, because there were two parts to this experiment (‘scene description’ and ‘weird or not weird’) each participant saw each scene twice. All scenes were counterbalanced, so that all scenes were shown an equal number of times. 3.2. Results We calculated percentage correct for each participant for the action being performed and then computed a group mean. As participants were explicitly instructed to report the action depicted in the scenes, a response was considered correct only if the action depicted was accurately described. 
An incorrect response, for example, would be describing the scene of a woman ‘‘playing” the violin with a hammer as ‘‘A lady with black hair” or the scene of a woman ‘‘bowling” with a lettuce as ‘‘A woman in background”. We also calculated mean percentage correct for report of the object used to perform the action. 3.2.1. Scene descriptions 3.2.1.1. 100 ms scene presentation. With 100 ms scene exposure, participants described the action performed in the incongruent version of the scene correctly on 3.7 out of 6 weird scene presentations (62%). Of those who also mentioned the object used to perform the action, which occurred 47% of the time, i.e. for 2.8 of 6 scenes, they were far more likely to describe the object as one that was congruent with the gist, and did so on 94% of these trials, than as the incongruent object that had actually been present, which was correctly identified on only 0.22 of the 6 scenes on average (3%). In other words, when they did mention an object, they rarely reported the incongruent object that had been present but rather tended to normalize the object so that it fit with the gist of the scene (e.g. the man playing the ‘‘broom” was a man playing a ‘‘guitar”). On the other 3% of trials, they simply incorrectly identified the object, e.g. they described the scene of a woman ‘‘looking through” a wood log (as opposed to its congruent counterpart in which she is looking through a telescope), as ‘‘a woman looking through the window”. Here the object is incorrect and not consistent with the gist of the scene, namely a woman looking through a telescope.


In contrast, participants described the action performed in the congruent version of the scene correctly on an average of 4.8 out of 6 scenes (80%), mentioned an object in 3.4 out of 6 scene descriptions on average, and described the object in the scene correctly on an average of 3 scenes. 3.2.1.2. 200 ms scene presentation. Even with a 200 ms scene exposure and with participants’ attention directed to the object of the action in the scenes, they mostly continued not to report the incongruity in the scenes. While they generally correctly described the action performed in the incongruent version of the scene (an average of 5.2 out of 6 weird scene presentations, or 87%, compared to 62% in the 100 ms scene presentation condition), and mentioned an object on an average of 4 out of 6 scenes, they again rarely reported the incongruent object and did so only on 0.011 out of 6 scenes on average (1%). Again, as with the 100 ms condition, they tended to normalize the object (96% of trials where they mentioned an object) so that it fit with the gist of the scene. Participants described the action performed in the congruent version of the scene correctly on an average of 5.3 out of 6 congruent scene presentations (88%) and described the object in the scene correctly in 4 out of 6 (see Figs. 4 and 5). Importantly, across all participants, incongruent scenes were shown a total of 120 times (20 participants × 6 weird scene presentations) but scene incongruity was only described 3 times out of the 120 scene presentations. Two of these descriptions were of the scene of the girl licking a light-bulb, and the other was of a scene of a man playing hockey with a shovel. We used two separate mixed-design ANOVAs to analyze these data. The first tested whether participants were better at correctly reporting the action being performed in the scene when the scene was congruent compared to when it was incongruent. 
There was a main effect of congruity, F(1, 18) = 7.57, p = 0.013, partial eta squared = 0.296, showing that participants

[Figure 4: bar chart. y-axis: Mean correct. x-axis (Scene Descriptions): Action Correct (100 ms), Object Correct (100 ms), Action Correct (200 ms), Object Correct (200 ms). Series: Congruent Scenes, Incongruent Scenes.]

Fig. 4. Experiment 2: Scene Descriptions. Action Correct: Mean number of trials the action was described correctly and object was described correctly for congruent scenes and incongruent scenes with 100 ms scene presentation and 200 ms scene presentation. Subjects were only asked to describe the action and anything else they saw. Some of them spontaneously described the object (which was incongruent with the gist in the incongruent scene condition) used to perform the action. Object Correct: The percent object correct is for participants who mentioned an object in their description of the action. Did those participants get the object correct? Scene incongruity was reported on only 3 out of 120 trials containing such scenes.

[Figure 5: bar chart. y-axis: Mean correct responses. x-axis: Weird and Not weird, for the 100 ms presentation and the 200 ms presentation.]

Fig. 5. Experiment 2: Mean correct responses to classify a scene as ‘‘Weird” or ‘‘Not weird”. Twelve scenes were shown to each participant, half of which (6 scenes) contained scene incongruity, e.g. a hockey player playing hockey with a shovel and half (6 scenes) with no scene incongruity, e.g. the hockey player playing hockey with a hockey stick. Scenes were presented for 100 ms to one group and 200 ms to the other group. Each participant saw only one version of each scene. Error bars show ±1 SD.


were better at correctly reporting the action when the scenes did not contain an incongruent action and object relationship than when they did. We discuss this below. There was a main effect of scene presentation time, F(1, 18) = 7.35, p = 0.014, partial eta squared = 0.290, showing that, overall, participants correctly reported the action better with a 200 ms scene exposure than with a 100 ms scene exposure. However, there was no significant interaction, F(1, 18) = 3.61, p = 0.074, partial eta squared = 0.167, indicating that participants were not more likely to report the incongruity with a 200 ms scene exposure compared to a 100 ms scene exposure. A second mixed design ANOVA tested whether participants were better at reporting the object in the congruent scene compared to the incongruent scene. There was a main effect of congruency condition, F(1, 18) = 121.37, p < 0.001, partial eta squared = 0.871, showing that participants were significantly better at correctly reporting the object used to perform the action in the congruent scene than in the incongruent scene. There was no main effect of scene presentation time, F(1, 18) = 2.02, p = 0.172, partial eta squared = 0.101, revealing that adding another 100 ms to the scene presentation time did not afford more correct identifications of the object. Finally, there was no significant interaction, F(1, 18) = 2.3, p = 0.143, partial eta squared = 0.115, indicating that there was no difference in identification of the object in the incongruent scene between the 100 ms scene exposure condition and the 200 ms scene exposure condition. 3.2.2. Weird or not weird 3.2.2.1. 100 ms scene presentation (masked). Participants correctly classified the scenes as ‘weird’ on an average of 35% of trials (SD = 21.2), that is they correctly identified scene incongruity in 2.1 out of 6 scenes. In contrast, they were able to correctly classify the scenes as normal on average of 95% of trials (SD = 11.47) (in 5.7 out of 6 scenes). 3.2.2.2. 
200 ms scene presentation (masked). Participants correctly reported the scene as weird on an average of 2.5 out of 6 scenes (42% correct; SD = 27.5), while they correctly reported the scene as being normal on an average of 5.5 out of 6 scenes (92% correct; SD = 12.35) (see Fig. 5). A mixed-design 2 × 2 (2: scene congruity × 2: scene presentation time) analysis of variance (ANOVA) was conducted on the data. There was a significant effect of congruity, F(1, 18) = 81.755, p < 0.001, partial eta squared = 0.82, revealing that participants were able to identify congruent scenes as ‘‘not weird” better than they were able to identify incongruent scenes as ‘‘weird”. There was no significant effect of scene presentation time, F(1, 18) = 0.300, p = 0.59, partial eta squared = 0.016, showing that, overall, participants did not perform better in the 200 ms scene presentation condition than in the 100 ms scene presentation condition. Finally, there was no significant interaction between the two independent variables (scene congruity and scene presentation time) on report of congruity, F(1, 18) = 0.635, p = 0.436, partial eta squared = 0.034, indicating that participants were no better at reporting incongruity when the scenes were presented for 200 ms than when they were only presented for 100 ms. With both exposure times participants were not significantly above or below chance level in reporting weirdness (100 ms condition, t(9) = 2.23, p = 0.05; 200 ms condition, t(9) = 0.81, p = 0.44; chance would be 50% correct), which is consistent with the full attention condition in Experiment 1, testifying to the difficulty participants have in picking up incongruity even when they are looking for it. Contrast this with participants’ above-chance performance in reporting ‘‘not weird” (100 ms condition, t(9) = 12.37, p < 0.0001; 200 ms condition, t(9) = 10.85, p < 0.0001). 3.3. 
Discussion The results from the second experiment again clearly show that participants are not able to pick up the incongruent action-object relationship in the Mudrik scenes even when they are looking for it. In the first part of the experiment, ‘scene descriptions’, participants are not able to describe the incongruity in the scenes when they are shown for either 100 or 200 ms. The fact that scene incongruity was reported only 3 times out of 120 trials containing an improbable scene supports the idea that perception of the scenes is affected by expectations of what should be present in the scene (e.g. an ice-cream being licked rather than a light-bulb being licked). This validates Bruner and Postman’s (1949) notion of a ‘‘perceptual denial” of violation of expectation as well as Greene et al.’s (2015) conclusion that ‘‘we see what we expect” to see. However, the fact that participants were significantly better at reporting the action in the congruent scene condition compared to the incongruent scene condition does suggest that they were likely to have been picking up the incongruent scene and object relationship at some unconscious level, as the only difference between the two conditions that could account for the significant difference here is that in one condition, the action and object are congruent and in the other, they are not. We discuss this further in the General Discussion in light of the results of all experiments reported in this paper. In the second part of the experiment, ‘weird or not weird’, again we find no evidence that scene incongruity is picked up, which corroborates the findings in Experiment 1. Even when participants are only required to report whether a scene is improbable or not, and even when they are told beforehand that some scenes will be ‘‘weird, odd, or unusual” and given examples of such ‘weird’ scenes, they are not able to reliably detect the incongruity in the scene. 
This is so regardless of whether the scene is shown for 100 ms or 200 ms. Finally, our failure to find reports of incongruity is especially surprising given that the scenes in Experiment 2 were shown twice to each participant: once in the ‘scene description’ part of the experiment and again in the ‘weird or not weird’ part of the experiment. It is important to note that the results of Experiment 2 are consistent with the findings of Greene et al. (2015) referred to earlier. They too found that participants wrote poorer descriptions of improbable scenes and had difficulty classifying them as unusual, even when the scenes were visible for 506 ms.
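The partial eta squared values reported with the ANOVAs above can be cross-checked from each F ratio and its degrees of freedom, using the standard identity partial eta squared = (F × df_effect) / (F × df_effect + df_error). The following is a minimal sketch of that check. The function name and the dictionary labels are illustrative assumptions and are not part of the original analysis, which may have been run in other software.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F ratio and its degrees of freedom.

    Uses the identity eta_p^2 = SS_effect / (SS_effect + SS_error),
    rewritten in terms of F = (SS_effect / df_effect) / (SS_error / df_error).
    """
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# F ratios taken from the text; every test has (1, 18) degrees of freedom.
# The labels are hypothetical names chosen for this illustration.
reported_f = {
    "action report, congruity": 7.57,
    "action report, presentation time": 7.35,
    "object report, congruity": 121.37,
    "weird/not weird, congruity": 81.755,
}

for label, f_value in reported_f.items():
    effect_size = partial_eta_squared(f_value, df_effect=1, df_error=18)
    print(f"{label}: F(1, 18) = {f_value}, partial eta squared = {effect_size:.3f}")
```

Because the effect degrees of freedom equal 1 here, the identity reduces to F / (F + 18), which makes it easy to spot a reported effect size that does not match its F ratio.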


In the next experiment we looked at change detection when the object that changes is congruent or incongruent with the gist of the scene. As pointed out earlier, we found it necessary to create our own scenes rather than use the Mudrik scenes because in her scenes the incongruent as well as the congruent object was invariably at the center of the scene, making change trivially easy to detect. In all our scenes the target object, whether consistent or inconsistent, was never centrally located, nor were our scenes of a single person performing an action with either a congruent or incongruent object. Rather our scenes were of various indoor and outdoor environments. Participants had to report the change as quickly as possible and then locate it and identify it, thus enabling us to compare our results with the study of change detection referred to earlier which found differences between the report of changes to inconsistent and consistent objects when the change had to be identified compared with when it merely had to be detected (LaPointe et al., 2013). 4. Experiment 3: Scene incongruity and change detection 4.1. Methods and materials 4.1.1. Participants 20 New School students participated in the study for research credit. All participants were over the age of 18, with normal or corrected-to-normal vision. All participants provided signed consent. 4.1.2. Equipment Participants were seated in a dimly lit room, approximately 76 cm away from the computer screen and their heads were stabilized with a chin rest. The experiment was programmed using SuperLab5 and run on a 1.83 GHz Intel Core Duo Mac Mini computer. Stimuli were presented on a 27 in. ASUS VG278HE monitor (resolution 1920 × 1080 at 60 Hz). 4.1.3. Stimuli Images subtended 23 (width) by 15 (height) degrees of visual angle. 20 images depicting natural scenes (e.g. bathroom, florist, hair salon, etc.) 
were selected from Google images database and modified using Adobe Photoshop.1 Modification of the scenes consisted of addition of an object that was either congruent or incongruent with the gist of the scene. First, pairs of objects were selected based on visual similarity (e.g. a basketball and a pumpkin). Then, these objects were paired with two scenes so that an object congruent in one scene (e.g. a basketball in a basketball game scene) was incongruent in another paired scene (e.g. a basketball in a pumpkin field), while a second object that is incongruent with the first scene (e.g. a pumpkin in a basketball game scene) would be congruent with the second one (e.g. a pumpkin in a pumpkin field) (see Fig. 6). This method of pairing allowed us to have two versions of each scene (one with a congruent object that would appear and disappear and one with an incongruent object that did so). Thus each target object appeared in both a congruent and incongruent setting. In each scene, the added congruent and incongruent objects matched each other in size and always appeared in the exact same location (see Fig. 6). An independent group of participants (n = 27) rated how much each object belonged to the given scene using a 5-point Likert scale (1: does not belong at all, 5: completely belongs). For congruent object/scene pairings, the average rating was 4.7 (ranging between 3.8 and 5), while for incongruent pairings the average rating was 1.6 (ranging between 1.1 and 2.9). 4.1.4. Procedure Participants were told that their task was to detect changes in briefly presented scenes. They were instructed to freely view the images and press the space bar as soon as they detected the change between two very similar versions of the same scene. They were told that there would be a change on every trial and that the change they were looking for was either an addition or deletion of an object. 
Two practice trials (taken from the stimuli set of the original Rensink et al., 1997 study) preceded the experimental block. All trials began with a 1500 ms fixation mark. Each participant viewed 20 images in random order. Images were presented using the original Rensink et al. (1997) flicker paradigm. The original images (A) and modified versions (A′) alternated with blank gray screens. Each image was presented for 240 ms and the blank screens were presented for 80 ms. The flicker sequence was A, A, A′, A′ and the cycle continued for 60 s or until the participant pressed a key to report the change. Reaction times were recorded. Upon pressing the space bar, a box appeared on the screen and participants used the keyboard to describe the change they saw and also indicate the location of the change. Of the 20 scenes participants viewed, 10 contained a congruent change and 10 contained an incongruent change. Participants saw only one version of the scene (either with a congruent or incongruent change) and the versions were counterbalanced across two groups of participants to ensure that each scene and object were viewed only once per participant. At the end of the experiment, participants were queried about whether they had noticed anything about the nature of the changes. All 20 participants reported being aware that some of the objects that were changing didn’t fit with the scene.
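The timing of the flicker sequence described above can be laid out as a simple event schedule. The sketch below is illustrative only: the experiment itself was run in SuperLab, and the function name, the tuple layout, and the assumption that each image is immediately followed by one blank screen are ours rather than details taken from the study.

```python
from itertools import cycle

IMAGE_MS = 240          # duration of each scene image
BLANK_MS = 80           # duration of each blank gray screen
SEQUENCE = ("A", "A", "A'", "A'")  # original, original, modified, modified
TIMEOUT_MS = 60_000     # a trial ends after 60 s without a response


def flicker_schedule(timeout_ms: int = TIMEOUT_MS):
    """Yield (onset_ms, duration_ms, stimulus) events until the timeout.

    Assumption: every image is followed by exactly one blank screen.
    """
    onset = 0
    for image in cycle(SEQUENCE):
        for stimulus, duration in ((image, IMAGE_MS), ("blank", BLANK_MS)):
            if onset >= timeout_ms:
                return
            yield onset, duration, stimulus
            onset += duration


events = list(flicker_schedule())

# One full A, A, A', A' cycle, including the blanks, in milliseconds.
cycle_ms = len(SEQUENCE) * (IMAGE_MS + BLANK_MS)

for onset, duration, stimulus in events[:8]:
    print(f"{onset:>5} ms  {stimulus:<5} ({duration} ms)")
print(f"one full cycle lasts {cycle_ms} ms")
```

Under these assumptions one full cycle lasts 1280 ms, so each version of the scene is shown twice within roughly the first 1.3 s of a trial.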

1 The authors would like to thank Mary Adams for her help in creating the stimuli for Experiment 3 and Cooper Naess for help with running participants.


Fig. 6. Examples of scenes used in Experiment 3. Scene (a) basketball game with congruent object (basketball) circled; (b) basketball game with incongruent object (pumpkin) circled; (c) pumpkin field with congruent object (pumpkin) circled; and (d) pumpkin field with incongruent object (basketball) circled. All images are smaller than they appeared on the computer screen. (See https://mackperceptionlab.wordpress.com for full sized images).

4.2. Results Four trials, 2 involving congruent and 2 with incongruent scenes, were excluded from the analyses because the change that was reported was not one that occurred. Participants otherwise did well in detecting change. Across 20 participants (400 trials total), only 17 changes were not detected within 60 s (11 congruent, 6 incongruent). In order to be able to include these cases in our analyses, a standard reaction time of 60 s was used to replace missing responses, i.e. cases in which 60 s elapsed without a response being made. Mean reaction times in both conditions were then computed for each participant. A dependent samples t-test was conducted to compare reaction times to detecting changes in congruent as compared to incongruent scenes. As others had already found (Hollingworth & Henderson, 2000), incongruent changes (M = 11.7 s, SD = 4.2 s) were detected significantly faster than congruent changes (M = 13.9 s, SD = 4.1 s), t(19) = 2.448, p = 0.024. Thus like Hollingworth and Henderson (2000) we too found that changes that are semantically inconsistent with a scene were detected faster compared to the comparable consistent ones (see Fig. 7). This was not the case for the identification of the changes. Participants did very well on congruent trials. Of the 187 out of 200 changes that were detected, all but one were located and identified correctly. On the incongruent trials, however, although more changes were detected (192 out of 200 changes), participants made more errors in identification of the changing objects. 161 changes (84%) were localized and identified correctly. 8 (4%) changes were localized correctly but described incompletely (e.g. ‘‘something in the top corner of the table”) and 23 changes (12%) were described in a way that indicated that scene gist was dictating the identity of the object that changed, what we referred to earlier as normalizing. 
In these cases, for example, a basketball that was appearing and disappearing in a pumpkin field was described as another pumpkin or, in another case, a sandcastle appearing and disappearing in a snowy yard was described as a yellow bush. It makes sense to look at these trials separately as they seem to be yet another example of what Bruner and Postman (1949) referred to as a dominance response. These are instances in which the gist of the scene overrides the identity of the incongruent changing object leading the participant to report that a scene consistent object was undergoing change when in fact it was a scene inconsistent object that was appearing and disappearing. Although it is not possible to do a participant by participant analysis for these cases (as not all participants made these mistakes), the average reaction time for all fully correct identifications in the incongruent condition, that is when participants reported the change of the incongruent object, was about 1.5 s faster than reports of incorrectly identified (‘normalized’) reports of change to the incongruent object.


Fig. 7. Reaction time to detect change in the congruent and incongruent scenes. Error bars show ±1 SEM.

When the change is ‘normalized’ there seems to be some hesitancy in responding, which might have been occasioned by some sense of incongruity. 4.3. Discussion Participants’ reaction times for change detection were faster in the incongruent condition than in the congruent condition, which, assuming the necessity of attention for change detection, shows attention is playing some role. However, our results (and the results of the other experiments presented here) do not support a theory in which attention is captured in a brief visual presentation by a scene incongruent object, but rather they suggest that in the incongruent scene, once attention has alighted upon the incongruent object, it is held for longer than it would be by its counterpart object in the other version of the scene. The scene changes while attention is on the incongruent object and the participant detects the change. In this experiment, the fact that participants are not detecting the change until, on average, 11 s after the initial scene presentation onset shows that they are not detecting it on the first presentation or even in the first second when the scene with the incongruent object would have been presented twice (see Footnote 1). Instead, we propose that it is not that attention is initially captured by incongruity, but that once attention lands on an incongruous object, it stays there longer, and the change is detected because of this. This is partly in keeping with the results of Mudrik, Deouell et al. (2011), who found that under conditions of binocular rivalry between incongruent and congruent versions of scenes, incongruent scenes do not emerge into consciousness faster but do remain there longer than congruent ones. In the next experiment, we asked whether iconic memory, the earliest stage of visual information processing, is affected by the presence of scene incongruity. 
We hypothesized that if attention is attracted to scene incongruity (perhaps as a result of a conflict between scene context and object processing in the visual system, which can only be resolved by conscious investigation of the scene), then the presence of a scene containing some semantic violation (e.g. a man ‘shaving’ with a fork or a woman ‘playing’ a violin with a hammer) should attract attention, thus leaving fewer attentional and cognitive resources to process the other scenes in a four-scene display. This would be evidenced by fewer reports of the gist of the other scenes on trials when incongruent scenes are part of the 4-scene array compared to trials containing the congruent version of the scene (e.g. a man shaving with a razor or a woman playing a violin with a bow). In other words, if scene incongruity captures attention, then the presence of a scene with some incongruity should interfere with report of the other scenes in the display. 5. Experiment 4: Scene incongruity and iconic memory 5.1. Methods and materials 5.1.1. Participants Fourteen participants from The New School community were tested (age range 22–27). All participants reported normal or corrected-to-normal vision. All participants were given course credit for participating.


5.1.2. Equipment The experiment was programmed and run on SuperLab 5.0. Stimuli were presented on a 1.83 GHz Intel Core Duo Mac Mini and a DELL M782 monitor set at 1152 × 864 resolution, with a refresh rate of 75 Hz. 5.1.3. Stimuli Stimuli were 160 photographs of real-world or natural scenes from many categories, e.g. outdoor scenes, weddings, rooms, animals, cityscapes, theaters, and museums. The scenes were found using Google Images. To ensure the scenes had agreed upon gist, a group of 25 naïve participants were shown each scene and asked to give a one-word descriptor, e.g. airport. A scene was only chosen to be a target scene (40 scenes) if at least 70% of the participants described it with the same one-word descriptor. In addition, 20 scenes with congruent objects and 20 counterpart scenes with incongruent objects were chosen from the same database as the scenes used in Experiment 1 and 2 here, and in previous studies (e.g. Mudrik, Deouell et al., 2011). Each of these scenes depicted a person performing some action involving an object, e.g. a basketball player throwing a basketball. In the incongruent version of the scene, an object that was not related to the scene replaced the original object (e.g. a watermelon replaced a basketball in a basketball game). Digital manipulation was equated by replacing the object in the congruent version of the scene with another exemplar of the same object (Mudrik, Deouell et al., 2011). Furthermore, brightness and contrast were equated using Adobe Photoshop, and saliency maps were compared using the Itti, Koch, and Niebur (1998) algorithm. No differences were found between the congruent and the incongruent versions (Mudrik, Deouell et al., 2011). All scenes subtended a visual angle of 6 × 5 deg at a viewing distance of 56 cm. A single display consisted of 4 scenes centered around a small fixation cross at the center of the screen. 
The center of each scene was 5.3 deg from fixation, while the corner nearest to fixation was 1.4 deg away. Scenes were presented on a white background. 5.1.4. Procedure Following 15 practice trials (in which no scene incongruent stimuli were shown), participants were presented with 40 randomized trials. On each trial, following the fixation cross (1500 ms), an array of four scenes was presented for 500 ms. Scenes could come from any of the scene categories. Following the 4-scene display, a cue (a red line subtending 0.5 × 6 deg of visual angle, randomly placed either below one of the two scenes in the lower part of the display or above one of the two scenes in the upper part of the display) appeared for 200 ms. Subsequently, a word, e.g. ‘‘GARDEN” appeared at the center of the screen. The participant’s task was to report whether the word matched the cued scene by pressing the ‘Y’ key for ‘‘yes” and the ‘N’ key for ‘‘no”. Each participant saw half the trials with a word that matched the cued scene and half with a word that did not match any of the scenes in the array. On half of the arrays, a scene with a context/object violation was presented (e.g. a clown with a machine gun), while on the other half, a scene without a context/object violation was presented (e.g. a clown with a bunch of plastic flowers). Participants never saw both versions of the same scene (the scene violation and its non-violation counterpart). Half of the trials contained scenes with scene incongruity and half of the trials contained their counterparts (scenes without any violation). After the 40th and final trial, participants were asked whether they had noticed any ‘‘odd or unusual-looking scenes” during the experiment and, if so, to describe what they had seen. Importantly, the scenes with semantic violations were never cued. 
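The yes/no word-matching task described above yields hits (answering ‘‘yes” when the word matched the cued scene) and false alarms (answering ‘‘yes” when it did not), from which a sensitivity index such as d′ can be computed as z(hit rate) minus z(false alarm rate). The following is a minimal sketch under stated assumptions: the function name is illustrative, the trial counts in the usage lines are hypothetical and are not data from the study, and the log-linear correction for extreme rates is a common convention that the paper does not say it used.

```python
from statistics import NormalDist


def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int) -> float:
    """Sensitivity index d' = z(hit rate) - z(false alarm rate).

    Assumption: a log-linear correction (add 0.5 to each count) keeps the
    rates away from 0 and 1, where the z transform would be infinite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one participant in one condition:
# 10 trials with a matching word and 10 with a non-matching word.
example = d_prime(hits=8, misses=2, false_alarms=3, correct_rejections=7)
print(f"d' for the hypothetical participant: {example:.2f}")

# A participant who says "yes" equally often on match and mismatch trials
# shows no sensitivity to the cued scene.
print(f"d' at chance: {d_prime(5, 5, 5, 5):.2f}")
```

Computing d′ separately for arrays with and without an incongruent scene gives one pair of scores per participant, which is the form a paired comparison across the two conditions requires.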
As stated above, we were interested in whether the presence of a scene containing an object/context violation would interfere with processing of the target scene, presumably by capturing attention and so leaving fewer attentional resources available for the cued scene. The results of other studies (e.g. Mudrik, Deouell et al., 2011), in which scenes with violations broke through CFS into awareness faster than normal scenes and hindered processing of liminal scenes, led us to predict that if iconic memory is sensitive to scene incongruity, such effects might be found under the conditions of our experiment. Specifically, we predicted that if scenes containing a semantic violation attract attention and consequently reduce sensitivity to cued target scenes, then d′ scores (d′ is a psychophysical measure of perceptual sensitivity) for the target scenes would be significantly lower when a semantic violation is present in the array than when there is no such violation.

5.2. Results

A paired-samples t-test was conducted to compare d′ scores for scene gist in the scene incongruent and scene congruent conditions. There was no significant difference in mean scores between the scene incongruent condition (Mean d′ = 1.2; SD = 0.56) and the scene congruent condition (Mean d′ = 1.4; SD = 0.74), t(13) = 0.60, p = 0.55 (see Fig. 8). Only 1 of the 14 participants (7%) reported being aware of “anything odd or unusual” about the content of any of the scenes when asked after the 40th and final trial, despite the fact that half of the experimental trials (20) contained a scene with an incongruent context/object relationship and despite the fact that arrays were shown for 500 ms and were not masked.

Fig. 8. Experiment 4: Mean d′ scores for report of scene gist for trials containing only congruent scenes and trials containing one scene with incongruity. Error bars show ±1 SD. [Bar chart; x-axis: Scene Condition (Congruent, Incongruent); y-axis: Mean d′, 0 to 2.5.]

5.3. Discussion

We predicted that if scene incongruity is available in iconic memory, this mismatch between object and scene context or gist would lead to attentional resources being captured by the scene, which in turn would lead to fewer resources being available to encode, process, and report the cued scene. This, in part, is predicted by recent studies (e.g. Mudrik, Deouell et al., 2011). However, performance was not worse on trials with incongruent scenes compared to trials with congruent scenes. Furthermore, and surprisingly, only 1 of the 14 participants reported seeing any odd scene during the experiment, despite the 4-scene array being shown for 500 ms. This is, of course, consistent with the results of Experiment 1, in which the scenes were chosen from the same Mudrik archive. Participants here, too, failed to report scene incongruity even with full attention.

6. General discussion

Two of the procedures used to investigate whether incongruity within a scene is picked up outside of awareness and captures attention, an inattentional blindness procedure (Experiment 1) and an iconic memory partial-report procedure (Experiment 4), failed to yield any evidence that it does. Moreover, only 1 of the 54 participants in these two experiments (Experiments 1 and 4) reported scene incongruity even when they were attending to the scenes in which it was present. In fact, when an incongruity was present and participants were asked to describe what they had seen, it seems that in almost all cases the expectation associated with the gist of the scene led not only to overlooking the incongruent object but to normalizing it so that it was consistent with scene gist. Thus, contrary to the prediction based on the work of Mudrik and colleagues, not only did the incongruent object fail to capture attention, it lost its incongruent character, becoming the very object it had replaced when the scene was turned from a congruent into an incongruent one, a phenomenon we found in all the experiments. For example, in Experiment 1, in the scene in which a girl was licking a light bulb, the action was described as a girl licking an ice cream cone.
A very similar effect was described long ago in the Bruner and Postman (1949) paper cited in the introduction. In their experiments, participants were asked to identify tachistoscopically presented normal and doctored playing cards. Exposure times were increased from 10 to 1000 ms or until the participant responded. Their general finding was that “The recognition threshold for incongruous playing cards (those with suit and color reversed) is significantly higher than the threshold for normal cards” (Bruner & Postman, 1949, p. 210). A version of this finding, namely that objects that are congruent with a scene are recognized faster and better than their incongruent versions, has been replicated many times since then (e.g. Biederman, 1972; Davenport & Potter, 2004; Greene et al., 2015; Palmer, 1975), but for our purposes it is Bruner and Postman’s (1949) description of a subset of the participants’ responses to incongruent cards that is of particular interest. They describe some of the responses as ‘dominance reactions’, which consisted essentially of a “perceptual denial” of the incongruous elements in the stimulus pattern such that the reported perception conforms to the expectations about normal playing cards. This seems to perfectly describe the responses of most of our participants in the first two experiments to congruent and incongruent scenes. Not only did we find no difference in IB between them, but we found that even in the full attention condition, participants did not pick up the incongruity, but rather engaged in what is perfectly described as ‘perceptual denial’, e.g. describing a girl licking a light bulb as a girl eating an ice cream cone. The results of Experiments 1, 2 and 4, which looked at the perceptual effects of incongruity between the gist of a scene and an object, failed to provide any evidence that incongruity either reduces IB or disrupts reports from iconic memory because it demands attention.
We thus failed to confirm the prediction based on the work of Mudrik and colleagues, which indicated that this would be the case. The results of these experiments are consistent with the findings of other researchers who failed to find an effect of incongruity (e.g. Greene et al., 2015). The experiment looking at change detection and incongruity (Experiment 3), on the other hand, produced the opposite result. In this experiment we found that detection of change was faster when the object that cyclically appeared and disappeared was incongruent with scene gist rather than congruent with it, which is consistent with what
others have also found (Hollingworth & Henderson, 2000), although LaPointe et al. (2013) found this to be true only for the detection or localization of change. In the LaPointe experiment, when participants were required to identify the change, the results reversed. Although incongruent changes were detected more quickly than congruent ones, they were often misidentified, with the object that was reported to have appeared and disappeared identified as a normalized version of the incongruent object.

What might account for the difference between inattentional blindness and iconic memory on the one hand and change detection on the other? One clear procedural difference is that the change detection study entailed repeated exposures of the scene with and without the target object, whereas in the iconic memory and inattentional blindness experiments, scenes were presented only once and briefly (between 100 and 500 ms). It may be that incongruity is not picked up when a scene is presented once very briefly. This would explain why participants in the IB experiment do not pick up the incongruity even with full attention. It would also explain why, in Experiment 2, participants not only fail to describe the incongruity when their only task is to describe a scene flashed for 100 or 200 ms, but fail to do so even when the scenes are shown a second time and they are asked only to report whether each is normal or weird. In other words, they fail to pick up scene incongruity even when their only task is to report it. This explanation would be consistent with the conjecture that “if consistent information is most critical the first time a scene is processed, the repeated viewing of objects and scenes in various combinations would reduce the consistency effect.” (Davenport & Potter, 2004, p. 563).
It should be noted, however, that Mudrik and Koch (2013) found that subliminally presented weird scenes, shown only once, affected subsequent reports of whether a liminal scene was weird or not, which would seem to be at odds with this explanation. It may be that gist dominates the perception of a scene on first viewing so that an incongruent object is simply overlooked, whereas over time, as participants scan the repeatedly presented scenes for a change, they may happen upon the gist-inconsistent object which, because it is inconsistent, holds attention for longer, leading to faster change detection. In contrast, if the comparable object in the congruent scene happens to be looked at, it does not require increased attention and so may fail to hold attention, making its change harder to find. It is not that the incongruent object jumps out on first viewing, capturing attention; rather, if it happens to be looked at, it is likely to hold attention for longer, making its change easier to detect. We find evidence for this in the change detection experiment: only 1 participant detected the change in the incongruent scene condition within 2 s (at 1698 ms, when the scene had been presented 6 times), with most changes being detected after around 11 s on average. If scene incongruity grabbed attention in a single scene presentation, then change detection should have occurred within the first presentation or, at most, the first few presentations of the scene. The fact that it did not suggests that rather than capturing attention, an incongruous object in a scene holds attention for longer, and while attention is being held by the object, the change occurs and participants detect it. When the object is congruent with the scene, the object does not hold attention. This fits with the Mudrik, Deouell et al. (2011) finding that, while the incongruent object does not capture attention faster than the congruent object when the two are pitted against each other under conditions of binocular rivalry, the incongruent object dominates the normal object for a longer interval. It does not fit, however, with the results of their other study, which used CFS as the binocular rivalry inducing procedure: Mudrik, Breska, Lamy, and Deouell (2011) reported that scenes with incongruent objects in fact emerged into consciousness faster than congruent versions of the same scenes. It is not clear how these two apparently different findings fit together.

In Experiment 2, we did find that participants were better at reporting the action in congruent scenes than in incongruent scenes. This does suggest that at some level the visual system is sensitive to scene incongruity, as the only difference between the congruent and incongruent versions of the scenes was the object used to perform the action. However, participants reported scene incongruity in only 3 out of 120 trials on which an incongruent scene was presented. Furthermore, the evidence from all experiments reported here clearly shows that even if incongruity is picked up unconsciously, it is not picked up sufficiently for it to be reported, to afford identification of the scene as being “weird”, or to attract attention.

Overall, the results of these 4 experiments provide no evidence that scene incongruity captures attention outside of awareness and thus emerges more quickly into consciousness. On the contrary, they suggest the opposite. Expectation based on scene gist dominates and seems to determine initial scene perception, overriding any object incongruity so that the incongruent object is either ignored or perceived as the object that should be there based on scene gist, e.g., a basketball on a basketball court and not a watermelon, or an ice cream cone being licked and not a light bulb.
This is, of course, consistent with the by now overwhelming evidence (some but not all of which is referenced in this paper) that objects in incongruent settings take longer to identify and are identified less accurately than their congruent counterparts. Expectations, which are generally based on our past experiences, dominate our perceptions, as Bruner and Postman (1949; see also Greene et al., 2015) found so long ago, and may lead us to misperceive a scene-incongruent object as a scene-congruent one, a phenomenon they referred to as the dominance response. It is expectation that leads us to detect semantic incongruity in scenes, but it is also expectation which more often than not dominates our perceptions so that we end up seeing something other than what is in fact there to be seen.

References

Baars, B. J. (2002). The conscious access hypothesis: Origins and recent evidence. Trends in Cognitive Sciences, 6(1), 47–52.
Biederman, I. (1972). Perceiving real-world scenes. Science, 177(43), 77–80.


Biederman, I., Mezzanotte, R. J., & Rabinowitz, J. C. (1982). Scene perception: Detecting and judging objects undergoing relational violations. Cognitive Psychology, 14(2), 143–177.
Bruner, J. S., & Postman, L. (1949). On the perception of incongruity: A paradigm. Journal of Personality, 18(2), 206–223.
Clarke, J., & Mack, A. (2014). Iconic memory for the gist of natural scenes. Consciousness and Cognition, 30, 256–265.
Davenport, J. L., & Potter, M. C. (2004). Scene consistency in object and background perception. Psychological Science, 15(8), 559–564.
Dehaene, S. (2014). Consciousness and the brain: Deciphering how the brain codes our thoughts. NY: Viking Press.
Greene, M. R., Botros, A. P., Beck, D. M., & Fei-Fei, L. (2015). What you see is what you expect: Rapid scene understanding benefits from prior experience. Attention, Perception, & Psychophysics, 77(4), 1239–1251.
Henderson, J. M., & Hollingworth, A. (1998). Eye movements during scene viewing: An overview. Eye Guidance in Reading and Scene Perception, 11, 269–293.
Hollingworth, A., & Henderson, J. M. (2000). Semantic informativeness mediates the detection of changes in natural scenes. Visual Cognition, 7(1/2/3), 213–235.
Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254–1259.
LaPointe, M. R., Lupianez, J., & Milliken, B. (2013). Context congruency effects in change detection: Opposing effects on detection and identification. Visual Cognition, 21(1), 99–122.
Mack, A., & Clarke, J. (2012). Gist perception requires attention. Visual Cognition, 20(3), 300–327.
Mack, A., & Rock, I. (1998). Inattentional blindness (Vol. 33). Cambridge, MA: MIT Press.
Moors, P., Boelens, D., van Overwalle, J., & Wagemans, J. (2016). Scene integration without awareness: No conclusive evidence for processing scene congruency during continuous flash suppression. Psychological Science, 5(5). 0956797616642525.
Mudrik, L., Breska, A., Lamy, D., & Deouell, L. Y. (2011). Integration without awareness: Expanding the limits of unconscious processing. Psychological Science, 22(6), 764–770.
Mudrik, L., Deouell, L. Y., & Lamy, D. (2011). Scene congruency biases binocular rivalry. Consciousness and Cognition, 20(3), 756–767.
Mudrik, L., Faivre, N., & Koch, C. (2014). Information integration without awareness. Trends in Cognitive Sciences, 18(9), 488–496.
Mudrik, L., & Koch, C. (2013). Differential processing of invisible congruent and incongruent scenes: A case for unconscious integration. Journal of Vision, 13(13), 1–14.
Mudrik, L., Lamy, D., & Deouell, L. Y. (2010). ERP evidence for context congruity effects during simultaneous object–scene processing. Neuropsychologia, 48(2), 507–517.
Palmer, T. E. (1975). The effects of contextual scenes on the identification of objects. Memory & Cognition, 3, 519–526.
Pinto, Y., van Gaal, S., de Lange, F. P., Lamme, V. A., & Seth, A. K. (2015). Expectations accelerate entry of visual stimuli into awareness. Journal of Vision, 15(8), 1–15.
Rémy, F., Vayssière, N., Pins, D., Boucart, M., & Fabre-Thorpe, M. (2014). Incongruent object/context relationships in visual scenes: Where are they processed in the brain? Brain and Cognition, 84(1), 34–43.
Rensink, R. A., O’Regan, J. K., & Clark, J. J. (1997). To see or not to see: The need for attention to perceive changes in scenes. Psychological Science, 8(5), 368–373.
Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs: General and Applied, 74, 1–29.
Treisman, A. (2003). Consciousness and perceptual binding. In Axel Cleeremans (Ed.), The unity of consciousness (pp. 95–113). Oxford University Press.