Vision Research 133 (2017) 161–175
Contents lists available at ScienceDirect
Vision Research journal homepage: www.elsevier.com/locate/visres
Attention in natural scenes: Affective-motivational factors guide gaze independently of visual salience Judith Schomaker a,⇑, Daniel Walper b, Bianca C. Wittmann a, Wolfgang Einhäuser b a b
Justus Liebig University Giessen, Department of Psychology and Sports Science, Germany Chemnitz University of Technology, Institute of Physics, Physics of Cognition, Germany
a r t i c l e
i n f o
Article history: Received 10 November 2016 Received in revised form 13 February 2017 Accepted 22 February 2017
Keywords: Attention Natural scenes Free viewing Valence Motivation Visual salience
a b s t r a c t In addition to low-level stimulus characteristics and current goals, our previous experience with stimuli can also guide attentional deployment. It remains unclear, however, if such effects act independently or whether they interact in guiding attention. In the current study, we presented natural scenes including every-day objects that differed in affective-motivational impact. In the first free-viewing experiment, we presented visually-matched triads of scenes in which one critical object was replaced that varied mainly in terms of motivational value, but also in terms of valence and arousal, as confirmed by ratings by a large set of observers. Treating motivation as a categorical factor, we found that it affected gaze. A linear-effect model showed that arousal, valence, and motivation predicted fixations above and beyond visual characteristics, like object size, eccentricity, or visual salience. In a second experiment, we experimentally investigated whether the effects of emotion and motivation could be modulated by visual salience. In a medium-salience condition, we presented the same unmodified scenes as in the first experiment. In a high-salience condition, we retained the saturation of the critical object in the scene, and decreased the saturation of the background, and in a low-salience condition, we desaturated the critical object while retaining the original saturation of the background. We found that highly salient objects guided gaze, but still found additional additive effects of arousal, valence and motivation, confirming that higher-level factors can also guide attention, as measured by fixations towards objects in natural scenes. Ó 2017 Elsevier Ltd. All rights reserved.
1. Introduction Attention is the mechanism by which information is selected for further processing. Influential theories of attention have made a distinction between attentional guidance by bottom-up factors, such as physical stimulus characteristics, and top-down factors, like current task goals (Awh, Belopolsky, & Theeuwes, 2012; Theeuwes, 1991, 1992, 2010). According to a traditional model of saliency, physical characteristics, including local differences (‘‘contrasts”) in intensity, colour, and orientation, determine the visual salience of a stimulus relative to its background (Koch & Ullman, 1985). The successful application of this ‘‘saliency map model”, which was originally designed for covert attention towards simple stimuli, to overt attention in natural scene viewing (Itti, Koch, & Niebur, 1998) started a branch of computational modelling aimed at understanding which factors influence allocation of gaze in nat⇑ Corresponding author. E-mail address:
[email protected] (J. Schomaker). http://dx.doi.org/10.1016/j.visres.2017.02.003 0042-6989/Ó 2017 Elsevier Ltd. All rights reserved.
ural scenes (see Borji, Sihite, & Itti, 2013, and Kummerer, Wallis, & Bethge, 2015 for recent reviews and model comparisons). Many of these models have in common that they emphasize the role of bottom-up (stimulus-driven) processes based on low-level stimulus features, although it has become evident that the higher-level scene structure, including objects (Einhauser, Spain, & Perona, 2008; Nuthmann & Henderson, 2010; Pajak & Nuthmann, 2013; Stoll, Thrun, Nuthmann, & Einhauser, 2015), dominates over lowlevel features and spatial priors and top-down biases make a substantial contribution to gaze guidance (Clarke & Tatler, 2014; Tatler, 2007; Torralba, Oliva, Castelhano, & Henderson, 2006). Maybe most importantly, low-level salience accounts fail to explain attentional allocation in natural scenes once a task is involved (Einhauser, Rutishauser, & Koch, 2008; Henderson, Brockmole, Castelhano, & Mack, 2007), except for a short period after stimulus onset when bottom-up processes may guide the first and fast saccades (Anderson, Ort, Kruijne, Meeter, & Donk, 2015). Since there are many ways to quantify salience, for the purpose of the present analysis, we picked the model that had been found
162
J. Schomaker et al. / Vision Research 133 (2017) 161–175
to achieve best performance in a recent review paper (Borji, Sihite, & Itti, 2013a), and therefore operationalized salience as the output by the Adaptive Whitening Salience (AWS) model (Garcia-Diaz, Fdez-Vidal, Pardo, & Dosil, 2012; Garcia-Diaz, Leboran, FdezVidal, & Pardo, 2012). Although the model also has some success in extracting (proto-)objects, it is a fully image-computable model without any learnt or in-built semantic knowledge of objects and therefore is representative of a stimulus-driven (sometimes referred to as bottom-up) model of salience. In the present paper, we will use the term ‘‘salience” exclusively for this stimulusdriven property. On top of such bottom-up (stimulus-driven) and top-down (task) factors, previous experience can also guide attention (the latter is sometimes referred to as selection history; Awh et al., 2012). Reward, for example, can enhance processing of relevant stimulus features (Hickey, Chelazzi, & Theeuwes, 2010) and prime processing at reward-associated locations through selective attentional mechanisms (Hickey, Chelazzi, & Theeuwes, 2014). Stimuli associated with reward can even attract attention when taskirrelevant (Chelazzi, Perlato, Santandrea, & Della Libera, 2013; Failing, Nissens, Pearson, Le Pelley, & Theeuwes, 2015; Failing & Theeuwes, 2014; Hickey & Peelen, 2015; Hickey & van Zoest, 2012; Hickey et al., 2010, 2014; Raymond & O’Brien, 2009). The effects of motivationally relevant stimuli have been suggested to facilitate visual processing via subcortical pathways that connect visual areas in the brain with areas associated with oculomotor control, e.g. the basal ganglia and superior colliculus (Vuilleumier, 2015). A possible reason why such low-level effects of reward are found in the brain may be that studies investigating the effects of reward on attention typically use very simple stimuli or stimulus features (e.g. geometric forms, or the colour red) that are paired with reward. Attentional capture is then observed by those simple stimulus features associated with reward (Vuilleumier, 2015). Although studies investigating effects of motivation on attention using more naturalistic stimuli are scarce, similar effects have been observed for object categories in different types of natural scenes (e.g. people, trees, and cars: Hickey, Kaiser, & Peelen, 2015; Hickey & Peelen, 2015; and for different types of landscapes: Failing & Theeuwes, 2015). In the real world, we already have motivational associations for a large number of complex objects; objects that we have learned to recognize, and through experience have come to associate with reward or punishment. For example, we have learned that a strawberry cake usually smells good, tastes good, and provides nutrition, while a rotten banana smells bad and is no longer tasty. Appetitive-motivational stimuli elicit a drive that promotes approach behaviour; an effect often described as incentive or motivational salience (Berridge, Robinson, & Aldridge, 2009; Zhang, Berridge, Tindell, Smith, & Aldridge, 2009). In contrast, negative motivational value elicits avoidance behaviour (Higgins, 1997; Johnson & Zatorre, 2005; Skinner, 1938; Thorndike, 1911). Exploratory viewing behaviour, as measured using total dwell time, or fixation probability, could be used to quantify attentional allocation to motivational stimuli. Information of what an object looks like, how it feels, how it smells and what it sounds like, is stored in long-term memory (Brady, Konkle, Alvarez, & Oliva, 2008; Brady, Konkle, Oliva, & Alvarez, 2009; Konkle, Brady, Alvarez, & Oliva, 2010), together with information regarding the value of interacting with that specific object. This information could potentially guide attention via an object categorization process (Logan, 2002), similar to selection history effects. In the first experiment of the current study, we aimed to experimentally investigate whether affective-motivational factors can guide attention to complex objects in naturalistic scenes using scenes from the Motivational Objects in Natural Scenes (MONS) data-
base (Schomaker, Rau, Einhaueser, & Wittmann, submitted). A crucial difference with previous studies is that we did not ‘teach’ our subjects which stimuli to associate with reward. We presented a series of naturalistic scenes with everyday objects for which motivational associations had presumably been established through a lifetime of experience. Inter-individual differences exist in the valuation of objects and scenes (Bartra, McGuire, & Kable, 2013; Hare, Camerer, & Rangel, 2009; Stigler, 1950), however, which may contribute to some of the variance in the current database. Nevertheless, the MONS database ratings are based on a large number of different subjects rating the same stimuli, creating a standardized stimulus set. In the current study, participants freely viewed the scenes for 5 s, to eliminate any additional top-down task effects, while we tracked their eye movements. Eye movement measures like time to first fixation, fixation probability, and total dwell time were used to characterize attentional deployment. The same scene was repeated three times in a counterbalanced within-subjects design, but with one object replaced that was intended to vary in motivational value (either aversive, neutral, or appetitive). We will refer to these replaced objects as the ‘critical’ objects in the rest of the paper, as they define the critical difference in motivational value between the repeated scenes. We modelled eye movement behaviour using motivational ratings for objects in scenes from the MONS database (Schomaker, et al., submitted). The scenes were selected to contain objects that varied in motivational value, but also differed in terms of emotional value (i.e. valence) and arousal (ratings derived from the MONS database; Schomaker, et al., submitted). As both valence and arousal may affect attention as well (Lang, Bradley, & Cuthbert, 2008; Russell, 1980), we also included valence and arousal in our models. Although valence and motivation are related, they can be functionally dissociated. Valence relates to the emotional response elicited by a stimulus, while motivation relates more to the drive evoked by the stimulus (Pessoa, 2009). Visual salience may be less of an influence in natural scene viewing compared to more simple stimulus displays, as several studies have shown that attention is more strongly guided by objects (Einhauser et al., 2008; Nuthmann & Henderson, 2010; Pajak & Nuthmann, 2013; Stoll et al., 2015). But also under more complex conditions, luminance contrast manipulations affecting visual salience can guide gaze in naturalistic scenes (’t Hart, Schmidt, Klein-Harmeyer, & Einhauser, 2013). Increasing the visual contrast of an object in a natural scene increased fixations on that object, while changing both the object’s and background contrast had no effect (’t Hart et al., 2013). These findings suggest that also in natural scenes attention can be guided by physical stimulus properties, therefore we also included a measure of visual salience in our model (Garcia-Diaz, Fdez-Vidal, et al., 2012; Garcia-Diaz, Leboran, et al., 2012). The effects of motivation are believed to affect the visual processing of stimuli and through attentional mechanisms mimic the effects of visual salience, as evidenced by event-related potential (Hickey, McDonald, & Theeuwes, 2006) and neuroimaging (Hickey et al., 2010; for a review see Chelazzi et al., 2013) studies. It is still unclear if visual salience also affects processing of emotional or motivational information. On the behavioural level, effects of visual and affective-motivational factors may be expressed in the same way; both can influence overt attention, as quantified by eye movement behaviour. Therefore, in the second experiment we experimentally investigated whether the effects of affective-motivational factors, including motivation, arousal, and valence, can be modulated by visual manipulations of stimulus features. In one condition, we presented the same unmodified scenes as presented in the first experiment again (medium-salience condition). In two additional conditions we either retained the saturation of the critical object in the scene, but decreased the saturation
J. Schomaker et al. / Vision Research 133 (2017) 161–175
of the background (high-salience condition), or the opposite, we desaturated the critical object while retaining the original saturation of the background (low-salience condition). Using this procedure we increased and lowered the relative salience of the critical object in the scenes. Taken together, the current study investigated whether affective-motivational factors can guide attention towards complex objects in natural scenes (experiment 1), and whether such effects interact with salience (or whether the effects of visual and affective-motivational factors are additive; experiment 2). 2. Methods 2.1. Participants Sixteen (age range = 19–31; mean age = 25.6, SD = 3.7; 11 male, 5 female) volunteers participated in experiment 1, and 18 (age range = 21–31; mean age: 24.3, SD = 2.2; 8 male, 10 female) in experiment 2. All observers had normal or corrected-to-normal vision and gave informed consent to participation. One participant in experiment 1 had to be excluded from the analysis due to a technical problem with the eye tracking data. Procedures conformed to the Declaration of Helsinki and were approved by the ethics committee of the Department of Psychology and Sports Science at the Justus Liebig University, Giessen. 2.2. Stimuli All stimuli were taken from the Motivational Objects in Natural Scenes (MONS) database, which is available for download at http://
163
www.allpsych.uni-giessen.de/mons. This database contains triads of scenes. Each triad includes 3 repetitions of the same scene, keeping a high level of visual similarity, but with one object (hereafter: ‘‘critical object”) replaced that varies in motivational value (see middle row of Fig. 1 for an example triad). When designing the database the intention was to create an aversive, neutral, and appetitive motivational version of each scene, hereafter referred to as the ground truth. The database provides measures of motivation for the critical objects in each scene (1. Approach/Avoid: ‘‘Do you want to approach/avoid the object?”; 2. Desire to Own: ‘‘How much would you like to own the object?”; 3. Interaction: ‘‘How much would you like to interact with the object?”; these three motivational scales were combined into one mean motivational measure), and for other affective-motivational factors, including valence (‘‘Does the object elicit positive or negative emotions?”), and arousal (‘‘Does this object make you calm or aroused?”). The database also provides affective and motivational measures for the scenes as a whole, and for all the objects in the scenes and all objects presented in isolation, that is outside of the corresponding scene. Ratings of this database were obtained via online questionnaires that consisted of subsets of the stimulus sets. In the current study, we used the object-in-scene ratings for the critical objects. In experiment 1, 104 triads of photographs of scenes and 26 similar fillers were presented, making a total of 336 scenes. The three different versions in each scene triad differed in that one object was replaced in the scene. This critical object differed in ground truth motivational value, ranging from appetitive to neutral to aversive. The different versions were presented in three different blocks, balancing for the number of appetitive, neutral, and aversive scenes (41 of each per block). Presentation of the scenes within a
Fig. 1. Conditions. The 9 conditions used in experiment 2 for one example triad (number 040 in the database): ‘‘ground-truth” motivational value varies across columns, visual salience of the object relative to the background across rows. For experiment 1, only stimuli corresponding to the middle row were used.
164
J. Schomaker et al. / Vision Research 133 (2017) 161–175
block was randomized, and block order was counterbalanced between participants, in order to control for scene repetition effects. For 104 scenes triads of all ground truth conditions (aversive, neutral, appetitive) ratings were available at the time of experiment (triads 1 through 104 in the database) and only those were analysed in experiment 1. For a first analysis, ground truth was taken as a categorical variable. In this analysis, only those 64/104 triads were included for which the motivational ratings had the intended order (i.e., aversive < neutral < appetitive, see Appendix A for details). The first analysis was to confirm general effects of motivation. However, even if actual ratings and ground truth are consistent in their order, images in a triad might still fall all within the same overall category or just two of the categories rather than spanning the whole range of the three motivational scales. For example, all ratings of a triad could be in the aversive motivational range on the absolute scale, even if the relative scaling was consistent with the assumed ground truth. Hence, the second analysis, which used linear mixed models, ignored the intended ground truth and instead used the actual mean ratings as independent variable. Note that the ratings were obtained from a group disjoint from experiment participants. Of the 64 triad scenes used in this study’s ground truth analysis, the mean motivational measure showed the largest variation between the motivational categories, with the lowest ratings for aversive, intermediate for neutral and highest ratings for appetitive objects as based on the ground truth division (mean aversive rating: 3.04 (SD = 1.00); mean neutral rating: 3.91 (SD = 0.71); mean appetitive rating: 4.75 (SD = 0.70). Also valence showed ratings in the same direction (mean aversive valence: 3.85 (SD = 0.61); mean neutral valence: 3.91 (SD = 0.47); mean appetitive valence: 4.50 (SD = 0.54). Arousal ratings showed a different pattern, with higher ratings for both aversive and appetitive objects compared to neutral ones (mean aversive arousal: 3.90 (SD = 0.46); mean neutral arousal: 3.58 (SD = 0.36); mean appetitive arousal: 3.98 (SD = 0.46)). For experiment 2, we created 3 versions of each stimulus. The medium-salience version uses the original version of experiment 1, for a high-salience condition only the object retains colour and the background is desaturated to grayscale (bottom row in Fig. 1), while the reverse is done for the low-salience condition (top row in Fig. 1). High, low and medium salience here is meant to refer to the visual salience of the critical object relative to the background. A subset of 99 triads (triads 1 through 99 in the database) were selected. This allowed us to balance the order such that in each of 3 blocks each triad occurred only once and all 9 conditions (3 motivational by 3 visual levels) occurred equally often (11 times). To avoid effects of image repetition as much as possible, each observer viewed each version of each triad only once, each in a different visual condition. In addition, if only the first block of 99 trials were analysed, the design would be a between-subject design that is balanced with respect to the condition x image combination across the 18 observers. For an initial analysis that takes ground truth as a categorical variable, only those 62/99 triads were used for which the motivational ratings had the intended order (cf. Appendix A). All stimuli have an aspect ratio of either 4:3 or 3:4. In experiment 1, stimuli were presented at 800 600 pixels (landscape) or 600 768 (portrait) resolution centred on the screen, for the latter the upper and lower 16 pixels were cropped. In experiment 2, stimuli were presented at 1000 750 pixels (landscape) or 750 1000 pixels (portrait) centrally. 2.3. Apparatus In experiment 1 stimuli were presented on a on a 190 CRT screen (EIZO FlexScan F77S). The screen was located 73 cm from the
participant and ran at 1024 768 resolution at a refresh rate of 100 Hz. Luminance ranged from 0.1 cd/m2 (‘‘black”) to 66.0 cd/m2 (‘‘white”). Images were presented centrally on a black background. In experiment 2, stimuli were presented on a VIEWPixx/3D Lite (VPixx Technologies Inc., Saint-Bruno, Quebec, Canada): monitor running at 1920 1080 pixels resolution at a refresh rate of 120 Hz located 57 cm from the observer. Luminance ranged from 0.1 cd/m2 (‘‘black”) to 94.4 cd/m2 (‘‘white”). Images were presented centrally on a medium gray background (47.2 cd/m2). In both experiments eye movements were recorded with an Eyelink-1000 (experiment 1) or an Eyelink-1000 plus (experiment 2) device in tower-mount configuration. The built-in software with default parameters was used to identify fixations, saccades and blinks. Experiment 1 was programmed and presented using Open Sesame 3.0 (Mathot, Schreij, & Theeuwes, 2012). Stimulus presentation in experiment 2 was controlled using Matlab (The MathWorks, Natick, MA, USA) with its psychophysics and eyelink toolbox extensions (Cornelissen, Peters, & Palmer, 2002; Kleiner, Brainard, & Pelli, 2007). 2.4. Procedure The experimental procedure was identical for both experiments. Stimuli were presented for 5 s and preceded by a central fixation spot on a medium gray screen. The fixation screen was on until observers held fixation within 1 degree of the central spot for 300 ms; if they failed to do so within 3 s, the eyetracker was recalibrated. In experiment 2 image presentation was followed by a gray screen for 500 ms before the next trial started with a fixation. Consistent with previous experiments, we instructed the observers to ‘‘study the image carefully” and that they were ‘‘free to move their eyes naturally, whenever an image is presented”. 2.5. Analysis Fixation on an object was defined as a fixation falling within the bounding box around the boundaries of an object (i.e., the smallest rectangle that contains all pixels of the object). To check the robustness of the analysis, we report the same analyses using exact object boundaries as well as bounding boxes that are extended by 1 degree of visual angle in Appendix B for experiment 2. In the analyses regarding the critical objects, three dependent variables were considered separately for each critical object: the probability that an observer looked at an object (‘‘fixation probability”), the overall time an observer spent looking at an object (‘‘total dwell time”), and the time from stimulus onset to the first fixation that landed on the object (‘‘time to first fixation”). If an object was not fixated, the total dwell time on the object was set to 0. For time to first fixation, two different strategies were used to treat this situation: where possible, the time to first fixation for these cases was set to infinity (note that the median of a distribution is still finite if more than half of the values are finite), otherwise, only objects that were fixated were included in the ‘‘time to first fixation” analysis. The latter analysis is referred to as ‘‘time to first fixation given fixated”, since the data are conditioned on the object being fixated. We also investigated whether motivational value generally affected viewing behaviour of the scenes. For this purpose ‘‘mean fixation duration” of all fixations in a trial and ‘‘mean saccade length”, defined as the distance between subsequent fixations, were analysed. For experiment 1, we first performed repeated measures analysis of variance (rmANOVA) to test for a main effect of motivational value (levels: aversive, neutral, appetitive) as defined by the images’ ground truth on any of the dependent variables. For experiment 2, we used a 2-factor rmANOVA with factors
165
J. Schomaker et al. / Vision Research 133 (2017) 161–175
3. Results
fixation probability
a
1
0 1.5 total dwell time [s]
b
0
aversive
1.5 ***
mean time to first fixation|fixated [s]
c
aversive neutral appetitive *** ***
0
aversive
1.5 ***
median time to first fixation [s]
d
0
aversive
neutral *
neutral *
neutral
appetitive ***
appetitive ***
appetitive
e 50
appetitive neutral aversive
% fixations on critical object
motivational value (aversive, neutral, appetitive) and visual salience (low, medium, high). In addition, for both experiments we investigated the course of effects on fixation probability by performing rmANOVAs with fixation number (1 to 10) as an additional factor. We restricted this analysis to the first 10 fixations, as in both experiments for more than 90% of trials subjects made at least 10 fixations, while this percentage drops rapidly after the tenth fixation. In this context, fixation 1 is the first fixation after the initial central fixation, which we refer to as fixation 0. For all relevant analyses we performed a Greenhouse-Geisser correction when the sphericity assumption was violated. In both experiments, we in addition modelled the unique effect of the three scales (motivation, arousal, valence; fixed effects) from the MONS database (Schomaker, Rau, Einhaueser, & Wittmann, submitted) on each of the dependent variables. For both valence and motivation we included quadratic terms as well, as the effects of these are possibly non-linear (both positive and negative emotional stimuli may attract attention, and similarly both aversive and appetitive motivational stimuli may attract attention compared to neutral stimuli). As further fixed-effect predictors the models included (1) the peak salience within the object’s bounding box according to the AWS model (Garcia-Diaz, Fdez-Vidal, et al., 2012; Garcia-Diaz, Leboran, et al., 2012), (2) the Euclidian distance of the centre of the bounding box from the image’s centre, (3) the area of the object’s bounding box. Image and subject were treated as random-effect predictors for the models. Since using a fullrandom-effects structure for seven fixed and two random effects is not feasible, we reduced model complexity by fixing the correlations between random intercepts and slopes to zero (cf., Barr, Levy, Scheepers, & Tily, 2013). All model input variables were z-scored to have zero mean and unit standard deviation (Schielzeth, 2010), putting all predictors on a standardized scale, while not affecting the shape of the distributions. We checked for multicollinearity of our regressors, by calculating variance inflation factors (vif). High vif values may be found for linear and quadratic versions of a regressor, and possibly also for valence or motivation as the two measures are typically correlated. Nevertheless, the vif values were within an acceptable range (experiment 1: all vifs < 2.7; experiment 2: all vifs < 2.5; second model experiment 2: all vifs < 2.4), which is well below the recommended maximum of 5 (Rogerson, 2001), or even 4 (Pan & Jackson, 2008), possibly because we z-scored our variables. The models were computed using the R system for statistical computing (version 3.2.2; R Core Team, 2015), including the lme4 R package (Bates, Mächler, Bolker, & Walker, 2014) version 1.1.7.
10
0 1
fixation number
10
Fig. 2. Results experiment 1. a) Fixation probability, b) total dwell time, c) time to first fixation given fixation, d) time to first fixation (N = 14). e) Effects over time for probability of fixation on the critical object for each of the three motivational categories. All bars and error bars reflect means and s.e.m. over subjects. Nonmarked comparisons are non-significant at a 0.05 level.
3.1. Experiment 1 – effects of ground truth categories In a first analysis, we used ground truth (‘‘appetitive”, ”neutral”, ‘‘aversive”) as a categorical variable and included those 64 triads for which the rank order in the aggregated motivational rating matched the intended ground truth (Appendix A). All mean results are shown in Fig. 2, data per subject can be found in Appendix C. Probability of fixation of the critical object was not affected by motivational value (F(2,28) = 0.98, p = 0.388; Fig. 2a), probably due to a ceiling effect. For total dwell time on the critical object, there was a main effect of motivation, (F(1.28,17.9) = 34.1, p < 0.001; Fig. 2b), with dwell times for aversive and appetitive stimuli each being larger than for neutral stimuli (t(14) = 6.31, p < 0.001 and t(14) = 14.0, p < 0.001, respectively) and no difference between appetitive and aversive stimuli (p = 0.64). There was also a main effect of motivation on time to first fixation for critical objects that were
fixated (F(2,28) = 43.0, p < 0.001; Fig. 2c) with differences between any pair of conditions (aversive < neutral: t(14) = 5.80, p < 0.001; appetitive < neutral: t(14)=10.5, p < 0.001; appetitive < aversive: t(14) = 2.94, p = 0.011). In a second analysis, we treated unfixated critical objects as objects fixated after an infinite amount of time and considered the median in each observer for analysis. For those 14/16 observers who fixated more than 50% of the critical objects, we found a main effect (F(1.18,15.4) = 25.9, p < 0.001) with differences for all conditions (aversive < neutral: t(13) = 5.24, p < 0.001; appetitive < neutral: t(13) = 5.24, p < 0.001; appetitive < aversive: t (13) = 2.45, p = 0.029). Over the time-course of a trial, the chance that a critical object was fixated was reduced, as evidenced by a linear main effect of
166
J. Schomaker et al. / Vision Research 133 (2017) 161–175
fixation number, F(1,14)=72.9, p < 0.001 (see Fig. 2e). There was also an interaction of fixation number and motivation, F(18,252) = 2.58, p = 0.001, and a main effect of motivation, F(2,28) = 46.3, p < 0.001. Aversive and appetitive stimuli were more likely to be fixated than neutral stimuli (F(1,14) = 73.6, p < 0.001 and F(1,14) = 118.2, p < 0.001 respectively), with no difference between appetitive and aversive stimuli (F(1,14) = 1.12, p = 0.309). To investigate whether effects of motivation started early and persisted over time, we further tested the effects of motivation at the first and the tenth fixation. Effects of motivation started early with the first fixation, F(2,28) = 40.4, p < 0.001, and persisted at the tenth fixation, F(2,28) = 7.76, p = 0.002. The analysis so far only considered fixations on the critical object, but it is conceivable that general eye-movement properties are affected by the presence of a motivational object as well. For the general analyses regarding all eye movements in a trial, a main effect of motivation was found for mean saccade length, F(2,28) = 3.35, p = 0.050. Saccade length was shorter for the aversive compared to the neutral condition (t(14) = 3.02, p = 0.009), and a trend effect was found for shorter saccade length for appetitive versus neutral stimuli (t(14) = 1.80, p = 0.094), while no differences for appetitive versus neutral stimuli were found (t(14) = 0.46, p = 0.652). No effects of motivation were observed for mean fixation duration (F(1.25,17.5) = 0.10, p = 0.904).
3.2. Experiment 1 – linear mixed effect models The analysis so far has defined motivational value according to the aversive-neutral-appetitive axis as was intended when constructing the database, restricted to those with consistent rank order according to the motivational scene ratings (Fig. 2). A more fine-grained analysis is possible, when instead all the ratings from the MONS database for each critical object are used (including ratings on motivational value, arousal, and valence). This also allows us to assess which property (motivational value, arousal, or valence) is the primary driver of this effect. Provided the U-shaped function along the ground truth aversive-neutralappetitive axis observed above, the square of the motivational value (centred at the mid-point of the scale) is used as a fourth predictor besides arousal, valence and motivational value, and the square of the valence score (centred at the mid-point of the scale) is used as a fifth predictor. In addition, 3 image-based predictors are included in the model (peak salience, object size and object eccentricity). Of the five non-visual predictors, only arousal predicts fixation probability, total dwell time and time to first fixation (Table 1). Arousing objects are more likely to be fixated, and are fixated earlier and longer. Both valence and arousal made independent contributions to the prediction of time to first fixation, with
more positive objects being fixated earlier. Of the visual predictors, object size predicts all 3 dependent variables: Large objects are more likely to be fixated, and are fixated earlier and longer. Eccentricity predicts time to first fixation and fixation probability: More central objects are fixated earlier and with higher probability. Peak salience predicts all three visual measures (i.e., time to first fixation, total dwell time, and fixation probability). More salient objects are fixated earlier, longer and with higher probability. In any case, this statistical analysis shows that arousal and valence are both non-visual factors that predict gaze allocation above and beyond the other – visual and non-visual – factors. In experiment 2 we test experimentally, whether a manipulation of visual salience affects the relation between non-visual factors and fixation probability.
3.3. Experiment 2 – effects of ground-truth categories and visual salience As for experiment 1, we restrict analysis to the triads where the ratings confirmed the relative rank order, which are 62 of the 99 triads used for experiment 2. For fixation probability on the critical object (Fig. 3a), we do not find main effects of motivational value (F(2,34) = 2.41, p = 0.011) or visual salience (F(2,34) = 2.58, p = 0.091), and no evidence for an interaction (F(4,68) = 0.53, p = 0.716). A lack of an effect on fixation probability is probably due to a ceiling effect, as we do find effects of motivation and salience when using exact object outlines rather than bounding boxes to define fixations on objects (see Appendix B), while none of the other analyses showed qualitative differences. For total dwell time on the critical object (Fig. 3b), main effects of both motivation and visual salience are found, but no interaction (motivation: F(2,34) = 116.8, p < 0.001; visual salience: F(2,34) = 15.1, p < 0.001; interaction motivation x visual salience: p = 0.315). Post-hoc tests show that highly salient objects were looked at longer than medium salient objects (F(1,17) = 22.9, p < 0.001) and also than low salient objects (F(1,17) = 19.8, p < 0.001). Dwell times were higher for the motivational than the neutral objects: Aversive objects were looked at longer than neutral stimuli (F(1,17) = 271.1, p < 0.001) and also appetitive objects were looked at longer than neutral stimuli (F(17) = 116.0, p < 0.001), but no difference between appetitive and aversive was found (F(1,17) = 0.08, p = 0.776). For time to first fixation of fixated critical objects (Fig. 3c), we find main effects of motivational (F(2,34) = 72.6, p < 0.001) and visual salience (F(1.50,25.4) = 29.0, p < 0.001), but no interaction (F(2.58,43.9) = 0.48, p = 0.672). Differences are found between all levels of visual salience. Low salient objects are fixated later than
Table 1 (G)LMM results experiment 1. The b value reflects the regressors’ slope, and the t-value is calculated as b/SE. For continuous variables (LMMs) the normalized regression coefficient (t-value) is given. Following Baayen, Davidson, & Bates (2008) no p-value is provided, but significant effects are asserted for |t| > 1.96. Significant predictions are shown in bold. For analysis of time to first fixations, trials in which the critical object was not fixated are excluded, hence the reduced number of data points (observations).
Observations Arousal Valence Quadratic Valence Motivation Quadratic Motivation Peak salience (AWS) Eccentricity Size
Fixation probability: b/z-value/p-value
Total dwell time: b/t-value
Time to first fixation: b/t-value
4680 (15 104 3) 0.21/2.25/0.02 0.10/0.84/0.40 0.04/0.31/0.76 0.03/0.27/0.79 0.06/0.43/0.67 0.26/2.23/0.03 0.32/2.56/0.01 2.03/9.43/< 0.001
4680 (15 104 3) 96.37/3.76 17.07/0.53 36.79/0.99 15.21/0.45 46.35/1.28 67.22/2.71 70.87/1.88 450.37/7.90
3978 73.45/2.97 100.13/3.12 28.02/0.79 24.53/0.71 70.33/1.99 92.84/4.19 232.18/7.17 193.86/5.69
J. Schomaker et al. / Vision Research 133 (2017) 161–175
a
*
*
1 fixation probability
high medium low
0.8
aversive
neutral
appetitive
b *** high low medium
neutral
***
appetitive ***
mean time to first fixation|fixated [s] d
aversive
neutral
*** 1.5
*** *
median time to first fixation [s] 0
appetitive
low medium high aversive
neutral
(F(1,17) = 15.4, p = 0.001). The same pattern holds for the time to first fixation with unfixated critical objects counted as fixated after infinite time (Fig. 3d; main effect motivational value: F(2,34) = 43.3, p < 0.001; main effect salience: F(1.46,24.7) = 18.0, p < 0.001; interaction: F(2.40,40.8) = 0.65, p = 0.555; salience: low > medium: F(1,17) = 11.46, p = 0.004, medium > high: F(1,17) = 12.01, p = 0.003; low > high: F(1,17) = 25.67, p < 0.001; motivational levels: aversive < neutral: F(1,17)=30.45, p < 0.001; appetitive < neutral: F(1,17) = 15.30, p < 0.001, appetitive < aversive: (F(1,17) = 7.94, p = 0.012). As in experiment 1, the chance that a critical object was fixated was reduced over time, as evidenced by a linear main effect of fixation number, F(1,18) = 51.12, p < 0.001 (see Fig. 4). There were also two-way interaction effects of fixation number and salience, F(18,306) = 9.42, p < 0.001, and fixation number and motivation, F(18,306) = 3.09, p < 0.001. For low salient objects, effects of motivation started early with the first fixation, F(2,34) = 11.20, p = 0.001, and persisted until the tenth fixation, F(2,34) = 5.69, p = 0.007. For medium salient objects, motivation affected fixation probability at the first, F(2,34) = 17.08, p < 0.001 but not tenth fixation, F(2,34) = 3.13, p = 0.056. For high salient objects, effects of motivation were found for both early and late fixations (first fixation: F(1.47,25.0) = 23.59, p < 0.001; tenth fixation: F(2,34) = 5.66, p = 0.008). In the general analyses including all eye movements in a trial, saccade length was shown to be affected by motivation (F(2,34) = 21.95, p < 0.001). Saccade length was shorter for the motivational than neutral conditions (aversive < neutral: F(1,17) = 45.26, p < 0.001; appetitive < neutral: F(1,17) = 32.60, p < 0.001). It did not differ between the two motivation categories (F(1,17) = 0.30, p = 0.590). Visual salience also affected saccade length (F(2,34) = 4.40, p = 0.020), while no interaction with motivation was found (F(4,68) = 1.18, p = 0.327). Saccade length was longer for medium salient than highly salient objects (F(1,17) = 10.62, p = 0.005). A statistical trend effect was found for shorter saccade length for high salient compared to low salient objects (F(1,17) = 3.72, p = 0.071). No difference in saccade length was found for low and medium salient objects (F(1,17) = 0.29, p = 0.596). Mean fixation duration was not affected by motivation (F(2,34) = 0.56, p = 0.575, or by visual salience (F(2,34) = 0.19, p = 0.824). Nor did motivation and salience interact (F(4,68) = 1.40, p = 0.245). 3.4. Experiment 2 – Linear mixed effect models
***
0
low medium high
***
**
1.5
*** **
c
aversive
** **
0
*** ***
***
total dwell time [s]
2.0
167
appetitive
Fig. 3. Results experiment 2. a) Fixation probability, b) total dwell time, c) time to first fixation given fixation, d) time to first fixation. Bars grouped by visual salience, in each group sorted by ground-truth motivational value. All bars and error bars means and s.e.m. over subjects. Significance markers omitted, values given in main text.
medium salient (F(1,17) = 10.3, p = 0.005) and highly salient objects (F(1,17) = 39.7, p < 0.001), and medium salient objects were fixated later than highly salient objects F(1,17) = 39.4, p < 0.001 (see Fig. 3c). Time to first fixation also differs for all levels of motivation. Aversive objects are fixated earlier than neutral objects (F(1,17) = 49.4, p < 0.001) and similarly appetitive objects are fixated earlier than neutral objects (F(1,17) = 226.0, p < 0.001). In addition, appetitive objects are fixated earlier than aversive ones:
As for experiment 1, we computed linear mixed effects models for five non-visual and three visual predictors. Analysing all data (Table 2), we find that arousal again predicts time to first fixation, total dwell time, and fixation probability, with high arousing objects being fixated earlier, longer, and with higher probability (Fig. 3). Valence again only predicts the time to first fixation: A positive object is fixated earlier. No quadratic effects of valence are found. Unlike in experiment 1, motivational value now exhibits a linear effect on time to first fixation, with aversive objects being fixated earlier than appetitive ones. Peak salience again predicts all 3 dependent variables, with salient stimuli being fixated with higher probability, earlier and longer. The same holds for object size, while eccentricity only affects fixation probability and the time to first fixation. More central locations are fixated earlier and with higher probability. Restricting the analysis to trials in which the stimuli were of medium salience, and thus matched experiment 1, yields almost the same qualitative results, but no longer reveals effects of eccentricity and arousal on fixation probability (Table 3).
J. Schomaker et al. / Vision Research 133 (2017) 161–175
0
appetitive neutral aversive
0 1
fixation number
10
0
c
medium salience
100
appetitive neutral aversive
% fixations on critical object
b
low salience
100
% fixations on critical object
a
0 1
fixation number
10
high salience
100
appetitive neutral aversive
% fixations on critical object
168
0
0 1
fixation number
10
Fig. 4. Fixation probability over time. Probability of fixation on the critical object plotted against fixation number for the three motivational categories for a) low salient, b) medium salient, and c) high salient objects in experiment 2. Error bars depict s.e.m. over subjects.
Table 2 (G)LMM results experiment 2. Notation as in Table 1. Significant predictions are shown in bold.
Observations Arousal Valence Quadratic Valence Motivation Quadratic Motivation Peak salience (AWS) Eccentricity Size
Fixation probability: b/z-value/p-value
Total dwell time: b/t-value
Time to first fixation: b/t-value
5346 (18 99 3) 0.37/2.31/0.02 0.08/0.43/0.67 0.21/1.00/0.32 0.20/0.99/0.32 0.02/0.12/0.91 0.72/4.23/<0.001 0.36/2.43/0.015 1.03/4.65/<0.001
5346 (18 99 3) 147.11/5.56 34.43/0.99 33.16/0.89 34.36/0.97 67.73/1.78 219.01/8.91 36.76/1.07 348.27/7.45
5212 66.22/2.91 101.21/3.95 5.46/0.20 65.61/2.35 8.31/0.30 229.12/8.81 150.18/4.81 134.44/-3.71
Table 3 (G)LMM results experiment 2. Analysis restricted to medium salience condition, notation as in Table 1. Significant predictions are shown in bold.
Observations Arousal Valence Quadratic Valence Motivation Quadratic Motivation Peak salience (AWS) Eccentricity Size
Fixation probability: b/z-value/p-value
Total dwell time: b/t-value
Time to first fixation: b/t-value
1782 (18 99) 0.17 /0.68/0.49 0.14/0.53/0.59 0.01 /0.04/0.97 0.02/0.05/0.96 0.24/0.71/0.48 1.11/3.71/< 0.001 0.33/1.53/0.13 1.40/3.57/< 0.001
1782 (18 99) 136.77/4.68 37.26/1.01 27.09/0.70 77.63/1.76 30.01/0.71 154.90/4.37 53.95/1.61 326.03/7.09
1731 95.50/2.99 138.75/3.65 29.53/0.98 81.29/2.27 2.73/0.08 120.07/5.25 183.93/5.05 96.34/3.21
4. Discussion In this study we aimed to investigate whether the emotional and motivational impact of every-day objects can affect attention in natural scenes (experiment 1), and whether these effects interact with visual salience (experiment 2). We found effects of motivation, valence and arousal on attention as indicated by gaze in the first experiment. In the second experiment, we manipulated visual salience by either desaturating the motivational object in a scene (low salience condition), or desaturating the background (high salience condition). Here, we found additional effects of motivation and visual salience on eye movement behaviour. More specifically, in experiment 1 we found that the affective-motivational objects, both the appetitive and the aversive ones, were looked at earlier and fixated longer than the neutral objects. Furthermore, motivational objects were more likely to be fixated than neutral objects, an effect that persisted over the course of a trial. Saccade length was found to be shorter in motivational compared to neutral conditions, possibly because
eye movements within the motivational objects were more likely and – given the rather sparse scenes – fixations within an object are generally shorter than between objects. These findings were replicated and extended in experiment 2, in which again the affective-motivational objects were fixated earlier and longer than the neutral stimuli. In addition, appetitive objects were more likely to be fixated, and were fixated earlier and longer than aversive stimuli. Although we found effects of motivation on early and late fixations in experiment 1, we did not find a main effect of ground truth categories on fixation probability. A reason we do not find consistent effects on fixation probability may be that the number of objects in the scenes was relatively low, and almost all critical objects were fixated by the end of a 5 s trial (as can be seen by a fixation probability of close to 1 for all conditions). Our eye-movement measures confirmed that also our visual salience manipulation in experiment 2 successfully affected attention. Highly salient objects were more likely to be fixated, were fixated longer, and fixated earlier than low salient objects.
J. Schomaker et al. / Vision Research 133 (2017) 161–175
Although the desaturated critical objects may be salient in the sense that they stand out against their background, our findings suggest that this did not result in pop-out effects. Our findings that the data for the all colour image (‘‘medium salience”) falls numerically in-between data for the gray object on colour (‘‘low salience”) and coloured object on gray (‘‘high salience”) background for fixation probability and time to first fixation argues against a pop-out effect for the grayscale object. A reason for these findings may be that competition with other objects overrides bottom-up attention effects. In line with the present findings, we have recently argued, that fixations are driven towards objects irrespective of their salience relative to the non-object background, while selection between objects can be driven by visual salience (Stoll et al., 2015). Interestingly, no interactions between visual salience and affective-motivational value were observed (neither in experiment 1 nor 2, nor for different object definitions reported in Appendix B). Taken together, these findings suggest that although affectivemotivational value (either appetitive or aversive) has similar effects as increased visual salience on gaze, the two seem to operate independently. To further investigate the factors affecting gaze in the two experiments and to disentangle the effects of visual and motivational salience, we statistically modelled eye movement behaviour including visual measures of peak salience, eccentricity and object size, and continuous indices of motivation, valence, and arousal. As expected, in both experiments 1 and 2 we found that object size was a strong predictor of eye movement behaviour, predicting the time to first fixation, as well as the fixation probability, and total dwell time. Also eccentricity affected fixation probability and time to first fixation, with more central objects being fixated earlier (both experiments) and with higher probability (both experiments). This is in line with previous literature showing that central objects are more likely to be fixated as acuity decreases with eccentricity (Turk-Browne & Pratt, 2005; Velisavljevic & Elder, 2008). As an additional visual factor we modelled salience, using the model that showed best performance in a recent review paper (Borji et al., 2013a), the AWS model (Garcia-Diaz, Fdez-Vidal, et al., 2012; Garcia-Diaz, Leboran, et al., 2012). In both experiments peak salience within the critical objects also predicted time to first fixation, total dwell time, and fixation probability, with high salience leading to earlier, longer, and more probable fixations than low salience. This suggests that low-level features (Itti, Braun, Lee, & Koch, 1998; Itti & Koch, 2000; Koch & Ullman, 1985) also play a role in guiding gaze in the present study. This is in line with recent evidence that – although objects have precedence over lowlevel features in fixation guidance (Einhauser et al., 2008) – the prioritization among objects can still be driven by low-level features (Stoll et al., 2015). We primarily aimed to investigate the interaction between visual salience and affective-motivational factors in the current study. The objects in the scenes varied mostly in terms of motivation (as intended by the MONS database), but also varied in terms of valence and arousal. Interestingly, we also observed effects of valence and arousal on attention, above and beyond effects of visual salience and motivation. In both experiments we found that highly arousing objects were fixated earlier, longer, and with higher probability, but valence only affected the time to first fixation in both experiments, such that more positively evaluated objects were fixated earlier than more negative ones. These findings are in line with previous studies that showed that valence and arousal can both guide attention in natural scenes (Codispoti, Ferrari, De Cesarei, & Cardinale, 2006; Leite et al.,
169
2012; Simola, Le Fevre, Torniainen, & Baccino, 2015; Simola, Torniainen, Moisala, Kivikangas, & Krause, 2013). In experiment 2, when including all conditions, we found additional effects of motivation on gaze behaviour in our linear mixed models, with motivational objects being fixated earlier than neutral ones. These results are in line with the ground truth-based analyses. In addition, more appetitive objects were fixated earlier than more aversive objects, and both motivationally appetitive and aversive objects were fixated longer. These findings suggest that motivational value can guide attention to objects in realistic scenes, in line with several previous studies (Failing & Theeuwes, 2015; Hickey & Peelen, 2015; Hickey et al., 2015). What is new about these findings is that our participants did not experimentally learn about the values of certain objects, or object categories, but attention was guided towards everyday objects that we have learned to associate with reward or punishment through real life experience (Schomaker, Rau, Einhauser, & Wittmann, submitted). This suggests that longterm selection history may affect attentional deployment (Awh et al., 2012). What is striking about these findings is that attention can thus be guided by complex stimuli, that are not easily categorisable, extending previous studies in which attention was guided by rather simple object categories (e.g. trees or people), when these categories were experimentally paired with reward (Failing & Theeuwes, 2015; Hickey & Peelen, 2015; Hickey et al., 2015; Vuilleumier, 2015). Importantly, our statistical models thus confirmed that effects of arousal, valence and motivational value are not fully explained by visual factors such as the object’s location, size, or salience, but operate above and beyond such visual factors. Our visual manipulation may have influenced motivational effects. For example, devaluation has been shown for food-related motivational stimuli presented in grayscale (Aouizerate et al., 2014). If such an effect occurred, however, this would actually work towards an interaction between the visual modification and the motivational value, but no interaction effects were found. Taken together, our two free-viewing experiments have shown that affective-motivational factors, such as motivational value, valence and arousal associated with every-day objects can influence attention in natural scenes. Interestingly, these effects occur automatically, even in the absence of a task, and on top of effects of low-level visual scene characteristics on attention. Visual salience and affective-motivational factors can thus guide attention independently.
Acknowledgments This work was supported by the German Research Foundation (DFG; SFB/TRR 135 ‘‘Cardinal Mechanisms of Perception”). We thank Julia Trojanek for help in data collection for experiment 2.
Appendix A In Fig. A.1 the three scene versions of each triad are shown together. The triads were created to vary in terms of motivational value (either aversive, neutral, or appetitive), whilst keeping visual characteristics between the versions of the scene as constant as possible. Deviance from ground truth, based on the objects in scenes ratings from the MONS database (Schomaker, Rau, Einhauser, & Wittmann, submitted), is shown in bar plots next to the scenes.
170
J. Schomaker et al. / Vision Research 133 (2017) 161–175
Fig. A.1. Scene Triads. The red bars reflect the difference between the appetitive and neutral ground truth version of the scene. The blue bars the difference between the ground truth aversive and neutral scene. Blue should thus be negative and red positive in order for the ratings to be consistent with the ground truth division. The triads are sorted by the difference between appetitive and aversive with the left column running from 1 to 26, the second from 27 to 54, the third from 55 to 80, and the fourth from 81 to 106. All bars are on the same scale (from 4 to +4). Note, that is can been seen that the inconsistent triads have rather small deviances. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
171
J. Schomaker et al. / Vision Research 133 (2017) 161–175
Table B.1 (G)LMM results experiment 2 when using actual object boundaries. The b value reflects the regressors’ slope, and the t-value is calculated as b/SE. For continuous variables (LMMs) the normalized regression coefficient (t-value) is given. Following Baayen, Davidson, & Bates (2008) no p-value is provided, but significant effects are asserted for |t| > 1.96. Significant predictions in bold. For analysis of time to first fixations, trials in which the critical object was not fixated are excluded, hence the reduced number of data points.
Observations Arousal Valence Quadratic Valence Motivation Quadratic Motivation Peak salience (AWS) Eccentricity Size
Fixation probability: b/z-value/p-value
Total dwell time: b/t-value
Time to first fixation: b/t-value
5346 (18 99 3) 0.10/0.69/0.49 0.35/2.12/0.03 0.27/1.13/0.26 0.28/1.61/0.11 0.21/1.05/0.30 0.41/3.27/0.001 0.15/1.02/0.31 1.02/4.60/<0.001
5346 (18 99 3) 94.48/3.41 106.10/3.02 23.42/0.60 46.38/1.29 90.05/2.24 206.88/8.42 7.25/0.20 284.67/5.50
4988 74.26/2.85 180.72/5.81 20.57/0.63 115.52/3.50 6.51/0.20 238.33/7.97 131.57/3.45 125.59/-3.38
Table B.2 (G)LMM results experiment 2 using bounding boxes that are enlarged by 1 degree of visual angle in each direction. Notation otherwise as in Table B.1.
Observations Arousal Valence Quadratic Valence Motivation Quadratic Motivation Peak salience (AWS) Eccentricity Size
Fixation probability: b/z-value/p-value
Total dwell time: b/t-value
Time to first fixation: b/t-value
5346 (18 99 3) 0.65/3.05/0.002 0.13/0.55/0.58 0.21/0.67/0.50 0.04/0.14/0.89 0.18/0.53/0.60 0.98/4.24/< 0.001 0.63/3.20/0.002 0.67/2.45/0.01
5346 (18 99 3) 137.67/4.82 36.22/1.04 16.59/0.44 36.40/1.03 68.08/1.82 214.72/8.97 78.04/2.33 357.69/8.16
52.99/2.42 90.67/3.80 1.37/0.06 60.47/2.33 7.80/0.30 212.68/9.16 172.40/5.57 114.46/3.51
Table B.3 Overview ANOVA results for different definitions of object area. Actual object pixels
Bounding box (analysis in main text)
Enlarged bounding box
Motivation Salience Motivation Salience
F 11.1 4.91 0.46
p <0.001 0.01 0.76
F 2.41 2.58 0.53
P 0.10 0.09 0.72
F 2.82 2.25 1.65
p 0.07 0.12 0.17
Motivation Salience Motivation Salience
F 113.8 15.1 1.32
p <0.001 <0.001 0.27
F 116.8 15.1 1.21
P <0.001 <0.001 0.31
F 85.1 13.7 1.11
p <0.001 <0.001 0.36
F 62.2 23.1 1.07 F 66.7 23.3 0.80
p <0.001 <0.001 0.38 p <0.001 <0.001 0.53
F 72.6 29.0 0.48 F 43.3 18.0 0.65
P <0.001 <0.001 0.75 P <0.001 <0.001 0.63
F 63.0 27.1 0.41 F 28.8 22.9 1.22
p <0.001 <0.001 0.80 p <0.001 <0.001 0.31
Fixation probability
Total dwell
t_first|fixated Motivation Salience Motivation Salience t_first Motivation Salience Motivation Salience
Appendix B Experiment 2 – effects of ground-truth categories and visual salience using exact object boundaries rather than bounding boxes Since the object modifications in experiment 2 were done only on the pixels of the objects, for completeness, we repeated the main analysis using the actual pixels of the object rather than its bounding box to define whether a fixation was within an object or not. As with the main analysis, we restricted analysis to the triads where the ratings confirmed the relative rank
order, which are 62 of the 99 triads used for experiment 2. For fixation probability on the critical object, we find main effects of motivational value (F(2,34) = 11.1, p < 0.001) and visual salience (F(2,34) = 4.92, p = 0.013), but no evidence for an interaction (F(4,68) = 0.46, p = 0.76). Post-hoc tests show that highly visually salient objects are fixated with higher probability than low visually salient objects (t(17) = 2.63, p = 0.02). Fixation probability differed between each pair of the motivational conditions (aversive > neutral: t(17) = 2.22, p = 0.04; appetitive > neutral: t(17) = 5.46, p < 0.001; aversive < appetitive: t(17) = 2.23, p = 0.04).
172
J. Schomaker et al. / Vision Research 133 (2017) 161–175
Fig. C.1. Individual variability critical object data Experiment 1. Data from individual participants per condition and eye movement measure in Experiment 1. Panels A to D show measures regarding the critical objects: A) Mean probability of fixation; B) Mean time to first fixation, given that the object was fixated; C) Median time to first fixation; D) Total dwell time. Panels E and F show measures regarding all eye movements in a trial: E) Mean fixation duration; F) Saccade length (defined as distance between subsequent fixations in pixels, 32 pixels correspond to one degree of visual angle). Data points reflect mean/median measures per subject.
For total dwell time on the critical object, main effects of both motivation and visual salience are found, but no interaction (motivation: F(2,34) = 113.8, p < 0.001; visual salience: F(2,34) = 15.1, p < 0.001; interaction motivation visual salience: p = 0.27). Post-hoc tests show that highly salient objects were looked at longer than medium salient objects (t(17) = 4.90, p < 0.001) and also than low salient objects (t(17) = 4.50, p < 0.001). Dwell times were higher for the motivational than the neutral objects: Aversive objects were looked at longer than neutral stimuli (t(17) = 15.2, p < 0.001) and also appetitive objects were looked at longer than neutral stimuli (t(17) = 10.9, p < 0.001), but no difference between appetitive and aversive objects was found (t(17) = 0.25, p = 0.80). For time to first fixation of fixated critical objects, we find main effects of motivational (F(2,34) = 62.4, p < 0.001) and visual salience (F(2,34) = 23.1, p < 0.001), but no interaction (F(4,68) = 1.06, p = 0.38). Differences are found between all levels of visual salience. Low salient objects are fixated later than medium salient (t(17) = 2.78, p = 0.01) and highly salient objects (t(17) = 5.98, p < 0.001), and medium salient objects were fixated later than highly salient objects t(17) = 4.88, p < 0.001(see Fig. 3c). Time to first fixation also differs for all levels of motivation. Aversive objects are fixated earlier than neutral objects (t(17) = 7.02, p < 0.001) and similarly appetitive objects are fixated earlier than neutral objects (t(17) = 12.35, p < 0.001). In addition, appetitive objects are fixated earlier than aversive ones: (t(17) = 3.18, p = 0.005). The same pattern holds for the time to first fixation with unfixated critical objects counted as fixated after infinite time (main effect motivational value: F(2,68) = 66.7, p < 0.001; main effect salience: F(2,68)=23.3, p < 0.001; interaction: F(4,68) = 0.80,
p = 0.53; salience: low > medium: t(17) = 3.60, p = 0.002, medium > high: t(17) = 3.59, p = 0.002; low > high: t(17) = 6.28, p < 0.001; motivational levels: aversive < neutral: t(17) = 7.34, p < 0.0001; appetitive < neutral: t(17) = 12.01, p < 0.001, appetitive < aversive: (t(17) = 2.78, p = 0.01). Note, that saccade length and fixation duration analyses were done on all eye movements during a trial, rather than fixations related to the (critical) object, and these are thus not affected by using different object boundaries. The model results for using all the object pixels rather than the bounding boxes are given in Table B.1. Some qualitative differences are seen for fixation probability and dwell time. It is likely that those differences result from thin objects (e.g., rings, whose interior by definition are not object pixels), where small errors in eye-position recording may result in some fixations falling mistakenly outside the object boundaries. Hence, the bounding box analysis seems more appropriate for robust results. This is corroborated by the observation that enlarging the bounding box by 1 degree of visual angle in each direction (to account for possible error in eye-position measurements) does not result in a qualitatively different result from the original bounding box, neither for the model (Table B.2 compared to table 2 in the main text) nor for the triad-based analysis (table B.3 for an overview). Appendix C To provide an estimate of inter-individual variability, we plotted the individual data for all experiments, conditions and dependent variables under consideration (Figs. C.1, C.2 and C.3).
J. Schomaker et al. / Vision Research 133 (2017) 161–175
173
Fig. C.2. Individual variability critical object data Experiment 2. Data from individual participants per salience and motivation condition and eye movement measure regarding the critical objects in Experiment 2. Data points reflect mean or median measures per subject. A) Probability of fixation; B) Mean time to first fixation given that the object was fixated; C) Median time to first fixation; D) Total dwell time.
174
J. Schomaker et al. / Vision Research 133 (2017) 161–175
Fig. C.3. Individual variability trial data Experiment 2. Data from individual participants per salience and motivation condition and eye movement measure regarding all eye movements during a trial in Experiment 2. Data points reflect mean or median measures per subject. A) Mean fixation duration; B) Saccade length (defined as distance between subsequent fixations in pixels; 36 pixels correspond to one degree of visual angle centrally).
References Hart, B. M., Schmidt, H. C., Klein-Harmeyer, I., & Einhauser, W. (2013). Attention in natural scenes: Contrast affects rapid visual processing and fixations alike. Philosophical Transactions of the Royal Society of London. Series B, Biological sciences, 368(1628), 20130067. http://dx.doi.org/10.1098/rstb.2013.0067. Anderson, N. C., Ort, E., Kruijne, W., Meeter, M., & Donk, M. (2015). It depends on when you look at it: Salience influences eye movements in natural scene viewing and search early in time. Journal of Vision, 15(5), 9. http://dx.doi.org/ 10.1167/15.5.9. Aouizerate, B., Gouzien, C., Doumy, O., Philip, P., Semal, C., Demany, L., et al. (2014). Toward a new computer-based and easy-to-use tool for the objective measurement of motivational states in humans: A pilot study. BMC Psychology, 2(1), 23. Awh, E., Belopolsky, A. V., & Theeuwes, J. (2012). Top-down versus bottom-up attentional control: A failed theoretical dichotomy. Trends in Cognitive Sciences, 16(8), 437–443. http://dx.doi.org/10.1016/j.tics.2012.06.010. Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3). http://dx.doi.org/10.1016/j.jml.2012.11.001. Bartra, O., McGuire, J. T., & Kable, J. W. (2013). The valuation system: A coordinatebased meta-analysis of BOLD fMRI experiments examining neural correlates of subjective value. Neuroimage, 76, 412–427. http://dx.doi.org/10.1016/j. neuroimage.2013.02.063. Bates, D., Mächler, M., Bolker, B., & Walker, S. (2014). Fitting linear mixed-effects models using lme4. arXiv preprint arXiv:1406.5823. Berridge, K. C., Robinson, T. E., & Aldridge, J. W. (2009). Dissecting components of reward: ’Liking’, ’wanting’, and learning. Current Opinion in Pharmacology, 9(1), 65–73. http://dx.doi.org/10.1016/j.coph.2008.12.014. Borji, A., Sihite, D. N., & Itti, L. (2013a). Objects do not predict fixations better than early saliency: A re-analysis of Einhauser et al.’s data. Journal of Vision, 13(10), 18. http://dx.doi.org/10.1167/13.10.18. Borji, A., Sihite, D. N., & Itti, L. (2013b). Quantitative analysis of human-model agreement in visual saliency modeling: A comparative study. IEEE Transactions on Image Processing, 22(1), 55–69. http://dx.doi.org/10.1109/TIP.2012.2210727. Brady, T. F., Konkle, T., Alvarez, G. A., & Oliva, A. (2008). Visual long-term memory has a massive storage capacity for object details. Proceedings of the National Academy of Sciences USA, 105(38), 14325–14329. http://dx.doi.org/10.1073/ pnas.0803390105. Brady, T. F., Konkle, T., Oliva, A., & Alvarez, G. A. (2009). Detecting changes in realworld objects: The relationship between visual long-term memory and change blindness. Communicative & Integrative Biology, 2(1), 1–3. Retrieved from http:// www.ncbi.nlm.nih.gov/pubmed/19704852.
Chelazzi, L., Perlato, A., Santandrea, E., & Della Libera, C. (2013). Rewards teach visual selective attention. Vision Research, 85, 58–72. http://dx.doi.org/10.1016/ j.visres.2012.12.005. Clarke, A. D., & Tatler, B. W. (2014). Deriving an appropriate baseline for describing fixation behaviour. Vision Research, 102, 41–51. http://dx.doi.org/10.1016/j. visres.2014.06.016. Codispoti, M., Ferrari, V., De Cesarei, A., & Cardinale, R. (2006). Implicit and explicit categorization of natural scenes. Progress in Brain Research, 156, 53–65. http:// dx.doi.org/10.1016/S0079-6123(06)56003-0. Cornelissen, F. W., Peters, E. M., & Palmer, J. (2002). The Eyelink Toolbox: Eye tracking with MATLAB and the psychophysics toolbox. Behavior Research Methods Instruments & Computers, 34(4), 613–617. http://dx.doi.org/10.3758/ Bf03195489. Einhauser, W., Rutishauser, U., & Koch, C. (2008). Task-demands can immediately reverse the effects of sensory-driven saliency in complex visual stimuli. Journal of Vision, 8(2), 1–19. http://dx.doi.org/10.1167/8.2.2. Einhauser, W., Spain, M., & Perona, P. (2008). Objects predict fixations better than early saliency. Journal of Vision, 8(14), 11–26. http://dx.doi.org/10.1167/8.14.18. Failing, M., Nissens, T., Pearson, D., Le Pelley, M., & Theeuwes, J. (2015). Oculomotor capture by stimuli that signal the availability of reward. Journal of Neurophysiology, 114(4), 2316–2327. http://dx.doi.org/10.1152/jn.00441.2015. Failing, M. F., & Theeuwes, J. (2014). Exogenous visual orienting by reward. Journal of Vision, 14(5). Artn 6 10.1167/14.5.6. Failing, M. F., & Theeuwes, J. (2015). Nonspatial attentional capture by previously rewarded scene semantics. Visual Cognition, 23(1–2), 82–104. http://dx.doi.org/ 10.1080/13506285.2014.990546. Garcia-Diaz, A., Fdez-Vidal, X. R., Pardo, X. M., & Dosil, R. (2012). Saliency from hierarchical adaptation through decorrelation and variance normalization. Image and Vision Computing, 30(1), 51–64. http://dx.doi.org/10.1016/j.imavis.2011.11.007. Garcia-Diaz, A., Leboran, C., Fdez-Vidal, X. R., & Pardo, X. M. (2012). On the relationship between optical variability, visual saliency, and eye fixations: A computational approach. Journal of Vision, 12(6). Artn 1710.1167/12.6.17. Hare, T. A., Camerer, C. F., & Rangel, A. (2009). Self-control in decision-making involves modulation of the vmPFC valuation system. Science, 324(5927), 646–648. http://dx.doi.org/10.1126/science.1168450. Henderson, J. M., Brockmole, J. R., Castelhano, M. S., & Mack, M. (2007). Visual saliency does not account for eye movements during visual search in real-world scenes. Eye movements: A window on Mind and Brain, 537–562. Hickey, C., Chelazzi, L., & Theeuwes, J. (2010). Reward changes salience in human vision via the anterior cingulate. Journal of Neuroscience, 30(33), 11096–11103. http://dx.doi.org/10.1523/JNEUROSCI.1026-10.2010. Hickey, C., Chelazzi, L., & Theeuwes, J. (2014). Reward-priming of location in visual search. PLoS ONE, 9(7), e103372. http://dx.doi.org/10.1371/journal.pone. 0103372.
J. Schomaker et al. / Vision Research 133 (2017) 161–175 Hickey, C., Kaiser, D., & Peelen, M. V. (2015). Reward guides attention to object categories in real-world scenes. Journal of Experimental Psychology: General, 144 (2), 264–273. http://dx.doi.org/10.1037/a0038627. Hickey, C., McDonald, J. J., & Theeuwes, J. (2006). Electrophysiological evidence of the capture of visual attention. Journal of Cognitive Neuroscience, 18(4), 604–613. http://dx.doi.org/10.1162/jocn.2006.18.4.604. Hickey, C., & Peelen, M. V. (2015). Neural mechanisms of incentive salience in naturalistic human vision. Neuron, 85(3), 512–518. http://dx.doi.org/10.1016/j. neuron.2014.12.049. Hickey, C., & van Zoest, W. (2012). Reward creates oculomotor salience. Current Biology, 22(7), R219–220. http://dx.doi.org/10.1016/j.cub.2012.02.007. Higgins, E. T. (1997). Beyond pleasure and pain. American Psychologist, 52(12), 1280. Itti, L., Braun, J., Lee, D. K., & Koch, C. (1998). A model of early visual processing. Advances in Neural Information Processing Systems 10, 10, 173–179. Retrieved from
://WOS:000075130700025. Itti, L., & Koch, C. (2000). A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research, 40(10–12), 1489–1506. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/10788654. Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254–1259. http://dx.doi.org/10.1109/34.730558. Johnson, J. A., & Zatorre, R. J. (2005). Attention to simultaneous unrelated auditory and visual events: Behavioral and neural correlates. Cerebral Cortex, 15(10), 1609–1620. http://dx.doi.org/10.1093/cercor/bhi039. Kleiner, M., Brainard, D., & Pelli, D. (2007). What’s new in Psychtoolbox-3? Perception, 36, 14–14. Retrieved from ://WOS:000250594600049. Koch, C., & Ullman, S. (1985). Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology, 4(4), 219–227. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/3836989. Konkle, T., Brady, T. F., Alvarez, G. A., & Oliva, A. (2010). Conceptual distinctiveness supports detailed visual long-term memory for real-world objects. Journal of Experimental Psychology: General, 139(3), 558–578. http://dx.doi.org/10.1037/ a0019165. Kummerer, M., Wallis, T. S., & Bethge, M. (2015). Information-theoretic model comparison unifies saliency metrics. Proceedings of the National Academy of Sciences USA, 112(52), 16054–16059. http://dx.doi.org/10.1073/ pnas.1510393112. Lang, P. J., Bradley, M. M., & Cuthbert, B. N. (2008). International affective picture system (IAPS): Affective ratings of pictures and instruction manual. Technical report A-8. Leite, J., Carvalho, S., Galdo-Alvarez, S., Alves, J., Sampaio, A., & Goncalves, O. F. (2012). Affective picture modulation: Valence, arousal, attention allocation and motivational significance. International Journal of Psychophysiology, 83(3), 375–381. http://dx.doi.org/10.1016/j.ijpsycho.2011.12.005. Logan, G. D. (2002). An instance theory of attention and memory. Psychological Review, 109(2), 376–400. Retrieved from http://www.ncbi.nlm.nih.gov/ pubmed/11990323. Mathot, S., Schreij, D., & Theeuwes, J. (2012). OpenSesame: An open-source, graphical experiment builder for the social sciences. Behavior Research Methods, 44(2), 314–324. http://dx.doi.org/10.3758/s13428-011-0168-7. Nuthmann, A., & Henderson, J. M. (2010). Object-based attentional selection in scene viewing. Journal of Vision, 10(8), 20. http://dx.doi.org/10.1167/10.8.20. Pajak, M., & Nuthmann, A. (2013). Object-based saccadic selection during scene perception: Evidence from viewing position effects. Journal of Vision, 13(5). http://dx.doi.org/10.1167/13.5.2. Pan, Y., & Jackson, R. (2008). Ethnic difference in the relationship between acute inflammation and serum ferritin in US adult males. Epidemiology and infection, 136(03), 421–431.
175
Pessoa, L. (2009). How do emotion and motivation direct executive control? Trends in Cognitive Sciences, 13(4), 160–166. http://dx.doi.org/10.1016/j. tics.2009.01.006. Raymond, J. E., & O’Brien, J. L. (2009). Selective visual attention and motivation: The consequences of value learning in an attentional blink task. Psychological Science, 20(8), 981–988. http://dx.doi.org/10.1111/j.1467-9280.2009.02391.x. Rogerson, P. (2001). Statistical methods for geography. Sage. Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161–1178. Schielzeth, H. (2010). Simple means to improve the interpretability of regression coefficients. Methods in Ecology and Evolution, 1(2), 103–113. http://dx.doi.org/ 10.1111/j.2041-210X.2010.00012.x. Simola, J., Le Fevre, K., Torniainen, J., & Baccino, T. (2015). Affective processing in natural scene viewing: Valence and arousal interactions in eye-fixation-related potentials. Neuroimage, 106, 21–33. http://dx.doi.org/10.1016/j. neuroimage.2014.11.030. Simola, J., Torniainen, J., Moisala, M., Kivikangas, M., & Krause, C. M. (2013). Eye movement related brain responses to emotional scenes during free viewing. Frontiers in Systems Neuroscience, 7, 41. http://dx.doi.org/10.3389/ fnsys.2013.00041. Skinner, B. F. (1938). The behavior of organisms: An experimental analysis. Stigler, G. J. (1950). The development of utility theory I. The Journal of Political Economy, 307–327. Stoll, J., Thrun, M., Nuthmann, A., & Einhauser, W. (2015). Overt attention in natural scenes: Objects dominate features. Vision Research, 107, 36–48. http://dx.doi. org/10.1016/j.visres.2014.11.006. Tatler, B. W. (2007). The central fixation bias in scene viewing: Selecting an optimal viewing position independently of motor biases and image feature distributions. Journal of Vision, 7(14), 1–17. http://dx.doi.org/10.1167/7.14.4. Team, R. C. (2015). R: A language and environment for statistical computing [Internet]. Vienna, Austria: R Foundation for Statistical Computing; 2013. Document freely available on the internet at: http://www. r-project. org. Theeuwes, J. (1991). Exogenous and endogenous control of attention – The effect of visual onsets and offsets. Perception & Psychophysics, 49(1), 83–90. http://dx.doi. org/10.3758/Bf03211619. Theeuwes, J. (1992). Perceptual selectivity for colour and form. Perception & Psychophysics, 51(6), 599–606. http://dx.doi.org/10.3758/Bf03211656. Theeuwes, J. (2010). Top-down and bottom-up control of visual selection. Acta Psychologica, 135(2), 77–99. http://dx.doi.org/10.1016/j.actpsy.2010.02.006. Thorndike, E. L. (1911). Animal intelligence. Experimental studies. Macmillan. Torralba, A., Oliva, A., Castelhano, M. S., & Henderson, J. M. (2006). Contextual guidance of eye movements and attention in real-world scenes: The role of global features in object search. Psychological Review, 113(4), 766–786. http:// dx.doi.org/10.1037/0033-295X.113.4.766. Turk-Browne, N. B., & Pratt, J. (2005). Attending to eye movements and retinal eccentricity: Evidence for the activity distribution model of attention reconsidered. Journal of Experimental Psychology: Human Perception and Performance, 31(5), 1061–1066. http://dx.doi.org/10.1037/00961523.31.5.1061. Velisavljevic, L., & Elder, J. H. (2008). Visual short-term memory for natural scenes: Effects of eccentricity. Journal of Vision, 8(4), 28. http://dx.doi.org/10.1167/ 8.4.28. Vuilleumier, P. (2015). Affective and motivational control of vision. Current Opinion in Neurology, 28(1), 29–35. http://dx.doi.org/10.1097/WCO.0000000000000159. Zhang, J., Berridge, K. C., Tindell, A. J., Smith, K. S., & Aldridge, J. W. (2009). A neural computational model of incentive salience. PLoS Computational Biology, 5(7), e1000437. http://dx.doi.org/10.1371/journal.pcbi.1000437.