Vision Research 104 (2014) 12–23
Contents lists available at ScienceDirect
Vision Research journal homepage: www.elsevier.com/locate/visres
Contextual modulation as de-texturizer Elena Gheorghiu a,⇑, Frederick A.A. Kingdom b, Nicolai Petkov c a
University of Stirling, Department of Psychology, Stirling, FK9 4LA Scotland, United Kingdom McGill Vision Research, Department of Ophthalmology, McGill University, Montreal, Qc, Canada c University of Groningen, Nijenborgh 9, 9747 AG Groningen, The Netherlands b
a r t i c l e
i n f o
Article history: Received 25 June 2014 Received in revised form 19 August 2014 Available online 7 September 2014 Keywords: Contour Shape Texture Surround suppression After-effect Contextual modulation
a b s t r a c t Contextual modulation refers to the effect of texture placed outside of a neuron’s classical receptive field as well as the effect of surround texture on the perceptual properties of variegated regions within. In this minireview, we argue that one role of contextual modulation is to enhance the perception of contours at the expense of textures, in short to de-texturize the image. The evidence for this role comes mainly from three sources: psychophysical studies of shape after-effects, computational models of neurons that exhibit iso-orientation surround inhibition, and fMRI studies revealing specialized areas for contour as opposed to texture processing. The relationship between psychophysical studies that support the notion of contextual modulation as de-texturizer and those that investigate contour integration and crowding is discussed. Ó 2014 Elsevier Ltd. All rights reserved.
1. Introduction This review/opinion paper focuses on recent psychophysical and computational evidence that supports the idea that one of the functional roles of contextual modulation is to separate the processing of contours and textures. Contextual modulation influences on motion processing, color and depth will only be discussed when directly relevant to this proposed functional role (see other articles in this volume that deal explicitly with these other types of influence). 1.1. Background physiology Over the last three decades, neurophysiological studies have found many visual cortical neurons that are subject to influences outside of their classical receptive fields, or CRFs. Often termed the extra-classical receptive field, or ERF, neurons with ERFs are found in areas V1 and V2 (Allman, Miezin, & Mcguinness, 1985b; Angelucci & Bressloff, 2006; Blakemore & Tobin, 1972; Cavanaugh, Bair, & Movshon, 2002; Kapadia et al., 1995; Knierim & van Essen, 1992; Nelson & Frost, 1985; Nothdurft, Gallant, & Van Essen, 1999; Schwabe et al., 2006), area V4 (Sundberg, Mitchell, & Reynolds, 2009; Zanos et al., 2011), area MT (Allman, Miezin, & Mcguinness, 1985a; Cui et al., 2013) and area MST (Eifuku & Wurtz, 1998). Typically, stimuli placed outside the CRF ⇑ Corresponding author. E-mail address:
[email protected] (E. Gheorghiu). http://dx.doi.org/10.1016/j.visres.2014.08.013 0042-6989/Ó 2014 Elsevier Ltd. All rights reserved.
in neurons with an ERF suppress the response of the neuron. For example, V1 neurons sensitive to oriented lines and bars are often found to be suppressed by iso-oriented lines placed outside their CRF (Blakemore & Tobin, 1972; Kapadia et al., 1995; Knierim & van Essen, 1992; Levitt & Lund, 1997; Nelson & Frost, 1985; Nothdurft, Gallant, & Van Essen, 1999). This effect, termed ‘isoorientation surround suppression’, is nonlinear by virtue of the fact that it only manifests itself if the neuron is actively responding to a stimulus within its CRF. Such center–surround interactions in area V1 have been shown to depend on contrast: optimallyoriented high-contrast stimuli placed in the CRF are typically suppressed by stimuli in the surround, but if the CRF stimulus is low contrast it can be facilitated by surround stimuli (Kapadia, Westheimer, & Gilbert, 1999; Levitt & Lund, 1997; Schwabe et al., 2010). In general, neurons in the primary visual cortex with ERFs have largely been studied using simple visual stimuli such as single bars and grating patches, and their functional role(s) remain poorly understood. Center–surround interactions have also been observed in motion area MT (Allman, Miezin, & Mcguinness, 1985a; Born & Tootell, 1992; Cui et al., 2013; Raiguel et al., 1995; Tanaka et al., 1986) and lateral MST (Eifuku & Wurtz, 1998). MT center–surround interactions have been shown to depend critically on several stimulus parameters such as contrast, eccentricity and signal-tonoise ratio – see review by Tadin and Lappin (2005). For example studies have shown that motion direction and/or speed discontinuities between a central object and the surrounding background might play a role in breaking camouflage, image segmentation
13
E. Gheorghiu et al. / Vision Research 104 (2014) 12–23
and figure-ground segregation (Tadin & Lappin, 2005; Tadin et al., 2008). 1.2. Background demonstrations In natural images, spatial variations in properties such as orientation, color, depth or motion provide important information about the shapes of objects and textured surfaces. Objects rarely occur in isolation however, and are usually surrounded by other objects or embedded in textured surfaces. We must therefore be able to recognize objects against a cluttered background. Fig. 1a shows an early example from Galli and Zama (1931) of how texture impacts upon the perception of a simple object such as a triangle. The three sides of the triangle are equally salient when presented in isolation (upper panel) but when surrounded by gratings (lower panel) the side parallel to the grating bars is ‘lost’, i.e. perceived as part of the grating not the triangle, in a manner akin to the traditional Gestalt notion of ‘belongingness’. Note that the effect here is not an inability to detect the side of the triangle – it is perfectly visible – it is that we perceive the side of the triangle as part of the grating even when we are aware that the line is part of a triangle. In a similar vein Fig. 1b shows that when contours are flanked by several similar-shaped contours it is difficult to process the shape of an individual contour. Similarly, if a rectangular shape is part embedded in a grating (Fig. 1c) the embedded parts appear part of the grating which occludes the rectangle, as demonstrated by Kanizsa (1979). These demonstrations suggest that contours that form parts of textures are processed separately from contours that exist in their own right. This leads to the hypothesis that one role of contextual modulation is to signal and extract information about the presence and organization of contours that are likely parts of objects not textures. By suppressing the responses to oriented stimuli that are surrounded by similarly-oriented stimuli, contours that are not part of textures are singled out and rendered more visible, an example of figure-ground segregation. In short, a possible role of contextual modulation is to de-texturize the image.
(a)
(b)
In brief, we suggest that one important functional role of contextual modulation is to separate the processing of the shape of contours from those of textures. This does not of course mean that textures are unimportant to vision. On the contrary, textures are vital for object segmentation and surface shape perception – see recent reviews by Graham (2011) and Landy (2013, chap. 45). As we shall see, the behavioral evidence for de-texturization of contours goes hand-in-hand with evidence that textures enjoy their own dedicated encoding mechanisms. In short, object contours and textures are processed in parallel but different pathways. 2. Modeling the extra-classical receptive field Beginning with the seminal studies by Li (1998, 1999, 2002), model simulations of the responses of ERF neurons in areas V1 and V2 to both artificial and natural-scene images reveal forcefully their role as de-texturizers (Grigorescu, Petkov, & Westenberg, 2003, 2004; Huang, Jiao, & Jia, 2008; Huang, Jiao, Jia, & Yu, 2009; Li, 1998, 1999, 2002; Petkov & Subramanian, 2007; Petkov & Westenberg, 2003; Ursino & La Cara, 2004). In these models, the classical receptive-field of a model ERF neuron is modeled as a simple or complex cell receptive field. Simple cell receptive fields are typically modeled as two-dimensional Gabor functions (Daugman, 1980, 1985; Jones & Palmer, 1987), while the Gabor energy model is popular for complex cells – in this case two simple cell receptive fields with a phase difference of p/2 are combined as a quadrature pair (Boukerroui, Noble, & Brady, 2004; Neumann & Sepp, 1999). In the ERF model proposed by Petkov and colleagues (Grigorescu, Petkov, & Westenberg, 2003, 2004; Petkov & Westenberg, 2003), there are two stages. In the first stage the CRF is modeled as a complex cell RF with a quadrature pair of orientation-selective Gabor weighting functions (Fig. 2b shows one of these functions, the symmetric one) applied to the intensity distribution of the image and used to compute the cells’ response as the Gabor energy. The second stage models the surround of the CRF as an inhibition term computed as the (negative) weighted sum of
(c)
Fig. 1. Examples of how texture impacts the perception of contour shape. (a) The three sides of the triangle are equally salient when presented in isolation (upper panel) but when surrounded by gratings (lower panel) the side parallel to the grating bars is perceived as a part of the grating not the triangle – see Galli and Zama (1931). (b) A closed contour-shape (upper panel) flanked by several similar-shaped contours (lower panel) is difficult to be processed as an individual contour. (c) A larger rectangular shape partly embedded in surround gratings appears as being occluded by these gratings - see Kanizsa (1979).
14
E. Gheorghiu et al. / Vision Research 104 (2014) 12–23
Fig. 2. (a) Extra-classical receptive field (ERF) inhibition is caused by the surround of the CRF. (b) The CRF can be modeled as a complex cell with an orientation-selective Gabor energy weighting function. (c) The surround of the CRF can be modeled as inhibition from the weighted sum of similarly oriented complex cell responses, the sum being positive in the area surrounding the CRF and zero elsewhere. Fig. 2a is re-used from Fig. 6 of Petkov and Westenberg (2003) with kind permission from Springer Science and Business Media.
complex cell CRF responses with identical or (more generally) similar tuning (for orientation and spatial frequency), the weighting coefficients being positive in the area surrounding the CRF and zero in the CRF (Fig. 2c). The final, surround-modulated response of the model ERF neuron is the weighted difference between the CRF and surround terms. The application of the model to a line surrounded by gratings is shown in Fig. 3. The CRF response of the vertically oriented complex cells in Fig. 3c is black for a positive response and white for a zero response. The surround responses in Fig. 3e are similarly represented, as is the final ERF modulated response in Fig. 3f. The figure shows that surround inhibition is strong in the region covered by the vertical grating (lower part of Fig. 3a) and weak for the region covered by the 45 deg grating (upper part of Fig. 3a). The symmetric Gabor function of the quadrature pair used to model the complex cell CRF is shown in Fig. 3b while the weighting function used to model surround suppression is shown in Fig. 3d. Petkov and colleagues’ model provides a ready explanation for the Galli and Zama (1931) effect, as shown in Fig. 4a. When the triangle is embedded in a grating (lower left panel), the ERF model output (lower right panel) shows that the side of the triangle parallel to the grating is inhibited by the surround grating. Fig. 4b illustrates the well-known orientation contrast pop-out effect in
(a) Visual input image
(b) Symmetric vertical-oriented Gabor filter
(c) CRF-response
which a small vertical bar surrounded by a texture made of horizontal bars pops out of its surround (upper panel) but when surrounded by similar, vertically-oriented bars fails to pop out (lower panel). The ERF model output shows that the surround inhibition is strong in the region covered by the vertical bars (lower panel) and weak for the region covered by orthogonally-oriented bars (upper panel). Finally, Fig. 5 shows the model applied to an image of a natural scene. Note the near-absence of responses to the grass texture because the response at any given image point in the grass will be inhibited by similar responses from around it. In the examples shown in Figs. 4 and 5 the suppression of texture is complete. This is a consequence of the extent to which the inhibition term from outside the CRF is taken into account. If a smaller value of its coefficient is used then the suppression will only be partial, as is typical for V1 neurons. Furthermore, this model explains the de-texturizing effect of surround suppression on textures that are made up mainly of contours, i.e. that contain oriented elements that would activate orientation selective neurons. The model would not work as well on textures that consist of blobs; for this case a similar model can be designed using responses of neurons that respond to blobs. In this respect, the above mentioned model is inspired by V1, as evidenced by the orientation selective filters it uses, but the suppression principle it
(d) The weighing func- (e) ERF inhibition tion used to model surterm round suppression by ERF responses
(f) ERF modulated response
Fig. 3. Petkov and collaborators’ model applied to a line surrounded by gratings. (a) The input image. (b) The symmetric Gabor function of the quadrature pair used to model the CRF of a complex cell with preferred vertical orientation. (c) The CRF response of the model complex cell: the black areas indicate a strong response and white areas zero response. (d) The weighting function used to model surround suppression by ERF responses. (e) The surround inhibitory term computed as a weighted sum of ERF responses. (f) The ERF modulated response. Note that the surround inhibition is strong in the region covered by the vertical grating (lower part of panel a) and weak for the region covered by the 45 deg grating (upper part of panel a). Fig. 3a is re-used from Fig. 7a of Petkov and Westenberg (2003) with kind permission from Springer Science and Business Media.
15
E. Gheorghiu et al. / Vision Research 104 (2014) 12–23
(a)
Input Image
CRF response
ERF modulated response
Input Image
CRF response
ERF modulated response
(b)
Fig. 4. Petkov and colleagues’ model applied to (a) the Galli and Zama (1931) effect (b) the orientation contrast pop-out effect. Input images are shown in the left panels, the CRF response in the middle panels and the ERF model output in the right panels. Fig. 4 panels (a) and (b) are re-used from Figs. 8a,b,d and 9a,b, respectively, of Petkov and Westenberg (2003) with kind permission from Springer Science and Business Media.
Input Image
CRF response
ERF modulated response
Fig. 5. Petkov and colleagues’ model applied to an image of a natural scene: input image (left), CRF response (middle) and ERF model output (right). (Left panel from Grigorescu, Petkov, and Westenberg (2004), courtesy of Image and Vision Computing, publisher Elsevier).
16
E. Gheorghiu et al. / Vision Research 104 (2014) 12–23
uses is generic. While the above presented filter-based model captures the main essence of surround suppression it does not take into account the dynamics of neurons and cannot reproduce transient effects. Such effects can be modeled by a system of dynamical partial differential equations as employed by the models of surround suppression proposed by Li (1998, 1999, 2002), Mundhenk and Itti (2005) and Huang, Jiao and Jia (2008), where facilitation is treated together next to inhibition. Iso-orientation surroundsuppression has also been implemented in visual saliency models (Li, 1998, 1999, 2002) which emphasise the computational role played by V1 neurons in pre-attentive visual tasks such as contour-enhancement, texture segregation and segmentation, and which include demonstrations of pop-out effects such as those in Fig. 4b. Finally, it is noteworthy that many surround suppression models (Dubuc & Zucker, 2001; La Cara & Ursino, 2004; Li, 1998, 1999, 2002; Mundhenk & Itti, 2005; Wei, Lang, & Zuo, 2013; Zeng, Li, & Li, 2011; Zeng et al., 2011) have been taken up by the engineering and computer science community for use in practical computer vision applications. There are many studies in this area but for reasons of space we mention only a few (Guo et al., 2009; Huang et al., 2009; Joshi & Sivaswamy, 2006; Papari & Petkov, 2008, 2011; Rodrigues & du Buf, 2004; Tang, Sang, & Zhang, 2007) – for further references we refer the reader to Papari and Petkov (2011).
3. Experimental evidence for de-texturization We now consider some of the experimental data that supports the hypothesis that contextual modulation acts to de-texturize the image. Much of what is discussed below is based on results obtained using two closely related shape after-effects. One of these is the shape-frequency after-effect. First reported in two studies by Kingdom and Prins (2005) and Gheorghiu and Kingdom (2006), the shape-frequency after-effect is the phenomenon in which the perceived shape-frequency of a sinusoidal-shaped contour is altered following adaptation to a contour of slightly different shape frequency. The closely related shape-amplitude after-effect operates on the shape amplitude, i.e. modulation depth rather than shape frequency of the contours and was first reported by Gheorghiu and Kingdom (2007b). Importantly, both shape-frequency and shapeamplitude after-effects have been observed not only in contours but also in textures made from sinusoidal-shaped contours arranged in parallel (Gheorghiu & Kingdom, 2012b; Kingdom & Prins, 2005). The procedure for measuring the shape after-effects is to present subjects with pairs of adaptors one above the other that differ by a factor of three in shape frequency (or shape amplitude) and following adaptation require them to adjust the relative shape-frequencies (or shape amplitudes) of two test stimuli presented at the same retinal locations as the adaptors until the point of subjective equality (PSE) is reached. The magnitude of the after-effect is then calculated from the physical difference in test shape-frequency (or shape amplitude) at the PSE. Fig. 6 illustrates example contours and textures, and the procedure. The textures were constructed from a series of contours identical in shape and arranged in parallel. The contours and textures in Fig. 6 are strings of Gabor elements rather than continuous lines. Gabor strings offer the advantage that their luminance spatial-frequency composition remains more-or-less constant in spite of changes to the shape-frequency, shapeamplitude and number of contours in the stimulus. Comparisons of the shape-frequency after-effect in contours and textures reveal different selectivities to dimensions such as luminance contrast polarity and chromaticity. For example, when the adaptors and tests are defined along different color directions (e.g. red–green vs. blue–yellow vs. black–white), or different poles of a given color direction (e.g. red vs. green or blue vs. yellow or
black vs. white) the shape after-effect with contours is significantly reduced but with textures is largely unaffected. This implies that contour-shape but not texture-shape mechanisms are selective to color direction and polarity (Gheorghiu & Kingdom, 2007a, 2012a). The non-selectivity to color direction for texture-shape mechanisms has also been evidenced in a sub-threshold summation study. Pearson and Kingdom (2002) found that sub-threshold amounts of luminance-defined and color-defined orientation modulations added linearly to reach detection threshold. Brain imaging studies support the idea that contour and texture processing is separated in the brain. Using fMRI, Dumoulin, Dakin, and Hess (2008) showed that when natural scenes are decomposed into sub-images that emphasise either the contours or the textures in the scene, extra-striate regions are predominantly activated by the contour sub-images whereas texture images elicit strong responses in area V1. Dumoulin et al. conjectured that extra-striate regions incorporate a combination of facilitation and surround suppression that effectively amplifies sparse contour information. Although the above mentioned fMRI evidence points to textures being processed in striate area V1 other studies have shown that a variety of extra-striate areas (e.g. V2, V4, TEO) are involved in processing different types of texture (Baker et al., 2006; Cant & Goodale, 2007; Cant et al., 2008; Freeman et al., 2013; Kastner, De Weerd, & Ungerleider, 2000). The above evidence from adaptation, summation and fMRI studies points to the idea that textures, even those made from contours, are processed separately from non-texture contours. However, while not inconsistent with the idea that contextual modulation mechanisms highlight contours that are not part of textures, it is not direct evidence. More direct evidence comes from the ‘crossed’ conditions in the shape after-effect studies, that is when the adaptors are textures and the tests contours, and vice versa. If we designate a single contour as C and a texture made of contours as T, the uncrossed conditions are C/C and T/T, and the crossed conditions T/C and C/T. Fig. 7 shows unpublished data obtained from three subjects (the first author plus two naive observers) using contours made from Gabor strings. Because of variations in the overall magnitude of the shape-frequency aftereffects across subjects, the data for each subject has been normalized to the C/C condition. The experimental protocol was identical to that described in Gheorghiu and Kingdom (2012b), and the data replicate both the earlier findings of Kingdom and Prins (2005) using continuous contours, as well as data from various subsets of these four conditions reported in previous studies, for example the studies that investigated the spatial (Gheorghiu & Kingdom, 2012b), binocular (Gheorghiu et al., 2009b), chromatic (Gheorghiu & Kingdom, 2012a) and motion (Gheorghiu & Kingdom, 2012c, 2014) properties of surround suppression. The key result in Fig. 7 is that the shape-frequency after-effect in the crossed T/C condition is smaller than in the uncrossed C/C condition. Consider this finding. An adaptor comprising a series of contours arranged in parallel has ostensibly more adaptive power than that of a single contour, because during adaptation the texture will stimulate the test region’s area of visual space far more thoroughly than will a single contour adaptor. Yet its adaptive effect is weaker than that of a single contour. This finding can be explained by supposing that shape-coding neurons in higher visual areas are directly inhibited by other neurons in the same area sensitive to parallel structure. This is quite unlike the situation in conventional crowding experiments in which the surround consists of distracter elements that are irrelevant to the detection of the target. In crowding experiments, the crowding stimuli are genuine masks, in that they contain no signal whatsoever. In the aforementioned shape aftereffect studies, the texture adaptors contain multiple signals. Yet, the multiple-signal texture adaptors are less effective than the single-signal (i.e. contour)
17
E. Gheorghiu et al. / Vision Research 104 (2014) 12–23
(a)
(b)
Contour Adaptor
(d)
Texture Adaptor
(e)
Contour Test
(c)
Texture Adaptor
Texture Test
Fig. 6. Stimuli used in the adaptation experiments of Gheorghiu and Kingdom. One can experience the shape-frequency after-effect obtained with single contour adaptors (a) and textures (b and c) by moving one’s eyes back and forth along the markers located midway between the pair of adapting contours (top) for about 90s, and then shifting one’s gaze to the middle of the single test contours (d). Similarly, the after-effects can be induced in texture tests (e). The texture adaptors consist of a central contour flanked by either (b) same or (c) orthogonal orientation surround contours. The texture tests (e) comprise contours flanked by same-orientation surround contours.
Normalized shape after-effect
Average across observers 1
0.8 0.6 0.4 0.2
0 C/C
T/C
T/T
C/T
(C/T in Fig. 7). However, unlike in the T/C crossed condition this is not surprising, as in the C/T condition the texture test occupies a larger region of visual space than the contour adaptor, and therefore would not be expected to be significantly affected by the adaptor. Finally, the shape after-effect is restored in the T/C crossed condition when the micropattern orientations in all but the central adaptor contour are made orthogonal to those of the central contour (Gheorghiu & Kingdom, 2012a, 2012b, 2012c; Kingdom & Prins, 2009) (see Fig. 6c). Petkov and colleagues’ model, although not a model of adaptation, accounts qualitatively for all these shape after-effect results. The model outputs to the sinusoidal-shaped contours are shown in Fig. 8 for a texture made of contours (Fig. 8a) and for a contour with an orthogonal-in-orientation surround (Fig. 8b). The model CRF responses are all strong (middle panels), whereas the ERF responses highlight only the contours that are not parts of textures.
Adaptor / Test type 4. Tuning of surround suppression Fig. 7. Shape-frequency after-effects obtained with contour and texture adaptors, and contour (light gray bars) and texture tests (dark gray bars). Four conditions were examined: adaptor and test both contours (C/C), adaptor and test both textures (T/T), texture adaptor and contour test (T/C) and, contour adaptor and texture test (C/T).
adaptors in inducing after-effects in a single contour. And lest we forget, when both adaptor and test are textures (T/T in Fig. 7), the after-effect is restored, showing that there is nothing inherently weak about the textures as adaptors. Finally, when the adaptor is a single contour and the test a texture, the after-effect is very weak
Contours differ from their surrounds in many ways, for example in color, stereoscopic depth and motion (Regan, 2000). If contextual modulation acts to de-texturize the image by enhancing those contours that are ‘‘not part of’’ textures, the question arises as to what ‘‘not part of’’ is for vision. The dimension that has been most fully explored, in both physiology and psychophysics is orientation, with the result that the term ‘iso-orientation surround suppression’ has become almost synonymous with contextual modulation. When considering the effects of surround suppression on contours rather than grating patches, the parameter space for
18
E. Gheorghiu et al. / Vision Research 104 (2014) 12–23
(a) Input Image
(b)
(c)
Input Image
Snake central-contour flanked by snake surround CRF response
Snake central-contour flanked by ladder surround CRF response
ERF modulated response
ERF modulated response
Ladder central-contour flanked by snake surround Input Image
CRF response
ERF modulated response
Fig. 8. Petkov and colleagues’ model applied to Gheorghiu and Kingdom texture-shape stimuli. The input image (left) is a texture made of (a) a ‘snake’ central-contour flanked by same-orientation (i.e. ‘snake’) surround, (b) snake central-contour flanked by an orthogonal-in-orientation (i.e. ‘ladder’) surround and (c) ladder central-contour flanked by a snake surround. The model CRF response is shown in middle panels. The ERF model output is shown in the right panels. Note that the model CRF responses are all strong (middle panels), whereas the ERF responses highlight only the contours that are not parts of textures.
studying orientation relationships is very large, particularly using the shape after-effect paradigm. In the aforementioned studies, the central adaptor contours were constructed from orientations that were tangential to the path of the contour, and the orientations in the surround contours were either tangential or orthogonal to those in the central contour (Fig. 6b and c). But what if the central contour were constructed not from tangential but orthogonal orientations, i.e. were ‘ladders’ not ‘snakes’? Gheorghiu and Kingdom (2012b) found significant reductions in the shape aftereffect induced in ladder contour tests when the central ladder adaptor was surrounded by ladder but not snake contours. However, the twist to what is otherwise a simple story is that this is only true for extended texture surrounds. For surrounds of just a few contours (63 on each side of the center), the after-effect is equally suppressed by both same as well as orthogonal-orientation
surrounds. Fig. 9 shows the results plotted in order to show the magnitude of the suppressive effects of the surround, with a value of 1 indicating complete suppression and 0 indicating no suppression. For both snake (Fig. 9a) and ladder (Fig. 9b) central-adaptor and test contours, surround suppression from same-orientation surround contours increases with the number of surround contours (black symbols). For opposite-oriented surrounds (white symbols) however, surround suppression remains constant. In addition, Gheorghiu and Kingdom (2012b) found that the strength of suppression decreased gradually with increasing spatial separation between the central adaptor contour and its surround. It would appear that surround suppression as revealed in these shape adaptation studies has two components: one that is sensitive to same-orientation texture surrounds, is spatially extensive and prevents the shape of the contour from being processed as a contour;
19
E. Gheorghiu et al. / Vision Research 104 (2014) 12–23
(a)
Same (Snake-Snake)
Same (Ladder-Ladder)
Different (Snake-Ladder)
Different (Ladder-Snake) 1
Surround Suppression Index
Surround Suppression Index
1 0.8 0.6 0.4 0.2 0
Center-Surround Type
(b)
Center-Surround Type
3
5
7
9
11
13
Number contours adaptor
0.8 0.6 0.4 0.2 0
3
5
7
9
11
13
Number contours adaptor
Fig. 9. The mean across-subjects surround suppression index for (a) snake and (b) ladder central contour adaptor flanked by same-orientation surround (dark symbols) and orthogonal-orientation surround (white symbols) as a function of the number of contours in the adapting texture. Note that the surround suppression index is inversely related to the magnitude of the shape-frequency after-effect. A surround suppression index of 1 indicates complete suppression while 0 indicates no suppression. Results from Gheorghiu and Kingdom (2012b), courtesy of Journal of Vision, copyright holder ARVO.
the other component is sensitive to opposite-orientation texture surrounds, operates locally, and is mediated by a mechanism that disrupts contour-processing. The response of Petkov and colleagues’ model to a central ladder contour and orthogonal surround is shown in Fig. 8c. Besides local orientation, texture-surround suppression is selective for stereoscopic-depth (Gheorghiu et al., 2009b), and both global motion direction and temporal frequency (Gheorghiu & Kingdom, 2012c, 2014). However it is only weakly selective to color direction, and when the central adaptor contour and test are both achromatic, the magnitude of suppression is agnostic to the particular color of the surround (Gheorghiu & Kingdom, 2012a). In short, color selectivity is found to be most prominent for contour-shape processing, weaker for texture-surround suppression and absent for texture-shape processing, indicating that there is a hierarchy of color selectivity in human vision: contourshape > texture-surround suppression of contour-shape > texture shape. With regard to the motion properties of texture-surround suppression, the interaction between motion and contour detection is typically studied using contours made of Gabor elements whose carriers (i.e. luminance grating) move inside stationary envelopes. This type of motion may be termed local motion. Gheorghiu, Kingdom and Varshney (2009a) found no tuning for local motion of texture-surround suppression. On the other hand, the shape after-effect studies exploited the possibility of making the entire sinusoidal-shaped contour move inside a virtual window, termed global motion (Gheorghiu, Kingdom, & Varshney, 2009a; Gheorghiu & Kingdom, 2012c). A problem arises in that if the center and surround contours of the adaptor are parallel they cannot move in different directions or at different speeds without coming into contact. Gheorghiu and Kingdom (2012c) circumvented this problem by spatially separating the center and surround contours by twice the amplitude of the shape modulation. They found that the shape after-effects were only reduced by surround contours moving in the same but not opposite directions to the central contour, the reduction in shape after-effect caused by the surround texture increased in magnitude with the temporal frequency of the central contour, and was selective for same center–surround temporal frequency. The motion results are also consistent with the center–surround motion interactions observed in many visual areas such as V1 (Cao & Schiller, 2002; Jones et al., 2001), MT
(Allman, Miezin, & Mcguinness, 1985a; Born, 2000; Born & Tootell, 1992; Cui et al., 2013; Raiguel et al., 1995; Tanaka et al., 1986) and lateral MST (Eifuku & Wurtz, 1998). For example, responses of MT neurons were maximally suppressed when the motion in the surround was identical to that of their CRF preference (Allman, Miezin, & Mcguinness, 1985a), which closely parallels our finding that the shape after-effects were only reduced by surround contours moving in the same but not opposite directions to the central contour. In addition, Born (2000) showed that the directional tuning of the surround was broader than that of the center, and that the size of the surround was three to five times larger than the center (Raiguel et al., 1995). To summarize: texture-surround suppression evidenced in shape after-effects has been found to be strongly selective for orientation, stereoscopic-depth, global motion direction and temporal frequency, weakly selective to color direction and non-selective to local motion direction. These selectivities are consistent with one of the aims of vision being to segregate contours that define objects from those that form textured surfaces.
5. Relevance of contour detection and crowding studies So far the psychophysical evidence for contextual modulation as de-texturizer has focused mainly on adaptation-induced shape after-effects. One possibly related phenomenon is the masking of the detectability of contours by surround noise, as observed in the often-termed ‘path’ studies (reviewed by Field, Golden, and Hayes (2014)). In the path studies, the stimuli comprise randomly-positioned and randomly-shaped test contours made from sparsely-spaced co-oriented micropatterns, embedded in a ‘cluttered’ surround of randomly oriented micropatterns. The surround clutter makes the test contour (though not the test micropatterns) hard to detect because it introduces spurious parts-of-contours into both test and null stimuli – in short it reduces the signal-to-noise ratio of the target contour. The question arises as to whether surround suppression normally operates in these stimuli, either to enhance or suppress the detectability of the target contour over and above that caused by the negative effects of the noise. For example, surround suppression might have the effect of enhancing the detectability of the target if the spurious surround contours inhibited each other more than they did
20
E. Gheorghiu et al. / Vision Research 104 (2014) 12–23
the target contour. On the other hand surround suppression might suppress the target contour further because of the similarity of the surround and target path signals. Recently, Robol, Casco, and Dakin (2012) found that if the target contour was surrounded by a row of near-parallel rather than randomly-oriented micropatterns, detection worsened, whereas a surround row of near-perpendicular micropatterns improved detection (see also Dakin and Baruch (2009) and Schumacher, Quinn, and Olman (2011)). This demonstrates that surround suppression can operate in such stimuli, but leaves open the question as to whether it does so in stimuli with the standard randomlyoriented micropattern surrounds. It is worth noting in this regard that Li (1998, 1999, 2002) and Huang, Jiao, and Jia (2008) have based their ERF models on a combination of iso-orientation surround suppression with collinear facilitation. Echoing the results of Robol, Casco, and Dakin (2012), the collinear (snake) contours in their model are enhanced through collinear facilitation, more so if surrounded by orthogonally-oriented texture, and less so if surrounded by parallel-oriented texture. Robol, Casco, and Dakin (2012) interpreted their results in terms of the well-studied phenomenon known as crowding. In its best-known form, crowding is said to occur when a peripherally viewed target such as a letter or an object becomes harder to identify when surrounded by irrelevant distractor letters or objects (reviewed by Levi (2008, 2011)). Does the surround suppression identified here as de-texturizer play a role in crowding, or is crowding simply the introduction of noise into the detection mechanism? Although there are similarities between crowding and surround suppression, as well as between overlay suppression and surround suppression, there are also important aspects of these phenomena that mitigate against the idea of a common mechanism. For example, Petrov and collaborators (Petrov, Carandini, & McKee, 2005; Petrov, Popple, & McKee, 2007) have shown that surround suppression is distinct from overlay suppression both in function and neural locus. They found that surround suppression occurs after the stage at which overlay suppression is processed. Unlike overlay suppression, surround suppression was found to be narrowly-tuned to the orientation and spatial frequency of the test, was strong in the periphery and relatively independent of surround phase, eye-of-origin, or spatial layout around the target: when ‘sectors’ were used instead of an annulus surround, the position of the sectors did not have much effect on suppression strength (Petrov, Carandini, & McKee, 2005). Moreover, it has been shown that anisotropic masking, which is characteristic to crowding phenomena, is observed only at fine scales whereas surround suppression is observed at all scales. Petrov, Popple, and McKee (2007) reported that an outward mask was more effective than an inward mask in an orientation identification task used to measure crowding. Moreover no inward–outward anisotropy was observed for the contrast detection task used to measure surround suppression. Petrov, Popple, and McKee’s (2007) findings suggest that surround suppression and crowding are two distinct phenomena. At present there is no consensus about the mechanisms of crowding but one recent idea is that it is caused by a texture-regularization, i.e. a type of assimilation in which the central target takes on the appearance of the surround. Just as we earlier posed the question of whether the surround effects in contour detection and crowding studies were examples of de-texturization, we must ask whether the results from the aforementioned shape after-effect studies might alternatively be explained on the basis of reduced contour visibility and/or crowding. We believe not, and for two reasons. First, the contours in the shape-aftereffect studies never occluded one another, so the results cannot be due to overlay masking. Second, Gheorghiu and Kingdom (2012b) found no significant reduction in either the saliency of the central adaptor contour
or the precision with which its shape was encoded when surrounded by similarly-shaped contours, irrespective of the number of contours in the surround. Finally, it is worth reiterating again that unlike in the contour detection and crowding studies, the texture surrounds in the shape after-effect studies contributed signals to the processes involved. 6. Where might de-texturizing occur? Considering that a large proportion of orientation-selective V1 neurons exhibit iso-orientation surround suppression, de-texturization probably begins as early as V1. However, the fMRI evidence from Dumoulin, Dakin, and Hess (2008) discussed earlier, which showed that extra-striate regions were predominantly activated by contour rather than texture sub-images, suggests that de-texturization may occur mainly after V1. The psychophysical evidence from shape after-effects discussed above suggests that de-texturization certainly manifests itself at the level of shape processing, i.e. in intermediate-to-higher visual areas such as V4 and IT. Of course, this is entirely consistent with V1 neurons feed-forwarding their ‘de-texturized’ responses into shape-selective neurons in higher visual areas. Indirect evidence in support of V4 as the site of de-texturization comes from Gheorghiu et al. (2009b) finding that the shape-frequency after-effect was inhibited by parallel surround texture only when the surround was in the same stereoscopic depth plane as the central adaptor contour. Neurons in V4 are predominantly sensitive to oriented bars at near-disparities (Hinkle & Connor, 2001, 2002, 2005; Watanabe et al., 2002), which has been suggested to be important for figure-ground segregation. Some neurophysiological studies have found that V4 neurons are non-selective for motion-direction (Desimone & Schein, 1987; Mysore et al., 2006), while others have found that non-directional V4 neurons can develop motion direction selectivity after adaptation (Tolias et al., 2005). This adaptation-dependent motion selectivity could result from feedback from higher visual areas such as MT and MST (Tolias et al., 2005). This type of feedback from MT to V4 might explain the selectivity to relative motion-direction and temporal frequency of surround suppression evidenced by Gheorghiu and Kingdom (2012c). Some computational studies have suggested that ERF properties of cortical neurons in the visual cortex may not be exclusively due to feed-forward phenomena but may also result from feedback as a consequence of the visual system using an efficient hierarchical strategy for encoding natural images (Rao & Ballard, 1999). In other words, ERF might result directly from predictive coding of natural images. Such neurons that manifest ERF properties are interpreted as error-detecting neurons which signal the difference between input and its prediction from a higher visual area. In sum, contextual modulation as de-texturizer might be mediated by V1 neurons that feed-forward their ‘de-texturized’ responses into shape-selective neurons in intermediate-to-higher visual areas, while feedback from MT and MST motion-direction selective neurons might control the tuning of contextual modulation in shape-selective neurons in V4. 7. De-contourization vs. de-texturization? Although this review is primarily concerned with mechanisms that de-texturize the image in order to reveal object contours, it is important to reiterate that there exist mechanisms dedicated to processing textures. As noted earlier, textures are critical to many aspects of vision, including figure-ground segregation and surface shape recognition (Ben-Shahar, 2006; Ben-Shahar & Zucker, 2004; Julesz, 1981a, 1981b). It is sine qua non that
E. Gheorghiu et al. / Vision Research 104 (2014) 12–23
texture-sensitive mechanisms will be relatively insensitive to line contours that are not integral to the texture, and hence can be considered to be ‘de-contourizers’. However, the degree of insensitivity to a line contour will depend on factors such as the size of the mechanism’s receptive field and the type of nonlinearity with which the receptive-field sub-units are combined (see below). Moreover, many texture-sensitive mechanisms are primarily sensitive to changes in textural properties, and by definition these will be sensitive to the ‘contours’ defined by texture edges, so decontourization for these neurons will be limited to only certain types of contour. In the physiological domain, von der Heydt and collaborators (von der Heydt, Peterhans, & Dursteler, 1991, 1992) reported a type of cell in monkey termed a ‘grating cell’ (also termed a periodic pattern-selective cell) that appears to be highly selective to gratings containing several periodic cycles. It is open to question however whether grating cells are a class of cell in their own right or whether they are one end of a continuum of a broader class of cells, such as complex cells, that vary in spatial-frequency bandwidth. Notwithstanding this possibility, grating cells have provided much fodder for the modeling community. Von der Heydt et al. (1991, 1992) proposed a model of a grating cell in which the responses of simple cells with parallel receptive fields are combined using an AND-type non-linearity that renders them sensitive to multiplebut not single-line stimuli. Although von der Heydt et al.’s model responds optimally to gratings of a particular orientation and spatial frequency, it also responds to stimuli to which grating cells do not normally respond, e.g. a bar and an edge parallel to it. A more texture-specific model of a grating cell was proposed by Petkov and Kruizinga (1997), and their model has since been used in computer vision applications designed to detect textures in natural scenes (Kruizinga & Petkov, 1999). The Petkov and Kruizinga model involves two stages. The first stage consists of grating-sensitive sub-units, each of which combines the spatially coincident responses of on- and off-center even-symmetric model simple cells (modeled by Gabor filters), such that each sub-unit is best activated by a set of three parallel bars with appropriate spatial frequency, orientation and position. In the second stage, the subunits are combined by Gaussian-weighted summation across space, rendering the model responsive to gratings with more than three bars. Grigorescu, Petkov, and Kruizinga (2002) compared various types of model texture operator (e.g. Gabor energy models of complex cells built from the outputs of even- and odd-symmetric filters at each image point; complex moments of the local power spectrum) with the Petkov and Kruizinga model grating cell, and found the latter to be most effective at separating textures from object contours. Other models of grating cells include those by du Buf (2007) and Lourens et al. (2005). The du Buf model offers extra precision to the location of the boundary between gratings and non-gratings, while the Lourens et al.’s model, by using a slightly different non-linearity to combine the inputs from the Gabor simple-cell model sub-units, appears to match better the responses of monkey grating cells. The texture-sensitive neurons identified by Baker and colleagues in cat (Baker, 1999; Li & Baker, 2012; Mareschal & Baker, 1999) and monkey (Li et al., 2014) are notable for being optimally sensitive to particular spatio-temporal variations in textural properties (e.g. contrast) and are thus sensitive to the ‘contours’ that define texture edges (though see El-Shamayleh and Movshon (2011) for evidence to the contrary). Interestingly, all of the neurons so far identified by Baker and collaborators are also sensitive to luminance variations similar in scale to the optimal textural variations (Li & Baker, 2012), meaning that they are also sensitive to relatively coarse-scale luminance edges. Such neurons are not easily categorized as ‘de-contourizers’ if we include texture and
21
luminance edges as types of contour. The same argument applies to the most popular psychophysical model of texture processing, the filter-rectify-filter (FRF) model (reviewed by Graham, 2011; Landy, 2013, chap. 45), which embodies many of the properties of the neurons identified by Baker and colleagues (though not their sensitivity to coarse-scale luminance edges). In its standard version the FRF model consists of three stages, a first stage that detects the luminance detail of textures using relatively small-scale linear filters, a second stage that nonlinearly transforms (e.g. through rectification or squaring) the outputs of the first stage and a third stage in which relatively large receptive fields tales the spatial differential of the outputs of the second stage in order to capture the texture’s modulation. By virtue of their relatively large second-stage receptive fields, FRF mechanisms will be largely unresponsive to individual line contours, but as with the neurons identified by Baker and colleagues, will nevertheless be sensitive to the contours defined by texture edges.
8. Summary and conclusion Converging psychophysical, computational and brain imaging studies indicate that one important role of contextual modulation is to signal and extract information about the presence and shapes of contours that are parts of objects rather than textured surfaces. This function is achieved by the mutual suppression of mechanisms that signal contours with a similar local orientation structure. The suppression is likely mediated by ERF neurons in early visual areas that feed-forward their responses into shape-selective neurons in intermediate-to-higher visual areas. Computational simulations of the responses of ERF neurons to artificial and natural images demonstrate that they have the effect of removing texture while leaving contours that are not part of textures, thus facilitating the detection of object contours. It remains unresolved whether the surround-suppression that we have identified with de-texturization has any involvement in contour detection and crowding. Finally, how does the notion of contextual modulation as detexturizer square with other proposed functions of contextual modulation? Schwartz and Simoncelli (2001) have suggested that the orientation selectivity of surround suppression plays a role in efficient coding, specifically that surround suppression is a special normalization mechanism that uses redundancy in natural images to enhance cortical response specificity. Other recent computational studies support this idea (Lochmann, Ernst, & Deneve, 2012). In keeping with this idea (Malik & Perona, 1990) have suggested that surround suppression facilitates texture segmentation. The thesis put forward here is consistent with both suggestions: orientation selectivity of surround suppression increases the response specificity to isolated contours at the expense of textures, and reveals the ‘contours’ that define orientation-defined discontinuities in textures. Acknowledgment The authors thank Dr. Curtis L. Baker Jr. for his helpful comments on the neurophysiology of grating cells.
Appendix A All computer simulation results were obtained with the on-line program available at http://matlabserver.cs.rug.nl/edgedetectionweb/web/index.html. For instance, the images presented in Fig. 3 were obtained using the following parameter values:
22
E. Gheorghiu et al. / Vision Research 104 (2014) 12–23
Original image: synthetic1.png Gabor filtering parameters: Wavelength: 8 Orientation(s): 0 Phase offset(s): 0, 90 Aspect ratio: 0.5 Bandwidth: 1 Number of orientations: 1 Half-wave rectification parameters: Half-wave rectification disabled Superposition of phases parameters: Superposition method: L2 norm Surround inhibition parameters: Surround inhibition disabled Thresholding parameters: Thinning disabled Hysteresis thresholding disabled Displayed orientations: Orientations to display: 0
References Allman, J., Miezin, F., & Mcguinness, E. (1985a). Direction-specific and velocityspecific responses from beyond the classical receptive-field in the middle temporal visual area (Mt). Perception, 14(2), 105–126. Allman, J., Miezin, F., & Mcguinness, E. (1985b). Stimulus specific responses from beyond the classical receptive-field – Neurophysiological mechanisms for local global comparisons in visual neurons. Annual Review of Neuroscience, 8, 407–430. Angelucci, A., & Bressloff, P. C. (2006). Contribution of feedforward, lateral and feedback connections to the classical receptive field center and extra-classical receptive field surround of primate V1 neurons. Progress in Brain Research, 154, 93–120. Baker, C. L. Jr., (1999). Central neural mechanisms for detecting second-order motion. Current Opinion in Neurobiology, 9(4), 461–466. Baker, C. L., Mortin, C. L., Prins, N., Kingdom, F. A. A., & Dumoulin, S. (2006). Visual cortex responses to different texture-defined boundaries: An fMRI study. Journal of Vision, 6(6), 209. Ben-Shahar, O. (2006). Visual saliency and texture segregation without feature gradient. Proceedings of the National Academy of Sciences of the United States of America, 103(42), 15704–15709. Ben-Shahar, O., & Zucker, S. W. (2004). Sensitivity to curvatures in orientationbased texture segmentation. Vision Research, 44(3), 257–277. Blakemore, C., & Tobin, E. A. (1972). Lateral inhibition between orientation detectors in the cat’s visual cortex. Experimental Brain Research, 15(4), 439–440. Born, R. T. (2000). Center–surround interactions in the middle temporal visual area of the owl monkey. Journal of Neurophysiology, 84(5), 2658–2669. Born, R. T., & Tootell, R. B. H. (1992). Segregation of global and local motion processing in primate middle temporal visual area. Nature, 357(6378), 497–499. Boukerroui, D., Noble, J., & Brady, M. (2004). On the choice of band-pass quadrature filters. Journal of Mathematical Imaging and Vision, 21(1), 53–80. Cant, J. S., & Goodale, M. A. (2007). Attention to form or surface properties modulates different regions of human occipitotemporal cortex. Cerebral Cortex, 17(3), 713–731. Cant, J. S., Large, M. E., McCall, L., & Goodale, M. A. (2008). Independent processing of form, colour, and texture in object perception. Perception, 37(1), 57–78. Cao, A., & Schiller, P. H. (2002). Behavioral assessment of motion parallax and stereopsis as depth cues in rhesus monkeys. Vision Research, 42(16), 1953–1961. Cavanaugh, J. R., Bair, W., & Movshon, J. A. (2002). Nature and interaction of signals from the receptive field center and surround in macaque V1 neurons. Journal of Neurophysiology, 88(5), 2530–2546. Cui, Y., Liu, L. D., Khawaja, F. A., Pack, C. C., & Butts, D. A. (2013). Diverse suppressive influences in area MT and selectivity to complex motion features. Journal of Neuroscience, 33(42), 16715–16728. Dakin, S. C., & Baruch, N. J. (2009). Context influences contour integration. Journal of Vision, 9(2), 11–13 (13). Daugman, J. G. (1980). Two-dimensional spectral analysis of cortical receptive field profiles. Vision Research, 20(10), 847–856. Daugman, J. G. (1985). Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Journal of the Optical Society of America A: Optics, Image Science, and Vision, 2(7), 1160–1169. Desimone, R., & Schein, S. J. (1987). Visual properties of neurons in area V4 of the macaque: Sensitivity to stimulus form. Journal of Neurophysiology, 57(3), 835–868. du Buf, J. M. H. (2007). Improved grating and bar cell models in cortical area V1 and texture coding. Image and Vision Computing, 25(6), 873–882.
Dubuc, B., & Zucker, S. W. (2001). Complexity, confusion, and perceptual grouping. Part I: The curve like representation. International Journal of Computer Vision, 42(1/2), 55–82 (Reprinted in Journal of Mathematical Imaging and Vision, 15(51/ 52), 55–82 (2001)). Dumoulin, S. O., Dakin, S. C., & Hess, R. F. (2008). Sparsely distributed contours dominate extra-striate responses to complex scenes. Neuroimage, 42(2), 890–901. Eifuku, S., & Wurtz, R. H. (1998). Response to motion in extrastriate area MST1: Center–surround interactions. Journal of Neurophysiology, 80(1), 282–296. El-Shamayleh, Y., & Movshon, J. A. (2011). Neuronal responses to texture-defined form in macaque visual area V2. Journal of Neuroscience, 31(23), 8543–8555. Field, D. J., Golden, J. R., & Hayes, A. (2014). Contour integration and the association field. In L. M. Chalupa & J. S. Werner (Eds.), The new visual neurosciences. MIT Press. Freeman, J., Ziemba, C. M., Heeger, D. J., Simoncelli, E. P., & Movshon, J. A. (2013). A functional and perceptual signature of the second visual area in primates. Nature Neuroscience, 16(7), 974–981. Galli, A., & Zama, A. (1931). Untersuchungen uber die Wahrnehmung ebener geometrischer Figuren, die ganz oder teilweise von and eren geometrischen Figuren verdeckt sind. Zeitschrift fur Psychologie, 123, 308–348. Gheorghiu, E., & Kingdom, F. A. (2006). Luminance-contrast properties of contourshape processing revealed through the shape-frequency after-effect. Vision Research, 46(21), 3603–3615. Gheorghiu, E., & Kingdom, F. A. (2007a). Chromatic tuning of contour-shape mechanisms revealed through the shape-frequency and shape-amplitude aftereffects. Vision Research, 47(14), 1935–1949. Gheorghiu, E., & Kingdom, F. A. (2007b). The spatial feature underlying the shapefrequency and shape-amplitude after-effects. Vision Research, 47(6), 834–844. Gheorghiu, E., & Kingdom, F. A. (2012a). Chromatic properties of texture-shape and of texture-surround suppression of contour-shape mechanisms. Journal of Vision, 12(6), 16. Gheorghiu, E., & Kingdom, F. A. (2012b). Local and global components of texturesurround suppression of contour-shape coding. Journal of Vision, 12(6), 20. Gheorghiu, E., & Kingdom, F. A. A. (2012c). Motion direction and temporal frequency tuning of texture surround capture of contour-shape. Journal of Vision, 12(9), 884 (Abstract). Gheorghiu, E., & Kingdom, F. A. A. (2014). Temporal phase tuning of texturesurround suppression of contour shape. Perception, 40(Suppl.), 113 (Abstract). Gheorghiu, E., Kingdom, F. A., Thai, M. T., & Sampasivam, L. (2009b). Binocular properties of curvature-encoding mechanisms revealed through two shape after-effects. Vision Research, 49(14), 1765–1774. Gheorghiu, E., Kingdom, F., & Varshney, R. (2009a). Global not local motion direction tuning of curvature encoding mechanisms. Journal of Vision, 9(8), 651. http:// dx.doi.org/10.1167/1169.1168.1651 (Abstract). Graham, N. V. (2011). Beyond multiple pattern analyzers modeled as linear filters (as classical V1 simple cells): Useful additions of the last 25 years. Vision Research, 51(13), 1397–1430. Grigorescu, S. E., Petkov, N., & Kruizinga, P. (2002). Comparison of texture features based on Gabor filters. IEEE Transactions on Image Processing, 11(10), 1160–1167. Grigorescu, C., Petkov, N., & Westenberg, M. A. (2003). Contour detection based on nonclassical receptive field inhibition. IEEE Transactions on Image Processing, 12(7), 729–739. Grigorescu, C., Petkov, N., & Westenberg, M. A. (2004). Contour and boundary detection improved by surround suppression of texture edges. Image and Vision Computing, 22(8), 609–622. Guo, S. Y., Pridmore, T., Kong, Y. G., & Zhang, M. (2009). An improved Hough transform voting scheme utilizing surround suppression. Pattern Recognition Letters, 30(13), 1241–1252. Hinkle, D. A., & Connor, C. E. (2001). Disparity tuning in macaque area V4. NeuroReport, 12(2), 365–369. Hinkle, D. A., & Connor, C. E. (2002). Three-dimensional orientation tuning in macaque area V4. Nature Neuroscience, 5(7), 665–670. Hinkle, D. A., & Connor, C. E. (2005). Quantitative characterization of disparity tuning in ventral pathway area V4. Journal of Neurophysiology, 94(4), 2726–2737. Huang, W., Jiao, L., & Jia, J. (2008). Modeling contextual modulation in the primary visual cortex. Neural Networks, 21(8), 1182–1196. Huang, W. T., Jiao, L. C., Jia, J. H., & Yu, H. (2009). A neural contextual model for detecting perceptually salient contours. Pattern Recognition Letters, 30(11), 985–993. Jones, H. E., Grieve, K. L., Wang, W., & Sillito, A. M. (2001). Surround suppression in primate V1. Journal of Neurophysiology, 86(4), 2011–2028. Jones, J. P., & Palmer, L. A. (1987). An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. Journal of Neurophysiology, 58(6), 1233–1258. Joshi, G. D., & Sivaswamy, J. (2006). A computational model for boundary detection. Computer Vision, Graphics and Image Processing, 4338, 172–183 (Proceedings). Julesz, B. (1981a). Textons, the elements of texture perception, and their interactions. Nature, 290(5802), 91–97. Julesz, B. (1981b). A theory of preattentive texture discrimination based on firstorder statistics of textons. Biological Cybernetics, 41(2), 131–138. Kanizsa, G. (1979). Organisation in vision. New York: Praeger. Kapadia, M. K., Ito, M., Gilbert, C. D., & Westheimer, G. (1995). Improvement in visual sensitivity by changes in local context: Parallel studies in human observers and in V1 of alert monkeys. Neuron, 15(4), 843–856.
E. Gheorghiu et al. / Vision Research 104 (2014) 12–23 Kapadia, M. K., Westheimer, G., & Gilbert, C. D. (1999). Dynamics of spatial summation in primary visual cortex of alert monkeys. Proceedings of the National Academy of Sciences of the United States of America, 96(21), 12073–12078. Kastner, S., De Weerd, P., & Ungerleider, L. G. (2000). Texture segregation in the human visual cortex: A functional MRI study. Journal of Neurophysiology, 83(4), 2453–2457. Kingdom, F., & Prins, N. (2005). Different mechanisms encode the shapes of contours and contour-textures. Journal of Vision, 5(8), 463 (Abstract). Kingdom, F. A., & Prins, N. (2009). Texture-surround suppression of contour-shape coding in human vision. NeuroReport, 20(1), 5–8. Knierim, J. J., & van Essen, D. C. (1992). Neuronal responses to static texture patterns in area V1 of the alert macaque monkey. Journal of Neurophysiology, 67(4), 961–980. Kruizinga, P., & Petkov, N. (1999). Nonlinear operator for oriented texture. IEEE Transactions on Image Processing, 8(10), 1395–1407. La Cara, G. E., & Ursino, M. (2004). A neural network model of contours extraction based on orientation selectivity in the primary visual cortex: Applications on real images. In Conference proceedings IEEE engineering in medicine and biology society (Vol. 6, pp. 4029–4032). Landy, M. S. (2013). Texture analysis and perception. In J. S. Werner & L. M. Chalupa (Eds.), The new visual neurosciences (pp. 639–652). Cambridge, Mass.: MIT Press. Levi, D. M. (2008). Crowding – An essential bottleneck for object recognition: A mini-review. Vision Research, 48(5), 635–654. Levi, D. M. (2011). Visual crowding. Current Biology, 21(18), R678–679. Levitt, J. B., & Lund, J. S. (1997). Contrast dependence of contextual effects in primate visual cortex. Nature, 387(6628), 73–76. Li, Z. (1998). A neural model of contour integration in the primary visual cortex. Neural Computation, 10(4), 903–940. Li, Z. (1999). Visual segmentation by contextual influences via intra-cortical interactions in the primary visual cortex. Network, 10(2), 187–212. Li, Z. (2002). A saliency map in primary visual cortex. Trends in Cognitive Sciences, 6(1), 9–16. Li, G., & Baker, C. L. Jr., (2012). Functional organization of envelope-responsive neurons in early visual cortex: Organization of carrier tuning properties. Journal of Neuroscience, 32(22), 7538–7549. Li, G., Yao, Z., Wang, Z., Yuan, N., Talebi, V., Tan, J., Wang, Y., Zhou, Y., & Baker, C. L. Jr., (2014). Form-cue invariant second-order neuronal responses to contrast modulation in primate area v2. Journal of Neuroscience, 34(36), 12081–12092. Lochmann, T., Ernst, U. A., & Deneve, S. (2012). Perceptual inference predicts contextual modulations of sensory responses. Journal of Neuroscience, 32(12), 4179–4195. Lourens, T., Barakova, E., Okuno, H. G., & Tsujino, H. (2005). A computational model of monkey cortical grating cells. Biological Cybernetics, 92(1), 61–70. Malik, J., & Perona, P. (1990). Preattentive texture discrimination with early vision mechanisms. Journal of the Optical Society of America A: Optics, Image Science, and Vision, 7(5), 923–932. Mareschal, I., & Baker, C. L. Jr., (1999). Cortical processing of second-order motion. Visual Neuroscience, 16(3), 527–540. Mundhenk, T. N., & Itti, L. (2005). Computational modeling and exploration of contour integration for visual saliency. Biological Cybernetics, 93(3), 188–212. Mysore, S. G., Vogels, R., Raiguel, S. E., & Orban, G. A. (2006). Processing of kinetic boundaries in macaque V4. Journal of Neurophysiology, 95(3), 1864–1880. Nelson, J. I., & Frost, B. J. (1985). Intracortical facilitation among co-oriented, coaxially aligned simple cells in cat striate cortex. Experimental Brain Research, 61(1), 54–61. Neumann, H., & Sepp, W. (1999). Recurrent V1–V2 interaction in early visual boundary processing. Biological Cybernetics, 81(5–6), 425–444. Nothdurft, H. C., Gallant, J. L., & Van Essen, D. C. (1999). Response modulation by texture surround in primate area V1: Correlates of ‘‘popout’’ under anesthesia. Visual Neuroscience, 16(1), 15–34. Papari, G., & Petkov, N. (2008). Adaptive pseudo dilation for gestalt edge grouping and contour detection. IEEE Transactions on Image Processing, 17(10), 1950–1962. Papari, G., & Petkov, N. (2011). Edge and line oriented contour detection: State of the art. Image and Vision Computing, 29(2–3), 79–103. Pearson, P. M., & Kingdom, F. A. (2002). Texture-orientation mechanisms pool colour and luminance contrast. Vision Research, 42(12), 1547–1558. Petkov, N., & Kruizinga, P. (1997). Computational models of visual neurons specialised in the detection of periodic and aperiodic oriented visual stimuli: Bar and grating cells. Biological Cybernetics, 76(2), 83–96. Petkov, N., & Subramanian, E. (2007). Motion detection, noise reduction, texture suppression, and contour enhancement by spatiotemporal Gabor filters with surround inhibition. Biological Cybernetics, 97(5–6), 423–439.
23
Petkov, N., & Westenberg, M. A. (2003). Suppression of contour perception by bandlimited noise and its relation to nonclassical receptive field inhibition. Biological Cybernetics, 88(3), 236–246. Petrov, Y., Carandini, M., & McKee, S. (2005). Two distinct mechanisms of suppression in human vision. Journal of Neuroscience, 25(38), 8704–8707. Petrov, Y., Popple, A. V., & McKee, S. P. (2007). Crowding and surround suppression: Not to be confused. Journal of Vision, 7(2), 11–19 (12). Raiguel, S., Vanhulle, M. M., Xiao, D. K., Marcar, V. L., & Orban, G. A. (1995). Shape and spatial-distribution of receptive-fields and antagonistic motion surrounds in the middle temporal area (V5) of the macaque. European Journal of Neuroscience, 7(10), 2064–2082. Rao, R. P., & Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1), 79–87. Regan, D. M. (2000). Human perception of objects. Sunderland, MA: Sinauer Press. Robol, V., Casco, C., & Dakin, S. C. (2012). The role of crowding in contextual influences on contour integration. Journal of Vision, 12(7), 3. Rodrigues, J., & du Buf, J. M. H. (2004). Visual cortex frontend: Integrating lines, edges, keypoints, and disparity. Image Analysis and Recognition, 3211(Pt 1), 664–671 (Proceedings). Schumacher, J. F., Quinn, C. F., & Olman, C. A. (2011). An exploration of the spatial scale over which orientation-dependent surround effects affect contour detection. Journal of Vision, 11(8), 12. Schwabe, L., Ichida, J. M., Shushruth, S., Mangapathy, P., & Angelucci, A. (2010). Contrast-dependence of surround suppression in Macaque V1: Experimental testing of a recurrent network model. Neuroimage, 52(3), 777–792. Schwabe, L., Obermayer, K., Angelucci, A., & Bressloff, P. C. (2006). The role of feedback in shaping the extra-classical receptive field of cortical neurons: A recurrent network model. Journal of Neuroscience, 26(36), 9117–9129. Schwartz, O., & Simoncelli, E. P. (2001). Natural signal statistics and sensory gain control. Nature Neuroscience, 4(8), 819–825. Sundberg, K. A., Mitchell, J. F., & Reynolds, J. H. (2009). Spatial attention modulates center–surround interactions in macaque visual area v4. Neuron, 61(6), 952–963. Tadin, D., & Lappin, J. S. (2005). Linking psychophysics and physiology of center– surround interactions in visual motion processing. In M. R. M. Jenkin & L. R. Harris (Eds.), Seeing spatial form (pp. 279–314). Oxford, UK: Oxford University Press. Tadin, D., Paffen, C. L., Blake, R., & Lappin, J. S. (2008). Contextual modulations of center–surround interactions in motion revealed with the motion aftereffect. Journal of Vision, 8(7), 1–11 (9). Tanaka, K., Hikosaka, K., Saito, H., Yukie, M., Fukada, Y., & Iwai, E. (1986). Analysis of local and wide-field movements in the superior temporal visual areas of the macaque monkey. Journal of Neuroscience, 6(1), 134–144. Tang, Q. L., Sang, N., & Zhang, T. X. (2007). Contour detection based on contextual influences. Image and Vision Computing, 25(8), 1282–1290. Tolias, A. S., Keliris, G. A., Smirnakis, S. M., & Logothetis, N. K. (2005). Neurons in macaque area V4 acquire directional tuning after adaptation to motion stimuli. Nature Neuroscience, 8(5), 591–593. Ursino, M., & La Cara, G. E. (2004). A model of contextual interactions and contour detection in primary visual cortex. Neural Networks, 17(5–6), 719–735. von der Heydt, R., Peterhans, E., & Dursteler, M. R. (1991). Grating cells in monkey visual cortex: Coding texture. In B. Blum (Ed.), Channels in the visual nervous system: Neurophysiology, psychophysics and models (pp. 53–73). Freund, London. von der Heydt, R., Peterhans, E., & Dursteler, M. R. (1992). Periodic-pattern-selective cells in monkey visual cortex. Journal of Neuroscience, 12(4), 1416–1434. Watanabe, M., Tanaka, H., Uka, T., & Fujita, I. (2002). Disparity-selective neurons in area V4 of macaque monkeys. Journal of Neurophysiology, 87(4), 1960–1973. Wei, H., Lang, B., & Zuo, Q. (2013). Contour detection model with multi-scale integration based on non-classical receptive field. Neurocomputing, 103, 247–262. Zanos, T. P., Mineault, P. J., Monteon, J. A., & Pack, C. C. (2011). Functional connectivity during surround suppression in macaque area V4. In Conference proceedings IEEE engineering in medicine and biology society (pp. 3342–3345). Zeng, C., Li, Y., & Li, C. (2011a). Center–surround interaction with adaptive inhibition: A computational model for contour detection. Neuroimage, 55(1), 49–66. Zeng, C., Li, Y., Yang, K., & Li, C (2011b). Contour detection based on a non-classical receptive field model with butterfly-shaped inhibition subregions. Neurocomputing, 74(10), 1527–1534.