Vision Res. Vol. 33, No. 11, pp. 1535-1544, 1993
Printed in Great Britain. All rights reserved
0042-6989/93 $6.00 + 0.00
Copyright © 1993 Pergamon Press Ltd

An Orientation-Tuned Component in the Contrast Masking of Stereopsis

J. S. MANSFIELD,*† A. J. PARKER*

Received 23 April 1992; in revised form 18 December 1992
A masking paradigm was used to evaluate the orientation selectivity of the mechanisms mediating human stereopsis. Two experienced and eleven naive observers viewed stereograms, spatially filtered to contain contrast energy with Gaussian passbands in spatial frequency and orientation. Using forced-choice procedures we measured contrast thresholds for stereopsis in the presence of oriented masking patterns. Our results show that the masking of stereopsis consists of two components: one is orientation dependent, the other is non-oriented and has greatest magnitude at lower spatial frequencies. Contrary to an earlier study, these results imply that stereopsis mechanisms may have similar orientation tuning to mechanisms mediating contrast detection.

Stereopsis   Orientation   Spatial frequency   Masking   Binocular vision

*University Laboratory of Physiology, Parks Road, Oxford OX1 3PT, England.
†To whom all correspondence should be addressed at present address: Department of Psychology, University of Minnesota, 75 East River Road, Minneapolis, MN 55455, U.S.A.

INTRODUCTION

It is commonly understood that the processing of luminance information required for simple contrast and pattern detection is mediated by multiple spatial filters tuned to the spatial frequency and orientation of luminance contrast energy (Campbell & Robson, 1968; Blakemore & Campbell, 1969; Wilson & Bergen, 1979; Graham, 1989). An extension of this line of thinking is to ask whether the processes mediating other visual capabilities are also spatial frequency and orientation tuned.

The spatial frequency specificity of processes involved in stereoscopic depth perception has been examined in numerous ways. Adaptation studies (Blakemore & Hague, 1972; Felton, Richards & Smith, 1972), masking studies (Julesz & Miller, 1975; Yang & Blake, 1991) and the demonstration that disparities presented in widely separated spatial frequency bands may be perceived simultaneously at different depths (see Mayhew & Frisby, 1977) all suggest that the processes involved in stereopsis are tuned for spatial frequency. However, mixed results have been reported concerning the orientation tuning of stereoscopic mechanisms.

The orientation tuning of stereopsis

The earliest psychophysical investigations into the orientation characteristics of stereoscopic depth perception were performed with random-line stereograms (Marlowe, 1969; Julesz, 1971). For dense random-line patterns, the stereoscopic depth percept can be destroyed by rotating the line elements in one image so that orientation differences (> 40°) exist between corresponding elements in each eye (Marlowe, 1969; Frisby & Roth, 1971; Frisby & Julesz, 1975a). Similarly, Mitchell and O’Hagan (1972) investigated the perception of stereodepth in stereograms consisting of a single short line segment presented to each eye. For lines longer than 5 min arc, stereoacuity was degraded if the lines in each eye were at orthogonal orientations. These results would suggest that local stereoscopic matches are made between “line detectors”, whose specificity for orientation is similar for the left and right eyes. However, the manipulation of the relative orientation between corresponding line elements, or an increase in the length of the line elements in the left and right images, produced a reduction in the perceived depth in the stereograms (Frisby & Julesz, 1975a, b; Mitchell & O’Hagan, 1972). The reduction in perceived depth is difficult to explain simply in terms of line orientation being used as a matching primitive.

Julesz and Stromeyer (see Julesz, 1971, p. 92) investigated the orientation specificity of the depth aftereffect (Blakemore & Julesz, 1971) following adaptation to random-line stereograms. They found no noticeable difference in the strength of the depth aftereffect for the condition where the line elements in the adaptation and test patterns had the same orientation and the condition where the line elements in the test pattern were orthogonal to those in the adaptation stereogram. Altogether, the results from line stereograms yield a confusing picture of the orientation specificity of stereoscopic processing.

One problem with the use of line element stereograms for investigating the issue of orientation selectivity is that there is broad-band contrast energy at all orientations caused by the terminations of the line elements. The quantitative contribution of the contrast energy at these undesired orientations is not
known for certain. Effects observed with these stimuli might be attributed either to oriented “line detector” receptors or to isotropic “dot” detecting mechanisms.

An alternative strategy is to consider stereopsis in terms of two-dimensional spatial filters. Mayhew and Frisby (1978) used a masking technique to investigate the orientation tuning of stereoscopic mechanisms. Their stimuli consisted of random-dot stereograms that had been bandpass-filtered to contain contrast energy at two orientation bands centred at 0 and 90°. The mask, which was added to the left member of the pair of stereo images, was also filtered about two orientations. When the orientations of the components in the mask were set to the same as those in the stereogram, the mask contrast was increased until observers reported that they could no longer perceive the depth carried in the stereogram. Then, when the mask was replaced by another with peak orientations +45 and -45°, observers reported that the quality of the stereoscopic depth percept was not improved. This is a surprising result if the local matches in stereopsis were obtained through oriented spatial filters, and it led Mayhew and Frisby (1978) to conclude that the channels for fine-grain stereoscopic disparity judgments in complex random textures were not tuned for orientation.

They later supported this conclusion (Mayhew & Frisby, 1979) by investigating latencies for perceiving depth in random-dot stereograms portraying horizontally oriented corrugated surfaces. When the stereograms were filtered to contain only vertical information (±45°), the corrugations could never be seen. With only horizontal information (±45°), the depth modulations could be seen, but the latencies for making the experimental judgment were greater than in the non-filtered case. They argued that channels tuned to vertical orientations would be poor at resolving the steep changes in disparity used in their stereograms: even the smallest vertically oriented spatial channel would subtend over four or more rasters (in their display) and would, hence, blur together the disparity cues indicating the steeply slanted surface. If the perception of depth in the non-filtered cases were mediated by horizontally oriented channels, then similar latencies would be expected for perceiving depth in the non-filtered and horizontally filtered conditions.

In further support of Mayhew and Frisby’s conclusion, it has been argued that, computationally, it may be more efficient to perform local disparity matches on the outputs of isotropic filters rather than store the outputs from many filters all tuned to different orientations (Marr & Hildreth, 1980; Mayhew & Frisby, 1980; Grimson, 1981). However, it is not clear that such limitations will apply to biological visual systems in this straightforward way.
*Although some cortical complex cells appear to have a combination of poor localization performance and narrow disparity tuning (Poggio & Fischer, 1977), it seems highly probable that the good stereo performance in such cells arises from the localization ability conferred by the binocular alignment of receptive field subunits (Ohzawa & Freeman, 1986).
Mayhew and Frisby (1978) acknowledged that their results are contrary to the evidence from neurophysiological recordings, which show orientation-tuned disparity units in the cat (Barlow, Blakemore & Pettigrew, 1967; Joshua & Bishop, 1970) and monkey (Hubel & Wiesel, 1970; Poggio & Fischer, 1977). Indeed, most binocular cells in V1 are selective for orientation, especially in the cat (Hubel & Wiesel, 1962). While there is evidence for binocular cells in primate V1 with essentially no orientation preference at all (Hawken & Parker, 1984), such cells seem to be relatively poor in the spatial localization of luminance contrast targets (Parker & Hawken, 1985) and are unlikely to be involved in the detection of fine-grain disparities.* Moreover, most binocular cells in the primate show an orientation preference, even if some of them do continue to respond less strongly at the orientation orthogonal to that providing the best response (Hawken & Parker, 1984; Livingstone & Hubel, 1984).

There is clear evidence that binocular processes (other than stereopsis) are mediated by orientation tuned mechanisms. For example, binocular summation is greatest when the stimuli in each eye have the same orientation (Westendorf & Fox, 1975; Blake & Levinson, 1977). Schneider, Moraglia and Jepson (1989) have shown that the binocular detectability of a vertically oriented Gabor target embedded in a binocularly correlated field of Gaussian noise could be improved if a small horizontal shift was introduced to the noise field in one eye. No such binocular unmasking occurred if the binocular Gabor target was oriented horizontally. These results are consistent with orientation tuned mechanisms mediating binocular unmasking and binocular contrast detection. In the light of these studies, together with the physiological evidence, it might be expected that the stereo mechanisms would also be orientation tuned.
If this was not so, there would be strong grounds for arguing that the detection of stereo disparity and contrast take place in quite separate mechanisms. Further, the potential value of orientation information in stereopsis, either in selecting candidate matches between the two monocular images (Barlow et al., 1967) or in decoding surface slant (Blakemore, Fiorentini & Maffei, 1972; Koenderink & Van Doorn, 1976; Rogers & Cagenello, 1989), suggests that orientation tuned units might play an important role in the generation of the three-dimensional stereoscopic percept. Given that the conclusions of Mayhew and Frisby (1978) are at variance with the other evidence regarding the orientation tuning of binocular and stereo mechanisms, we have re-investigated the contrast masking of stereopsis using forced-choice procedures. Preliminary results, also using a forced-choice depth identification procedure (Parker, Johnston, Mansfield & Yang, 1991) have already shown that the mask energy required to significantly disrupt performance on a stereo depth identification task does depend upon the relative orientation of the contrast energy in the signal stereograms and mask patterns. The experiments in this study analyze in more detail the orientation dependence of the masking of stereopsis.
METHODS

Stimuli

The stimuli were generated using a Sun Microsystems TAAC-1 application accelerator. For each stereogram, a 256 × 256 pixel array was filled with 32-bit random integers, using the non-linear additive feedback random number generator supplied with the TAAC software. A further 64 × 64 pixel noise sample was generated and placed in horizontally disparate positions in two copies of the larger array. When viewed from 2.20 m, each stereohalf subtended 1.67° and the stereoscopically fused percept contained a central square region (0.42°) of either crossed or uncrossed disparity (4 min arc). Each stereohalf was transformed into the Fourier domain using the Fast Fourier Transform routines supplied with the TAAC software. In the Fourier domain, the images were filtered with a Gaussian pass-band in both spatial frequency (f) and orientation (θ):

$$G(f, \theta) = \exp\left[-\tfrac{1}{2}(\ldots)^2\right] \times \exp\left[-\tfrac{1}{2}\left(\frac{\theta - \theta_{peak}}{\beta}\right)^2\right] \qquad (1)$$

The polar separable filter design afforded independent control of the spatial frequency and orientation content of the images*: f_peak and θ_peak determine the characteristic spatial frequency and orientation of the filter, while w and β control the spatial frequency and orientation bandwidths. θ_peak, the orientation of the peak contrast energy (i.e. the center of the filter’s passband), is measured from the horizontal axis. Hence an image filtered with θ_peak = 0° has maximum contrast energy in the horizontal direction, producing a texture that has predominantly vertical stripes.

The filter parameters were chosen to be close to the conditions tested by Mayhew and Frisby (1978). f_peak was set to either 4, 8 or 12 c/deg, with w = 0.2, and β = 20°. However, one important difference is that Mayhew and Frisby (1978) used a filter that passed energy at two orthogonal orientations, whereas in these experiments only one orientation was used. With only one orientation component in the signal and mask images, a ±90° range of the masking orientations could be tested. This compares with the ±45° possible with two orthogonally oriented components. Mayhew and Frisby (1978) did report that stereopsis was weak with one-component stereograms, but this was not a problem for the observers in this study. The likely reason is that the orientation bandwidth (β) used here was broader, being 20° rather than the 10° used by Mayhew and Frisby (1978). The 20° bandwidth was sufficiently broad for the observers to perceive clearly the stereoscopic depth in the θ_peak = 90° condition. For much narrower bandwidths, such stereograms would approach a horizontal grating (from which it would be impossible to extract horizontal disparity information). Our use of a broader bandwidth is convenient, but ought to act against finding specificity in the orientation domain.

The left and right images from each stereogram were filtered with θ_peak = 0° [producing a vertical texture, see Fig. 1(A)] or θ_peak = 90° (producing a horizontal texture). Sets of masks were generated by filtering different arrays (256 × 256 pixels) of random noise with θ_peak = 0, 30, 60, 90, 120 or 150°. Different masks with the same θ_peak and r.m.s. contrast were added independently to the left and right stereo halves [see Fig. 1(B, C)]. The masking stimulus was hence a binocularly uncorrelated noise pattern. Prior to combining the signal and mask images, the data from each image were linearly scaled to conform to the required signal : mask contrast ratio. The data from the combined images were then rescaled to fill 256 gray levels expressed as 8-bit integers.

During the experimental procedure a modification of the two-channel digital contrast control described by Watson, Nielsen, Poirson, Fitzhugh, Bilson, Nguyen and Ahumada (1986) was used to map the image pixel data onto screen pixel luminances. The two-channel technique gave the equivalent of 11.3 bit gray-level resolution, enabling fine control of stimulus contrast right down to detection threshold. For trials with masked stereograms, the images were displayed so that the r.m.s. contrast of the mask was always -30 dB (from unity contrast).

The stimuli were displayed on a Manitron VLR2044 gray-scale display monitor (mean luminance 110 cd/m², max local amplitude contrast 84%). Each stereo image pair was presented side-by-side on the monitor, each image within a square black outline which served as a fixation marker and an aid to binocular fusion. For the main experiment, the stereoscopic images were viewed via two front-silvered mirrors, one in front of each eye, angled so as to bring the left and right images into binocular alignment. Prior to data collection, the mirrors were adjusted with the aid of a physical fixation marker so that the vergence and accommodation demand for binocular fusion were appropriate for the 2.20 m viewing distance. For trials with unpractised observers (see Discussion) the stereograms were displayed using a haploscopic arrangement (with two mirrors in front of each eye). This apparatus proved easy to adjust for each observer so that the stereo images could be effortlessly fused. In all cases the observers’ head position was restrained by a forehead and chin rest. The trials were run in a darkened laboratory.

*A filter of similar design to that in equation (1) has been used to produce spatially filtered noise patterns in previous analyses of the properties of visual channels (Mostafavi & Sakrison, 1976). While there is little evidence of polar separability in a two-dimensional spatial frequency analysis of simple cell receptive fields in area V1 of cat visual cortex (Webster & DeValois, 1985; Jones, Stepnoski & Palmer, 1987), it is noted that for small orientation bandwidths this filter approaches the Fourier representation of a Gabor filter (Mansfield, 1990).
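The filtering step described under Stimuli can be illustrated with a short NumPy sketch. It is not the TAAC implementation. The function names are hypothetical. The spatial-frequency term is assumed here to be a Gaussian in linear frequency with standard deviation w × f_peak, because the exact form of that term in equation (1) is not legible in this copy. The sketch also assumes that dB re unity contrast means 20 log10(contrast), and it uses Gaussian noise as a stand-in for the 32-bit random integers.

```python
import numpy as np

def polar_gaussian_filter(n, f_peak, theta_peak_deg, w, beta_deg, image_deg):
    """Polar-separable Fourier-domain gain on an n x n FFT grid.

    image_deg is the visual angle (degrees) subtended by the image.
    """
    fx = np.fft.fftfreq(n, d=image_deg / n)       # spatial frequency in cycles/deg
    FX, FY = np.meshgrid(fx, fx)
    f = np.hypot(FX, FY)
    theta = np.degrees(np.arctan2(FY, FX))        # measured from the horizontal axis
    # Orientation is axial, so wrap the difference into [-90, 90) degrees.
    d_theta = (theta - theta_peak_deg + 90.0) % 180.0 - 90.0
    # ASSUMPTION: Gaussian in linear frequency with s.d. w * f_peak.
    radial = np.exp(-0.5 * ((f - f_peak) / (w * f_peak)) ** 2)
    angular = np.exp(-0.5 * (d_theta / beta_deg) ** 2)
    gain = radial * angular
    gain[0, 0] = 0.0                              # drop the DC term
    return gain

def filter_image(image, gain):
    """Apply the gain in the Fourier domain and return the real image."""
    return np.real(np.fft.ifft2(np.fft.fft2(image) * gain))

def scale_to_rms_db(image, rms_db):
    """Scale a zero-mean image so that its r.m.s. equals 10**(rms_db / 20)."""
    # ASSUMPTION: dB re unity contrast is defined as 20 * log10(contrast).
    image = image - image.mean()
    return image * (10.0 ** (rms_db / 20.0) / image.std())

# Example: a 4 c/deg mask with theta_peak = 0 deg (vertical texture) at -30 dB.
rng = np.random.default_rng(0)
noise = rng.standard_normal((256, 256))           # stand-in for the random integers
gain = polar_gaussian_filter(256, f_peak=4.0, theta_peak_deg=0.0,
                             w=0.2, beta_deg=20.0, image_deg=1.67)
mask = scale_to_rms_db(filter_image(noise, gain), -30.0)
```

With θ_peak = 0° the contrast energy lies along the horizontal frequency axis, so the resulting texture varies mainly from left to right, giving the predominantly vertical stripes described in the text.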
Subjects
Data are shown in detail for two subjects. Both were experienced stereo-psychophysics viewers, and wore standard optical corrections for their slight myopia. The substantial aspects of the results here have been demonstrated with a third experienced observer (see Parker et al., 1991) and with a group of eleven unpractised observers who were completely naive about the aims of the experiment (see Discussion).

FIGURE 1. Example bandpass filtered stereograms. Each stereopair contains a central square shaped disparate region. This region is in a depth plane in front of the background when the images are free-fused by crossing the eyes. (A) Unmasked, bandpass-filtered random-dot stereogram, with θ_peak = 0°. Notice that, unlike an unfiltered random-dot stereogram, the edges of the region in depth are indistinct. Despite this, the disparate region is still clearly seen in depth. (B) The same stereo signal as shown in (A) combined with a binocularly uncorrelated mask with θ_peak = 0°. The ratio of signal to mask contrasts is 1 : 1. It is difficult to determine the depth of the central region. (C) The same stereo signal as shown in (A) combined with a mask with θ_peak = 90°. The ratio of the signal and mask contrasts is 1 : 1. The mask does not disrupt the stereo percept as much as it does in (B), and the disparity signal carried by the vertical components can be distinguished from the background.
FIGURE 2. Psychometric functions from one observer for identifying the depth in signal stereograms filtered with θ_peak = 90° and f_peak = 4 c/deg. Three different masking conditions are shown: no mask (diamonds), mask with θ_peak = 0° (open circles), and mask with θ_peak = 90° (solid circles). The dashed lines show the 75% threshold. Each data set is fitted with a cumulative Gaussian function using a maximum likelihood procedure. Abscissa: signal contrast (dB).
FIGURE 3. Orientation tuning curves for the masking of stereo depth identification in bandpass-filtered stereograms. Contrast thresholds for 75% depth identification are plotted against the θ_peak of the mask (mask contrast = -30 dB re unity contrast). The signal and mask f_peak = 4 c/deg. The signal θ_peak is 0° in (A), and 90° in (B) as indicated by arrows on each abscissa. Error bars show the 95% fiducial limits. The solid curves are the best fitting Gaussian functions defined by equation (2). Subject JMH. Abscissa: mask orientation (degrees).

A binary choice constant stimulus procedure was used to measure the signal contrast required for depth identification in the masked stereograms. In each block of trials, a range of signal contrasts was selected that spread either side of the expected threshold value. The sign of the disparity shift (either crossed or uncrossed) on each trial was determined randomly by the computer controlling the experiment. Each trial was initiated by the observer by pressing a button connected to the computer. The stimulus was displayed until the observer pressed either of two further buttons to indicate to the computer whether the disparate region in the stereogram was “in front of” or “behind” the fixation plane. No limit was placed on the presentation time to ensure that the depth identification task would not be limited by the time taken to fuse the stereogram. Fusion times for spatially filtered, masked stereograms are unpredictable, especially given the potentially rivalrous nature of the masking pattern used here. As it was, for the majority of the experimental trials, the observers’ response was made within one second of the beginning of the presentation. The observers were given no feedback during each set of trials. The number of trials on which the observer indicated the correct disparity was recorded for each signal contrast. These data produced psychometric functions relating the probability of correct depth identification to the contrast of the signal. These psychometric functions were collected with 40 observations per point. Trials were run in a randomized counterbalanced order for two observers, for the six mask orientations and with the stereo signal filtered with θ_peak = 0° and θ_peak = 90°. Each signal and mask combination was tested at three peak spatial frequencies, 4, 8, and 12 c/deg. Additional trials were run at each frequency with no mask to provide a baseline measure of each observer’s contrast sensitivity for stereopsis.
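The way a 75% threshold can be read from such constant-stimulus data is sketched below: a cumulative Gaussian is fitted to the counts of correct responses by maximum likelihood. This is an illustration under stated assumptions, not the authors' analysis code. The function names and the brute-force grid search are hypothetical choices, the function is assumed to run from 50% (chance in a two-alternative task) to 100% so that its mean is the 75% point, and the counts used in the example are synthetic.

```python
import math
import numpy as np

def p_correct(contrast_db, mu, sigma):
    """Two-alternative psychometric function: 50% at chance, 75% at mu."""
    # ASSUMPTION: cumulative Gaussian scaled to span 0.5 to 1.0.
    z = (np.asarray(contrast_db, dtype=float) - mu) / (sigma * math.sqrt(2.0))
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z))
    return 0.5 + 0.5 * cdf

def fit_threshold(contrast_db, n_correct, n_trials, mus, sigmas):
    """Grid-search maximum-likelihood fit under a binomial model.

    Returns the (mu, sigma) pair with the highest log-likelihood.
    """
    k = np.asarray(n_correct, dtype=float)
    n = np.asarray(n_trials, dtype=float)
    best_mu, best_sigma, best_ll = None, None, -np.inf
    for mu in mus:
        for sigma in sigmas:
            p = np.clip(p_correct(contrast_db, mu, sigma), 1e-9, 1.0 - 1e-9)
            ll = np.sum(k * np.log(p) + (n - k) * np.log(1.0 - p))
            if ll > best_ll:
                best_mu, best_sigma, best_ll = mu, sigma, ll
    return best_mu, best_sigma

# Synthetic example: 40 trials per contrast, true 75% threshold at -40 dB.
contrasts_db = np.array([-50.0, -45.0, -40.0, -35.0, -30.0])
n_trials = np.full(5, 40.0)
n_correct = n_trials * p_correct(contrasts_db, -40.0, 4.0)  # noise-free expected counts
mu_hat, sigma_hat = fit_threshold(contrasts_db, n_correct, n_trials,
                                  mus=np.arange(-50.0, -29.5, 0.5),
                                  sigmas=np.arange(1.0, 10.5, 0.5))
```

A grid search is used only to keep the sketch dependency-free; any numerical optimizer applied to the same log-likelihood would serve.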
RESULTS

The psychometric functions were fitted with cumulative Gaussian functions (with the mean and standard deviation free to vary) using the method of maximum likelihood (Watson, 1979). Fiducial limits for the best fitting parameters were determined using the procedure outlined by Watson and Pelli (1983). Figure 2 shows sample data from one observer for the 4 c/deg condition with the signal θ_peak = 90°. The graph shows psychometric functions for depth perception as a function of signal contrast in three cases: when there was no mask (diamonds), when the mask had θ_peak = 0° (open circles) and when the mask had θ_peak = 90° (solid circles). These psychometric functions are approximately linear in the range 65-85%, and the slopes are roughly equal across all the conditions. Thus, the disruptiveness of the mask in each condition can be reasonably compared by considering the ratio of signal contrast to mask contrast that is required for a performance of 75% correct (indicated by the dashed line).

FIGURE 4. See caption to Fig. 3. Subject JSM.

FIGURE 6. See caption to Fig. 5. Subject JSM.

Figures 3-7 summarize the masked threshold data. Each graph plots the contrast at 75% depth identification as a function of the mask orientation. In each case, the signal orientation is indicated by an arrow on the abscissa. The horizontal dashed lines in Figs 5, 6, and 7 indicate the unmasked contrast threshold (75% correct) for stereopsis. In Figs 3 and 4, where f_peak = 4 c/deg, the unmasked thresholds are too low for the contrast range chosen for the ordinate. The unmasked threshold values are given in Table 1. The analysis of contrast thresholds for the perception of depth in unmasked stereograms is the subject of another paper (Mansfield & Simmons, 1993). The difference between the solid and dashed lines shows the elevation of contrast threshold caused by the mask. This elevation is greatest when the mask and signal are filtered with the same peak orientation. For f_peak = 8 and 12 c/deg, as the mask is rotated away from
the signal orientation, the masking of stereopsis is released until there is only a small masking effect when the signal and mask are orthogonal. For f_peak = 4 c/deg, the same general trend can be observed. Again the mask is most disruptive when it has the same θ_peak as the signal. Indeed, of the spatial frequencies tested, this spatial frequency produced the largest threshold elevation. However, unlike at the higher spatial frequencies, there is a substantial component of the masking effect that is still present at the orientation orthogonal to that of the stereo signal.

These results indicate that the contrast masking of stereopsis consists of two components: one that is orientation specific and another that is independent of the mask orientation. To obtain estimates of the magnitudes of these components, the masked threshold data were fit with Gaussian functions of the following form:

$$T_m(\theta_m) = \mathrm{ORI} \times \exp\left[-\frac{(\theta_m - \theta_s)^2}{2\beta^2}\right] + \mathrm{ISO} + T_u \qquad (2)$$

where T_m(θ_m) is the contrast threshold with the peak orientation of the mask set to θ_m, θ_s is the peak orientation of the stereoscopic signal, T_u is the unmasked contrast threshold for stereopsis, ORI is the magnitude of the orientation tuned component, β is the orientation bandwidth of the ORI component, and ISO is the magnitude of the non-oriented masking component. The Gaussian function was used as it allowed an easy estimation of the parameters of the masking curves without requiring any specific model for the masking phenomenon. It provided a close fit to the masked thresholds (on average, accounting for 87% of the variance in each data set). The best fitting parameters for each masking curve are shown in Table 1.

TABLE 1. Unmasked contrast thresholds (T_u), orientation dependent (ORI) and isotropic (ISO) components, and orientation bandwidth (β) in the contrast masking of stereopsis for two observers

f_peak     θ_s    JMH                                      JSM
                  T_u (dB)  β      ORI (dB)  ISO (dB)      T_u (dB)  β      ORI (dB)  ISO (dB)
4 c/deg    0°     -57.7     40°    19.0      18.0          -52.9     54°    16.0      9.5
           90°    -53.5     22°    8.4       15.6          -50.0     27°    6.6       13.6
8 c/deg    0°     -34.0     14°    9.2       -0.6          -29.3     21°    5.9       3.8
           90°    -43.0     29°    6.8       4.1           -37.8     29°    6.6       2.6
12 c/deg   0°     -36.1     23°    7.4       1.3           -31.0     -      -         -
           90°    -45.2     18°    7.8       3.6           -42.0     -      -         -

Parameters were obtained by fitting the masked threshold data with Gaussian functions. See text for details.

FIGURE 5. Details as for Fig. 3, but f_peak = 8 c/deg. Dashed lines show the contrast threshold for stereopsis in unmasked stereograms. Subject JMH.

FIGURE 7. See caption to Fig. 5, but f_peak = 12 c/deg.

DISCUSSION

Orientation tuned masking of stereopsis

The data shown in Figs 2-7 clearly demonstrate that the masking of stereopsis with our spatially filtered random-noise patterns does depend upon the relative orientations of the signal and mask patterns. When the signal and mask had the same θ_peak, the threshold elevation caused by the mask was always greatest. If the mask orientation was then rotated away from the signal orientation, the thresholds for detection of stereo depth always decreased, suggesting that a difference in orientation between test and mask makes the mask less effective. Across all the conditions tested, the magnitude of the orientation dependent threshold elevation was quite large (between 6.6 and 19.7 dB).

The data also indicate that, in addition to the orientation dependent component to the masking of stereopsis, there is an isotropic component, the magnitude of which depends on the stimulus spatial frequency. It is interesting to speculate on the cause of this orientation-independent masking. A likely explanation comes from the fact that the disparate region had the same physical dimensions (0.42 × 0.42°) for each of the different spatial frequency conditions. The size of the disparate region places an upper limit on the size of the binocular receptive field that can be used to detect the disparity. A mechanism with narrow orientation tuning will have a more elongated receptive field than one that has a broad or isotropic orientation response. Thus in circumstances where the receptive field size approaches the dimensions of the disparate region, broadly tuned or isotropic mechanisms will provide the stronger disparity response. Such an occurrence is more favored at lower spatial frequencies. The physical size of the disparate patch occupies about 1.7 cycles of a 4 c/deg grating. Thus, in the 4 c/deg condition, the stereo channels with fine orientation tuning may be too large to signal reliably the depth carried by the disparate region. According to this line of argument, the only low spatial frequency channels that have small enough receptive fields to match the physical size of the disparate region (and thus detect reliably the disparity signal) necessarily have broad or isotropic orientation tuning.

The data in Table 1 show differences in the measures of orientation bandwidth at different signal orientations. For f_peak = 4 c/deg, both observers show a much broader tuning to vertically (θ_peak = 0°) oriented stereo textures than they do for horizontal (θ_peak = 90°), whereas at 8 c/deg this situation is reversed. However, little significance can be attached to these differences, since the unmasked threshold data from both observers indicate that each had marked differences in contrast thresholds for detection of stereopsis with horizontal and vertical textures. Such an anisotropy potentially confounds any experimental effects that are dependent upon the signal θ_peak. It was precisely in anticipation of this problem that the masking study was performed for the two orthogonal signal orientations. It was essential that the orientation dependent masking effect should be observed with both horizontal and vertical orientations of the signal before conclusions about orientation tuning could be drawn.

Comparisons with other masking studies
In the view of Frisby and Mayhew (1978), the masking of stereopsis for fine disparities is strictly isotropic in the orientation domain. The results presented here show that there is a large orientation specific component, although there is also evidence for a non-specific component under some conditions. The difference between the current results and those of Mayhew and Frisby (1978) could possibly be attributed to various factors concerning the design of the masked stimulus and the psychophysical methods. This section attempts to summarize those differences.

Construction of stereograms. The methods used by Mayhew and Frisby (1978) to generate spatially filtered stereograms were significantly different from those used here. In this study, the left and right halves of a random-dot stereogram, containing a central disparate region, were each independently spatially filtered. In Mayhew and Frisby (1978), first a spatially filtered random-dot texture was generated, and then the disparate shift was added to copies of this filtered texture. The monocular contours introduced by the disparity shift were smoothed out using a cosine function to blur the disparate region into the background plane. Despite this blurring procedure, a monocular contour can often be seen around the perimeter of the disparate region in stereograms constructed by their method. Although such contours are not readily apparent in the left and right monocular images of the stereograms portrayed in their journal article, inspection of the phase part of the Fourier transform of images built in this way shows that a substantial proportion of the frequency components are not in random phase (Mansfield, 1990). Julesz and Oswald (1978) show how contours that are not monocularly salient might be exploited by binocular vergence mechanisms in stereopsis. It is not clear how the monocular contours in Mayhew and Frisby (1978) will have been affected by masks of different orientation.

Nature of the masking stimuli. The type of mask employed in this study (where a mask was added to both the left and right stereohalves) was different from the mask used by Mayhew and Frisby, who added a mask pattern to only one image.
In the latter case, the contrast features in the mask pattern have no corresponding features in the non-masked image, a situation that would be likely to produce binocular rivalry. In the present study, where a masking pattern was added to both the left and right images, even though the masks are selected from different noise samples, some features in the left and right mask images fuse together, giving the percept of a randomly “bumpy” surface. This type of mask is comparable to the disparity mask of Richards (1972), and might be expected to interfere with the stereodepth identification task in a qualitatively different way from the more rivalrous mask. Expected size of masking ej%ct. The magnitude of the ORI component in the 8 c/deg condition [close to the 7.5 c/deg used by Mayhew and Frisby (1978)] ranged from 5.9 to 9.2 dB. For the stimuli used by Mayhew and Frisby (1978) where the range of masking orientations was limited to 45”, a smaller threshold elevation (between 3 and 5 dB) would be expected. Their estimation of the magnitude of the masking effect may have been insensitive to this difference. That this might have been the case is more likely considering that, depending on the conditions, there might also be a substantial nonoriented threshold elevation due to the mask.
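To give a feel for the threshold elevations quoted above, the following minimal sketch converts between decibels and contrast ratios. It assumes the common amplitude convention of 20 log10 for contrast in dB; that convention is an assumption here, since it is not restated in this section, and the function names are purely illustrative.

```python
import math


def db_to_ratio(db):
    """Contrast ratio for a threshold elevation in dB (assumes 20*log10 convention)."""
    return 10 ** (db / 20.0)


def ratio_to_db(ratio):
    """Threshold elevation in dB for a contrast ratio (assumes 20*log10 convention)."""
    return 20.0 * math.log10(ratio)


if __name__ == "__main__":
    # Elevations mentioned in the text: 3-5 dB (expected for a 45 deg mask range)
    # and 5.9-9.2 dB (ORI component at 8 c/deg).
    for db in (3.0, 5.0, 5.9, 9.2):
        print(f"{db:4.1f} dB -> contrast threshold raised by a factor of {db_to_ratio(db):.2f}")
```

Under this assumed convention, elevations of 5.9 to 9.2 dB correspond to roughly a two- to three-fold rise in contrast threshold, whereas 3 to 5 dB corresponds to a rise of less than a factor of two.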
Comparison with simple detection tasks. A further point concerning masks with two components at different orientations is that for simple detection there is a difference in the efficacy of one- and two-component sine-wave grating masks (Derrington & Henning, 1989). Gratings that are symmetrically oriented on either side of a vertical signal grating substantially mask the detection of the signal, even when the signal and mask gratings differ in orientation by as much as 67.5°. Thus, the orientation bandwidth of a mask pattern can be broadened by the introduction of a second oriented component. As Derrington and Henning (1989) acknowledge, there is at present no really adequate account of why this should occur. In Mayhew and Frisby's study, for the condition where the mask and signal differed by 45°, the configuration of the signal and mask components is comparable to that used by Derrington and Henning (1989) and a similarly broad masking effect might be expected to have occurred. These considerations add to the general caution about interpreting the bandwidths of masking effects in terms of the bandwidths of a single underlying set of channels (see Graham, 1989). This is equally significant in respect of the masking bandwidths measured for stereo depth judgments in this paper. Perhaps the most consistent picture that can be assembled at the moment is that the present results cause us to view the masking of stereo and the masking of simple contrast detection as having stimulus specificities for orientation that are quite comparable. But it is quite a separate question to understand why masking of either stereo or detection should not conform in a straightforward way to the expectations of a multiple-channel model of early visual processing.

An alteration of masking specificity by learning?
Another major difference between this study and that of Mayhew and Frisby (1978) lies in the experimental procedures employed. Here, the masking effects were investigated using a forced-choice depth identification procedure, as opposed to brief, subjective reports about the quality of stereopsis in the masked stereograms. This leads inevitably to differences in the number of stimulus presentations made to each observer. For this study, the number of observations made by each observer was just less than 10,000 (including training sessions). This compares to the eight presented in Mayhew and Frisby (1978), although, presumably, a larger number was used when making their initial observations. It is possible that, during the series of forced-choice trials used in our study, the nature of the mechanisms mediating the depth identification task changed from an initial isotropic state to one that was orientation specific (Frisby & Pollard, 1991; see also Ramachandran & Braddick, 1973). It is impossible to ascertain from the forced-choice data whether there was an effect of practice during the course of data collection. It seems unlikely, as the observers were not given feedback during the course of the experiment, and the stimulus contrast ranges determined during preliminary trials did not need to be adjusted later during the experimental trials. However, it is possible
that the observers may have learned how to use different cues in the images just because they were always searching for a form to identify. This might explain why, after performing the experimental trials, the observers in this study were no longer convinced by the demonstration stereograms in Mayhew and Frisby (1978), and it would imply that the masking effect found here is purely a laboratory phenomenon and does not reflect the processes involved in normal, everyday stereo vision.
In order to investigate formally whether observers may have learned to use the orientation information during the large number of experimental presentations,* a further experiment was performed which sought to minimize the number of orientationally filtered stereograms presented to the observers. The orientation specificity of stereopsis masking was tested with eleven volunteers using a method of limits. Prior to experimental trials, it was necessary to train the observers on the experimental task. However, the experimental design demanded that no orientationally filtered images could be used in the training process. Instead, observers were required to identify the depth in a series of unfiltered and isotropically filtered† random-dot stereograms. The training procedure, which lasted from 3 to 4 min, also served to ensure that the observers had adequate stereoscopic depth perception. At the beginning of each experimental run, observers were shown an orientation-filtered stereogram of the type used in the main experiments, with the mask contrast set at -30 dB and the signal contrast at a suitably high level so that the observers were able to identify correctly the depth in the stereogram. On subsequent presentations, the contrast of the signal was reduced in 2 dB steps and the sign of the disparity (crossed or uncrossed) was haphazardly selected by the experimenter. The observer was required to identify the depth in the stereogram. The contrast threshold was taken as the signal contrast at which the observer first responded incorrectly, or at which the observer reported guessing. The procedure was repeated four times for each observer, twice with each signal orientation (θpeak = 0 and 90°), each masked with θpeak = 0 or 90°. The peak frequency was always 8 c/deg. On average, each observer was presented with 15 experimental stimuli, and the entire procedure lasted approx. 12 min.
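The descending method of limits described above can be sketched as follows. This is a minimal illustration rather than the procedure actually run: the observer is replaced by a hypothetical callable, the starting contrast and the simulated threshold are invented values, and only the "first incorrect response" stopping rule is modelled (the "observer reports guessing" rule is not).

```python
import random


def descending_limits(respond_correctly, start_db, step_db=2.0, floor_db=-60.0):
    """Lower the signal contrast in fixed dB steps until the first error.

    respond_correctly(contrast_db, disparity_sign) -> bool stands in for the
    observer (hypothetical). Returns (threshold_db, n_trials), where
    threshold_db is the contrast at the first incorrect response, or None if
    no error occurred before floor_db was passed.
    """
    contrast_db = start_db
    n_trials = 0
    while contrast_db >= floor_db:
        # The disparity sign is chosen haphazardly on every presentation.
        sign = random.choice(("crossed", "uncrossed"))
        n_trials += 1
        if not respond_correctly(contrast_db, sign):
            return contrast_db, n_trials
        contrast_db -= step_db
    return None, n_trials


def make_guessing_observer(true_threshold_db):
    """Hypothetical observer: always correct above threshold, guesses at or below it."""
    def observer(contrast_db, sign):
        if contrast_db > true_threshold_db:
            return True
        return random.random() < 0.5  # two-alternative guess
    return observer


if __name__ == "__main__":
    random.seed(1)
    observer = make_guessing_observer(true_threshold_db=-21.0)
    threshold, n = descending_limits(observer, start_db=-10.0)
    print(f"estimated threshold: {threshold} dB after {n} presentations")
```

Because a guessing observer is right half the time, a single descending run tends to overshoot the true threshold by a variable number of steps, which is one reason estimates from such a short procedure are noisier than forced-choice thresholds.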
With so few presentations it is difficult to explain any orientation-dependent masking effects as having arisen from observers learning to use orientation-specific information in a way that would not be used in normal stereoscopic vision. In analyzing the data, the threshold estimates were combined into two conditions: (1) where the signal and mask had the same orientation and (2) where the signal and mask were orthogonal. The resulting thresholds are plotted in Fig. 8. Not surprisingly, the results are more variable than those presented earlier. For all but one observer, the higher contrast sensitivity is obtained when the signal and mask orientations differ by 90°. The mean difference over the eleven observers between the two masking conditions is 5.4 dB, and is significant with P < 0.005 (one-tailed t-test, t = 3.67 with 10 degrees of freedom).

FIGURE 8. Histogram showing the contrast sensitivity for identifying the depth in masked stereograms. Data from eleven observers are shown for two different masking conditions, where the signal and mask orientations were either the same (open bars) or orthogonal (solid bars). The sensitivities for the orthogonal condition are on average 5 dB greater than when the mask and signal orientations are aligned (t = 10.67, 10 d.f., P < 0.005).

*We would like to thank John Frisby and Heinrich Bülthoff for encouraging us to test this possibility with our stimuli.
†The isotropic filter was similar in design to that described by equation (1) except the component in 0 and p was omitted.

CONCLUSION

This study shows that the contrast masking of stereoscopic depth detection is to some extent orientation-tuned. The orientation-tuned masking occurs at a wide range of spatial frequencies. At lower spatial frequencies the masking also includes a substantial non-oriented component. The orientation specificity of stereopsis masking has been demonstrated with both practised and unpractised observers. The results with naive unpractised observers are reassuring, as it is common psychophysical practice to allow observers to become thoroughly familiar with the experimental procedure and stimuli, and to ensure that performance is constant, before collecting any formal data. In summary, our observations imply that the result of Mayhew and Frisby (1978) can no longer be taken as a clear indicator that local disparity matches are not established between orientation-tuned filters. Instead these data suggest a less significant divergence between the mechanisms of stereo depth perception in humans and the picture yielded by neurophysiological recordings from orientation-tuned disparity-sensitive units in the visual cortex of cats and monkeys.

REFERENCES

Barlow, H. B., Blakemore, C. & Pettigrew, J. D. (1967). The neural mechanism of binocular depth discrimination. Journal of Physiology, London, 193, 321-342.
Blake, R. & Levinson, R. (1977). Spatial properties of binocular neurones in the human visual system. Experimental Brain Research, 27, 221-232.
Blakemore, C. & Campbell, F. W. (1969). On the existence of neurones in the human visual system selectively sensitive to the orientation and size of retinal images. Journal of Physiology, London, 203, 237-260.
Blakemore, C. & Hague, B. (1972). Evidence for disparity detecting neurones in the human visual system. Journal of Physiology, London, 225, 437-445.
Blakemore, C. & Julesz, B. (1971). Stereoscopic depth aftereffect produced without monocular cues. Science, 171, 282-288.
Blakemore, C., Fiorentini, A. & Maffei, L. (1972). A second neural mechanism of binocular depth discrimination. Journal of Physiology, London, 226, 725-740.
Campbell, F. W. & Robson, J. G. (1968). Application of Fourier analysis to the visibility of gratings. Journal of Physiology, London, 197, 551-566.
Derrington, A. M. & Henning, G. B. (1989). Some observations on the masking effects of two-dimensional stimuli. Vision Research, 29, 241-246.
Felton, T. B., Richards, W. & Smith, R. A. Jr (1972). Disparity processing of spatial frequency in man. Journal of Physiology, London, 225, 349-362.
Frisby, J. P. & Julesz, B. (1975a). Depth reduction effects in random-line stereograms. Perception, 4, 151-158.
Frisby, J. P. & Julesz, B. (1975b). Effect of orientation difference on stereopsis as a function of line length. Perception, 4, 179-186.
Frisby, J. P. & Pollard, S. B. (1991). Computational issues in solving the stereo correspondence problem. In Landy, M. S. & Movshon, J. A. (Eds), Computational models of visual processing. Cambridge, Mass.: MIT Press.
Frisby, J. P. & Roth, B. R. (1971). Orientation of stimuli and binocular disparity coding. Quarterly Journal of Experimental Psychology, 23, 367-372.
Graham, N. V. S. (1989). Visual pattern analyzers. Oxford: Oxford University Press.
Grimson, W. E. L. (1981). A computer implementation of a theory of stereo vision. Philosophical Transactions of the Royal Society of London, B, 292, 217-253.
Hawken, M. J. & Parker, A. J. (1984). Contrast sensitivity and orientation selectivity in lamina IV of the striate cortex of Old World monkeys. Experimental Brain Research, 54, 367-373.
Hubel, D. H. & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in cat's visual cortex. Journal of Physiology, London, 160, 106-154.
Hubel, D. H. & Wiesel, T. N. (1970). Stereoscopic vision in Macaque Monkey. Nature, London, 225, 41-42.
Jones, J. P., Stepnowski, A. & Palmer, L. A. (1987). Two-dimensional spectral structure of simple receptive fields in cat striate cortex. Journal of Neurophysiology, 58, 1212-1232.
Joshua, D. E. & Bishop, P. O. (1970). Binocular single vision and depth discrimination. Receptive field disparities for central and peripheral vision and binocular interaction on peripheral single units in cat striate cortex. Experimental Brain Research, 10, 389-416.
Julesz, B. (1971). Foundations of cyclopean perception. Chicago, Ill.: University of Chicago Press.
Julesz, B. & Miller, J. E. (1975). Independent spatial frequency tuned channels in binocular fusion and rivalry. Perception, 4, 125-143.
Julesz, B. & Oswald, H. P. (1978). Binocular utilisation of monocular cues that are undetectable monocularly. Perception, 7, 315-322.
Koenderink, J. J. & Van Doorn, A. J. (1976). Geometry of binocular vision and a model for stereopsis. Biological Cybernetics, 21, 29-35.
Livingstone, M. S. & Hubel, D. H. (1984). Anatomy and physiology of a color system in the primate visual cortex. Journal of Neuroscience, 4, 309-356.
Mansfield, J. S. (1990). The influence of spatial contrast processing on human stereoscopic vision. D.Phil. thesis, University of Oxford, England.
Mansfield, J. S. & Simmons, D. R. (1993). Contrast thresholds for the identification of depth in bandpass-filtered stereograms. Perception & Psychophysics. Submitted.
Marlowe, L. H. (1969). Orientation of contours and binocular depth perception. Ph.D. thesis, Department of Psychology, Brown University, R.I., U.S.A.
Marr, D. & Hildreth, E. (1980). Theory of edge detection. Proceedings of the Royal Society of London B, 207, 187-217.
Mayhew, J. E. W. & Frisby, J. P. (1977). Global processes in stereopsis: Some comments on Ramachandran and Nelson (1976). Perception, 6, 195-206.
Mayhew, J. E. W. & Frisby, J. P. (1978). Stereopsis masking in humans is not orientationally tuned. Perception, 7, 431-436.
Mayhew, J. E. W. & Frisby, J. P. (1979). Surfaces with steep variations in depth pose difficulties for orientation tuned disparity filters. Perception, 8, 691-698.
Mayhew, J. E. W. & Frisby, J. P. (1980). Psychophysical and computational studies towards a theory of human stereopsis. Artificial Intelligence, 17, 349-385.
Mitchell, D. E. & O'Hagan, S. (1972). Accuracy of stereoscopic localization of small line segments that differ in size or orientation for the two eyes. Vision Research, 12, 437-454.
Mostofavi, H. & Sakrison, D. J. (1976). Structure and properties of a single channel in the human visual system. Vision Research, 16, 957-968.
Ohzawa, I. & Freeman, R. D. (1986). The binocular organization of complex cells in the cat's visual cortex. Journal of Neurophysiology, 56, 243-259.
Parker, A. J. & Hawken, M. J. (1985). Capabilities of monkey cortical cells in spatial resolution tasks. Journal of the Optical Society of America, A, 2, 1101-1114.
Parker, A. J., Johnston, E. B., Mansfield, J. S. & Yang, Y. (1991). Stereo, surfaces and shape. In Landy, M. S. & Movshon, J. A. (Eds), Computational models of visual processing. Cambridge, Mass.: MIT Press.
Poggio, G. F. & Fischer, B. (1977). Binocular interaction and depth sensitivity in striate cortical neurones of behaving rhesus monkey. Journal of Neurophysiology, 40, 1392-1405.
Ramachandran, V. S. & Braddick, O. J. (1973). Orientation-specific learning in stereopsis. Perception, 2, 371-376.
Richards, W. (1972). Disparity masking. Vision Research, 12, 1113-1124.
Rogers, B. J. & Cagenello, R. B. (1989). Disparity curvature and the perception of three-dimensional surfaces. Nature, 339, 135-137.
Schneider, B., Moraglia, G. & Jepson, A. (1989). Binocular unmasking: An analog to binaural unmasking? Science, 243, 1479-1481.
Watson, A. B. (1979). Probability summation over time. Vision Research, 19, 515-522.
Watson, A. B. & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.
Watson, A. B., Nielson, K. R. K., Poirson, A., Fitzhugh, A., Bilson, A., Nguyen, K. & Ahumada, A. J. Jr (1986). Use of a raster framebuffer in vision research. Behavior Research Methods, Instruments and Computers, 18, 587-594.
Webster, M. A. & DeValois, R. L. (1985). Relationship between spatial frequency and orientation tuning of striate cortex cells. Journal of the Optical Society of America, A, 2, 1124-1132.
Westendorf, D. H. & Fox, R. (1975). Binocular detection of vertical and horizontal line segments. Vision Research, 15, 471-476.
Wilson, H. R. & Bergen, J. R. (1979). A four mechanism model for threshold spatial vision. Vision Research, 19, 19-32.
Yang, Y. & Blake, R. (1991). Spatial frequency masking of human stereopsis. Vision Research, 31, 1177-1189.
Acknowledgements: This research formed part of a D.Phil. thesis submitted to the University of Oxford and was supported by the Science and Engineering Research Council (CR/D/64193 and GR/E/46240) and a major equipment grant from the Wellcome Trust. We would like to thank Julie Harris for her participation as an observer and John Mittell for his assistance in implementing the two-channel digital contrast control. An initial report of some of these results was given at the Annual Meeting of the Association for Research in Vision and Ophthalmology, Sarasota, Fla, 29 April 1991.