Vision Res. Vol. 33, No. 5/6, pp. 827-838, 1993 Printed in Great Britain. All rights reserved
0042-6989/93 $6.00 + 0.00 Copyright © 1993 Pergamon Press Ltd
Effects of Different Texture Cues on Curved Surfaces Viewed Stereoscopically
B. G. CUMMING,* E. B. JOHNSTON,*† A. J. PARKER*
Received 28 January 1992; in revised form 8 September 1992
Stereoscopic shape judgements can be modified by the addition of texture cues. This paper examines the properties of texture that are responsible for this effect. When a three-dimensional curved surface is projected onto a two-dimensional image, changes in surface orientation result in gradients of texture element size (or area), shape (compression) and density in the image. Manipulating each of these gradients independently, we found that 97% of the variance in the results could be accounted for by the compression gradient. When the texture pattern corresponds to a highly anisotropic texture on the object’s surface, shape-from-texture becomes ineffective. These results suggest that human shape-from-texture proceeds under the assumption that textures are statistically isotropic, and not that they are homogeneous.
Stereopsis   Shape-from-texture   Three-dimensional shape perception
INTRODUCTION
Forty years ago Gibson (1950) pointed out that information about the three-dimensional layout of a surface can be deduced from the projection of its texture markings onto a two-dimensional image. Gibson proposed that this cue, referred to here as shape-from-texture, was used to derive a measure of surface slant, and subsequently the majority of psychophysical studies have used images representing planar surfaces (Braunstein, 1976). When a surface is projected, there are two distinct processes which make shape-from-texture possible: perspective and foreshortening. Perspective projection of extended surfaces produces variation in the local scale of the projected texture (as the surface recedes, markings become smaller and more closely spaced). Even under parallel projection, the orientation of the surface relative to the projection plane produces foreshortening of the texture. Curved surfaces introduce changes in foreshortening within an image, producing an impression of curvature (Todd & Akerstrom, 1987). An earlier study of curved surfaces used stimuli with substantial perspective cues in addition to those resulting from changes in surface orientation (Cutting & Millard, 1984). For many objects, however, their extent in depth is small relative to the viewing distance, so shape-from-texture relies chiefly upon foreshortening, rather than perspective effects. In images of curved surfaces, foreshortening results in variations in the size, density and shape of texture markings. This study examined which of these variations is most salient for human shape-from-texture when curved objects are viewed at distances which are large relative to the size of the object.
*University Laboratory of Physiology, Parks Road, Oxford OX1 3PT, England.
†Present address: Sarah Lawrence College, Bronxville, NY 10708, U.S.A.
The projection of texture onto images
When discussing shape-from-texture, it is important to distinguish the texture on an object’s surface (surface texture), from the texture that arises in an image of the object (image texture). The way in which surface texture maps onto image texture contains information about the surface shape. Consider a three-dimensional shape cut from a block of textured material, such as a clear plastic solid filled with balls of different colours. The surface of the object will be covered with approximately circular texture elements (texels). The image will contain variations in texel area, density and compression (the height/width ratio of the projection of texels, in this case ellipses), each of which contains information about three-dimensional shape. Figure 1 shows computer-generated images modelled upon this sort of texture, using the method described in the preceding paper (Johnston, Cumming & Parker, 1993; introduced by Bülthoff & Mallot, 1988). The texture shown in Fig. 1 provides a strong texture cue, as the variations across the image in texel area, density and shape are all a consequence of projection and not a property of the object texture. However, if there is some systematic variation in the properties of the object texture, e.g. if the textured block is composed of elements which increase in size towards the edges, then the assumption that image texel variations are caused by the projection of a homogeneous surface texture would result in misperception of the surface orientation. Thus, in order to interpret image texture variations in terms of
underlying surface orientation (shape-from-texture) it is necessary to make assumptions about the properties of the surface texture. Using assumptions only about the statistical properties of surface texture, it is theoretically possible to reconstruct the surface orientation from the image texture alone, as variations in the statistics of the image texture can be attributed to the projection of the surface onto the image plane. Two frequently made assumptions (well reviewed by Blake & Marinos, 1990) are that the surface texture is homogeneous and that it is isotropic. “Homogeneity” assumes that the statistical properties of local patches are uniform over the surface, e.g. the assumption, used by Gibson, that the density of texels is approximately constant over the surface. “Isotropy” assumes that surface markings have no
orientation bias: line segments are equally likely to be oriented in all directions. Note that if a surface contains non-oriented markings (such as circles) it will necessarily be isotropic, although the projected image will not necessarily be isotropic (the circles project to ellipses).
Natural and digital texture generation
Whether either assumption is valid for a particular surface depends upon how that surface came into being. Even an informal analysis shows that a number of different natural processes may generate surface textures with a variety of statistical properties.
FIGURE 1. Two example stereograms, presented for cross-eyed fusion. (A) Cylinder with correct texture variation (with changes in compression, density, and area). (B) Cylinder with no texture variation. In the experiment these were shown against a background made from the same texture block, but here this has been removed for clarity.
Deposited texture. In some cases texture markings develop on a surface that already exists, as happens when pebbles are scattered on a beach. Such textures are usually isotropic but not necessarily homogeneous (e.g. the stones higher up the beach tend to be larger).
Surface distortion. Some surface textures become modified as the surface upon which they lie changes. Consider, for example, the spots on a Dalmatian’s back, or the stripes on a zebra. These may be distributed homogeneously at some early stage, but the differential growth of different body parts gives rise to variations in element size and density in an adult animal (Bard, 1977). Such textures may be neither isotropic nor homogeneous.
Volumetric texture. For objects that are cut out of a solid material (such as rocks, or wooden objects), the statistical properties of the surface texture will depend upon texture contained in the solid (the volume texture). A homogeneous and isotropic volume, such as granite, gives rise to homogeneous and isotropic surface textures. However, substances such as marble or wood, with anisotropic volume texture, will give rise to anisotropic surface textures.
This makes it clear that there are many natural textures that are neither homogeneous nor isotropic, and of course there may be other processes or combinations of processes that generate texture. Nonetheless it is still possible to calculate surface shape from texture information provided that the surface texture does not vary systematically in a way that mimics the effects of projection (Witkin, 1981). Psychophysical studies of shape-from-texture (Todd & Akerstrom, 1987; Cutting & Millard, 1984; Blake, Bülthoff & Sheinberg, 1993; Buckley, Frisby & Spivey, 1991) using computer-generated stimuli have generally used textures that are both isotropic and homogeneous, generated in ways quite different from natural texture formation. There are several ways to model the projection of texture on a computer.
When simulating planar surfaces it is sufficient simply to calculate the projection of each texture element given the orientation of the plane. Studies using curved surfaces have often used the same technique, treating the surface as if it were locally planar across each texture element (Cutting & Millard, 1984; Todd & Akerstrom, 1987). Unless highly curved surfaces with large texture elements are used, this is a good approximation, but it can be avoided altogether. If the three-dimensional co-ordinates of a surface and of all its texture markings are defined initially, then simply calculating a perspective projection produces an image which faithfully reproduces the projection of that surface texture into the image. In this work the texture was generated with volumetric rendering (see Methods), which offers a geometrically consistent method for mapping texture onto any surface. Since an isotropic/homogeneous volume gives rise to isotropic/homogeneous surface texture, the surface textures used are similar to those generated by the other methods. Both methods
produce similar variations in the compression, area, and density of image texture elements, which reflect changes in surface orientation.
Psychophysical studies with texture alone
Under perspective projection, each of the three cues (variations in compression, area or density) is capable of specifying surface orientation by itself. Cutting and Millard (1984) investigated which of these variations was perceptually relevant for defining surface layout. In judging whether a ground plane appeared to recede into the distance, changes in texel size and density were important, but judgements of whether the receding plane was flat or curved depended almost exclusively on detecting changes in compression. This result is often taken to suggest that the strategy used to make judgements about flat surfaces is different from that used to make judgements about curved surfaces. However, the first task depends upon detecting the perspective in the image, while the second task revolves around differences in local surface orientation. From this point of view it is quite reasonable that different sources of information are used to detect these two different projective distortions of surface texture, as they are specified by different image properties. Under orthographic projection, changes in the area and density of image texels only occur as a result of surface curvature and they cannot be used to calculate local surface orientation. Using stimuli in which image texture was largely determined by local surface orientation (with small perspective effects), Todd and Akerstrom (1987) also found support for compression as the primary component of the texture cue. Their subjects made depth judgements of monocularly viewed ellipsoids defined only by texture. They reported maximum depth perception when all three gradients were present. Although they found that removing the gradient of texel area decreased perceived depth by 25%, they found no effect of area gradients in the absence of compression. These results cannot be explained as a simple linear summation of two independent cues (compression and area).
This led them to postulate an alternative shape-from-texture metric which results in a measure very similar to compression but takes some account of changes in area.
Combining texture and stereo
Both studies used monocular viewing of images depicting three-dimensional surfaces, specified by texture variation alone. As Todd and Akerstrom acknowledge, this seldom occurs in natural vision, and it is important to consider how shape-from-texture may be affected by other depth cues. The present study investigated how the different components of the texture cue contribute to the reconstruction of three-dimensional surfaces in combination with binocular stereo cues defining a similar surface. Since stereopsis produces a compelling impression of three-dimensional shape, it may place strong constraints on the interpretation of a texture cue, producing different results from monocular experiments.
The compelling sensation of depth produced by stereopsis is important for a second reason. Although subjects can interpret the depth portrayed by a single image containing texture, the impression of depth is not as robust as that produced by stereopsis. It may be that subjective depth judgement tasks can be performed on single textured images because subjects understand the cultural conventions used to depict depth pictorially in two-dimensional images. Thus monocular tasks may explore this understanding, whilst binocular stimuli may stimulate different perceptual processes (which are normally available to the observer). Although the sensation of depth produced by stereopsis is perceptually compelling, it is not always veridical. Johnston (1991) reported systematic distortions of shape-from-stereopsis, depth being substantially underestimated at far viewing distances (>100 cm). Subsequently these judgements were found to be somewhat closer to veridical in the presence of a texture cue (Johnston et al., 1991, 1993). This interaction between stereo and texture, in the specification of three-dimensional shape, was used here as a tool to explore the effectiveness of different components of the texture cue. The role of cognitive judgements, based on two-dimensional properties of the image texture, is minimized by using stimuli in which the shape portrayed by texture is always the same, although the type of texture used to portray that shape is varied. We examined responses to stimuli in which texel compression, density, and area were manipulated independently, in order to determine which of these cues are used by the human visual system when reconstructing shape-from-texture, in the presence of stereo disparities.
GENERAL METHODS
Stimulus generation and apparatus
The stimuli portrayed horizontal elliptical hemicylinders. In order to produce an appropriate texture cue, the stimuli were generated by ray-casting with a volumetric texture representation, as described in Johnston et al. (1993). This method produces an exact perspective projection of a three-dimensional surface with texture generated in a way that mimics one form of natural texture generation (when a surface is carved from a solid textured material). The stimuli were presented on a Manitron VLR2044 Monitor with 1192 x 900 pixels available for display, each of which subtended 0.46’ at the 200 cm viewing distance used. At this distance, the voxels of the texture block subtended 0.72’, and since the depth of the cylinders (largest 15 cm) was small relative to the viewing distance, there were only small variations in the angular subtense of these voxels. The relatively small size of the voxels, and the interpolation between voxels, helped to ensure that image quality was not significantly limited by the volumetric representation. The texture used for these experiments consisted of spheres assigned a random grey-level, location and radius (in the range 2.9-7.2’ at 200 cm).
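The angular sizes quoted above can be converted to physical sizes on the screen with elementary trigonometry. The following Python sketch is illustrative only: the function names are hypothetical, and the estimate of the perspective scale change assumes that the nearest point of the deepest cylinder lies 15 cm in front of a plane at the 200 cm viewing distance, which the text does not state explicitly.

```python
import math

VIEWING_DISTANCE_CM = 200.0  # viewing distance given in the text


def size_from_arcmin(arcmin: float, distance_cm: float) -> float:
    """Physical extent (cm) that subtends `arcmin` minutes of arc at `distance_cm`."""
    half_angle_rad = math.radians(arcmin / 60.0) / 2.0
    return 2.0 * distance_cm * math.tan(half_angle_rad)


def perspective_scale_change(distance_cm: float, depth_cm: float) -> float:
    """Fractional increase in image size for an element moved `depth_cm` nearer.

    Assumption (not from the text): the element moves from `distance_cm`
    to `distance_cm - depth_cm` along the line of sight.
    """
    return distance_cm / (distance_cm - depth_cm) - 1.0


pixel_cm = size_from_arcmin(0.46, VIEWING_DISTANCE_CM)   # one display pixel
voxel_cm = size_from_arcmin(0.72, VIEWING_DISTANCE_CM)   # one texture voxel
min_radius_cm = size_from_arcmin(2.9, VIEWING_DISTANCE_CM)
max_radius_cm = size_from_arcmin(7.2, VIEWING_DISTANCE_CM)
scale_change = perspective_scale_change(VIEWING_DISTANCE_CM, 15.0)

print(f"pixel: {pixel_cm:.4f} cm, voxel: {voxel_cm:.4f} cm")
print(f"sphere radii: {min_radius_cm:.3f} to {max_radius_cm:.3f} cm")
print(f"perspective scale change over 15 cm: {100 * scale_change:.1f}%")
```

Under these assumptions a pixel is roughly a quarter of a millimetre across, and the change in angular size over the full depth of the largest cylinder stays below 10%, which is why foreshortening rather than perspective dominates the texture cue in these stimuli.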
The stereograms were displayed in the central portion of the screen, which was geometrically linear as measured by the technique of Maloney and Koh (1988). The display was adjusted for linearity in the luminance domain by the method of Watson, Nielson, Poirson, Fitzhugh, Bilson, Nguyen and Ahumada (1986). The stimuli were viewed through a modified Wheatstone stereoscope as described in Johnston et al. (1993). Stereo pairs were produced by tracing two images from viewpoints horizontally displaced by one inter-ocular separation. Images were generated individually for each subject, so that the disparity field was correct for that subject’s inter-ocular separation. The cylinders portrayed had a vertical diameter of 10 cm, and a width of 10 cm, thus each image subtended an angle of 2.86°, horizontally and vertically, at the 200 cm viewing distance used. Figure 1 shows example stereograms. The stimuli are shown with a white background for the purposes of illustration, but in the experiment the stimuli were always presented against a background cut from the same textured block.
Procedure
The images portrayed elliptical hemicylinders emerging from a plane as described in the preceding paper (Johnston et al., 1993). Subjects judged, in a binary forced-choice task, whether the depth of the cylinder appeared greater than, or less than, its half height. Although the task could conceivably be performed by estimating the depth and the half height separately, subjects typically made a judgement of the cylinder’s overall shape: whether it appeared elongated or flattened relative to a circular cylinder. This task is identical to that used by Johnston (1991). A one up-one down multiple staircase procedure, described in Johnston et al. (1993), was used to determine the point at which cylinders appeared to have a circular cross-section. Two staircases (one starting with a flattened cylinder, the other with an elongated cylinder) were interwoven at random (Cornsweet, 1962). Twelve reversals per data point were collected, and their mean used as an estimate of the point at which a cylinder appeared elongated on 50% of trials. At this point the portrayed depth appears equal to the portrayed height (the cylinder appears circular), and we used the ratio of portrayed depth/height to quantify the shape distortion. Since the task involves a judgement about the overall shape of the cylinder, information from different parts of the cylinder, and from different depth modules (i.e. stereo and texture), can be integrated in making the judgements. The method of stimulus generation described above provides a texture variation which is always commensurate with the shape described by stereopsis. As discussed in the accompanying paper, we manipulated the texture cue independently of stereopsis by varying the way in which the volumetric texture was scaled. This method does not eliminate changes in element size owing to the perspective projection. However, since the depth of the cylinders (maximum 15 cm) was small relative to the
viewing distance (200 cm), this effect was negligible (<10% change in dimensions). In the accompanying paper (Johnston et al., 1993), we found that perceived depth is enhanced when a texture cue is included in stereograms, relative to stereograms with no texture cue (i.e. the texture cue specifies a flat plane). Throughout these experiments the surface portrayed by the texture cue was kept constant (although different components of the texture cue were used to portray this depth), while the depth specified by stereopsis varied. The dimensions of the surface depicted by texture were determined by first measuring the depth/height ratio for each subject using commensurate texture and stereo cues (i.e. repeating the measure for a Texture/Stereo ratio of 1 described in Johnston et al., 1992). For example, if the cylinder appeared circular when stereo and texture both depicted a cylinder with a depth of 10 cm and a half height of 5 cm, then the texture cue in all subsequent stimuli depicted a cylinder with these physical dimensions. During each experimental run, therefore, the different cylinders could not be discriminated on the basis of changes in the depth signalled by any component of the texture cue present. However, the texture cue present in all cylinders influenced the shape judgement, and the magnitude of this influence was measured by differences in the binocular stereo disparities required to perceive a circular cylinder.
Subjects
Four observers took part: the first two authors, and two others who knew nothing about the design and aims of the experiment. All subjects wore appropriate optical corrections.
EXPERIMENT 1
We used the effect of texture in binocular viewing to investigate the effectiveness of variations in texel area, density and compression, as cues for shape-from-texture. Although the shape of the surface specified by texture remained constant throughout the experiment, the component texture cues which specified that surface (variations in the compression, area, and density of the image texels) were eliminated independently. The properties of the image texture were controlled by altering the properties of the volumetric texture.
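The interleaved one up-one down staircase described under Procedure, which supplies every data point in this experiment, can be illustrated with a small simulation. The Python sketch below is not the authors' implementation: the logistic observer, the step size, the starting ratios, and the choice to pool the twelve reversals across the two staircases are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
import random


def simulated_observer(ratio, pse, slope, rng):
    """Return True if the simulated subject reports 'elongated'.

    Assumption: the probability of an 'elongated' response follows a
    logistic function of the portrayed depth/height ratio, centred on
    the point of subjective equality `pse`.
    """
    p_elongated = 1.0 / (1.0 + math.exp(-(ratio - pse) / slope))
    return rng.random() < p_elongated


def run_interleaved_staircases(pse, starts=(0.5, 2.5), step=0.1,
                               n_reversals=12, slope=0.1, seed=0):
    """Estimate the 50% point from two randomly interleaved staircases."""
    rng = random.Random(seed)
    levels = list(starts)      # current depth/height ratio of each staircase
    last_direction = [0, 0]    # +1 = last step up, -1 = last step down, 0 = none yet
    reversals = []
    while len(reversals) < n_reversals:
        s = rng.randrange(len(levels))  # choose a staircase at random
        says_elongated = simulated_observer(levels[s], pse, slope, rng)
        # One up-one down rule: step down after 'elongated', up after 'flattened'.
        direction = -1 if says_elongated else 1
        if last_direction[s] != 0 and direction != last_direction[s]:
            reversals.append(levels[s])  # record the level at which the run reversed
        last_direction[s] = direction
        levels[s] += direction * step
    return sum(reversals) / len(reversals)


estimate = run_interleaved_staircases(pse=1.6)
print(f"estimated depth/height ratio at the 50% point: {estimate:.2f}")
```

Because a one up-one down rule steps in opposite directions after the two possible responses, the tracked level settles around the ratio at which both responses are equally likely, which is the quantity the experiments report.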
Manipulation of components of the texture cue
Because the surface portrayed by texture was always the same shape, there was a constant relationship between the y co-ordinate (height) of a voxel and the orientation of the cylinder’s surface at that point (as far as texture was concerned). From this slope we calculate the degree of foreshortening, k, that would be produced in the image at that point:
$$k = \frac{1}{\sqrt{1 + \dfrac{R^{2}y^{2}}{1 - y^{2}}}} \qquad (1)$$
where R is the ratio of the cylinder’s depth to its height, and y ranges over ±1 across the cylinder. We used the reciprocal of k to modify the volume texture, resulting in selective removal of some of the variations in image texture. Three components of the texture cue were manipulated independently.
Compression. Each sphere in the texture block was turned into an ellipsoid whose long axis was rotated so that it lay parallel to the surface tangent of the cylinder (see Fig. 2). The degree of elongation was proportional to 1/k, and the minor axes of the ellipsoids were adjusted so that their volume remained constant. On the surface of the cylinder, this produced a set of ellipses whose elongation increased systematically with surface slant. The projection of this texture onto the image plane produced circles (approximately), irrespective of surface orientation. Therefore the compression cue was absent.
Density. The number of spheres per unit volume of the block was manipulated, so that the density of volume texture was proportional to 1/k. This was done by deleting spheres from the list describing a homogeneous block. The cylinder’s surface texture became sparser as slant increased, so that the density of texture elements in the image remained constant.
Area. The cross-sectional area of each sphere in the block was increased by a factor of 1/k (i.e. the radius was multiplied by √(1/k)). The resultant increase in element volume would also produce an increase in the numerical density of elements on the surface, which was avoided by an appropriate reduction in element density. The surface texture of the cylinder contained circles which became larger as the surface slant increased. This produced an image in which the mean area of texture elements in the image was constant, removing any cue from changing element area.
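A short numerical sketch may help to make equation (1) and the compensating scalings concrete. It is written in Python with hypothetical function names. It assumes an elliptical cross-section z = R√(1 − y²) with y normalised to the half height (so that R is depth divided by half height) and treats the projection as orthographic. Only the elongation and radius scalings are shown, because the text does not give the exact rule used to thin the sphere list.

```python
import math


def foreshortening(y: float, depth_ratio: float) -> float:
    """Foreshortening k of equation (1) at normalised height y, with -1 < y < 1.

    Assumption: the cross-section is z = depth_ratio * sqrt(1 - y**2),
    so the squared surface slope is depth_ratio**2 * y**2 / (1 - y**2).
    """
    if not -1.0 < y < 1.0:
        raise ValueError("y must lie strictly between -1 and 1")
    slope_squared = (depth_ratio ** 2) * y * y / (1.0 - y * y)
    return 1.0 / math.sqrt(1.0 + slope_squared)


def image_gradients(k: float) -> dict:
    """Image properties of an unmodified circular texel, relative to a frontal patch."""
    return {
        "compression": k,       # height/width ratio of the projected ellipse
        "area": k,              # projected area shrinks with foreshortening
        "density": 1.0 / k,     # texels per unit image area rise as the patch is compressed
    }


def compensating_scales(k: float) -> dict:
    """Volume-texture scalings described in the text that cancel each image gradient."""
    return {
        "elongation": 1.0 / k,              # ellipsoid stretched along the surface tangent
        "radius_scale": math.sqrt(1.0 / k), # sphere radius scaled so image area is constant
    }


for y in (0.0, 0.3, 0.6, 0.9):
    k = foreshortening(y, depth_ratio=2.0)
    scales = compensating_scales(k)
    print(f"y={y:.1f}  k={k:.3f}  elongation={scales['elongation']:.3f}  "
          f"radius scale={scales['radius_scale']:.3f}")
```

In this sketch an element stretched by 1/k along the tangent and then foreshortened by k projects to a circle again, and a sphere whose radius is scaled by √(1/k) has a projected area that no longer varies with slant, which is the sense in which each manipulation removes one gradient.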
Although these manipulations removed the large gradients in texel area and density which result from foreshortening, a small gradient remained due to the perspective projection. However, this always produced changes in area and density that were <10%. Each of these manipulations could be performed alone, or in combination with the others, so we were able to examine all eight possible combinations of the three cues. Examples of each of the stimuli are shown in Figs 1, 2 and 3.
Results
Data were collected using sets of stimuli containing the eight possible combinations of the three cues. Figure 4 shows the results for each subject individually. A clear pattern can be seen: stimuli that contain changes in compression produce the same impression of depth as the stimulus with all texture cues, regardless of the density and area gradients. The stimuli devoid of compression gradients produce an impression of depth
which is very similar to that from stimuli with no textural variation at all. For these stimuli, larger disparities were required in order to perceive the cylinder as circular (depth/height is larger for less effective depth stimuli). This effect can be appreciated monocularly from the example images of Fig. 2: the stimulus on the left with only the compression cue appears to have more depth than the stimulus on the right, which contains only the density and area cues. Since there are differences between subjects both in the magnitude of the distortion of shape-from-stereopsis (Johnston, 1990), and in the effect of texture on shape perception (Johnston et al., 1993), some further analysis was necessary in order to compare the effect of texture across subjects. A relative measure of effectiveness for different textures was calculated as follows: let $d_v$ represent the depth/height ratio when all volumetric texture cues were present, $d_0$ the ratio when the texture cue specified a flat plane, and $d_i$ the ratio for the ith set of experimental images. We can then calculate a relative texture effect for each condition,
$$\text{relative effectiveness} = \frac{d_i - d_0}{d_v - d_0} \qquad (2)$$
which measures the strength of shape-from-texture in stimulus i, relative to the difference between stimuli with a full texture cue and those with a flat texture cue. A relative effectiveness of 1.0 indicates a stimulus as effective as one containing all texture cues, and a relative effectiveness of 0.0 corresponds to a stimulus whose texture cue specifies a flat plane. Figure 5 re-plots the data of Fig. 4 showing the relative effectiveness averaged across subjects.
Because our task involves measuring a tradeoff between texture and stereo, and the effect of texture is small, it is hard to exclude a small contribution from texel size or density. In order to test for the possible existence of such effects with greater confidence, we used analysis of variance, pooling all of the data. The variance ratios for the main effects of the four factors (subject, compression, density and area) are shown in Table 1. The variance is almost entirely attributable to the compression cue (63%) and inter-subject variation (35%). Neither density nor area makes a contribution significant at the 1% level. After discounting the variance arising from inter-subject differences, the presence or absence of the compression cue accounted for 97% of the remaining variance. If changes in texel area had an effect as small as 4% of the effect of compression, it would have had a variance ratio significant at the 1% level ($F_{1,352}$ = 6.53, P < 0.01). Thus we can certainly exclude an effect of texel area as large as that reported by Todd and Akerstrom (1987).
FIGURE 2. Figure showing modified volume textures (A, B), and resulting single images (C, D). (A) A block with volume texture that becomes sparser and larger towards the edges. This produces an image (C), that still has compression gradients, but has no variation in mean area or density. (B) A block with ellipsoids oriented along the surface of the cylinder. The resulting image (D), has no changes in compression, but changes in texel density and area.
TABLE 1. Results of analysis of variance
Factor        Variance ratio
Compression   183.8
Subject       103.1
Density       5.49
Area          0.03
Variance ratios are shown for the main effect of each of the four factors. $F_{1,352}$ = 3.88; P < 0.05. $F_{1,352}$ = 6.73; P < 0.01.
Discussion
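As a check on the quantities reported in the Results, the relative effectiveness of equation (2) and the variance shares quoted alongside Table 1 can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the depth/height ratios in the usage lines are invented for illustration, and taking each factor's share of the summed variance ratios is simply one calculation that reproduces the reported 63%, 35% and 97%; the text does not say that the percentages were derived this way.

```python
def relative_effectiveness(d_i: float, d_full: float, d_flat: float) -> float:
    """Equation (2): effect of stimulus i relative to the full texture cue."""
    if d_full == d_flat:
        raise ValueError("full and flat texture conditions must differ")
    return (d_i - d_flat) / (d_full - d_flat)


# Variance ratios for the main effects, copied from Table 1.
VARIANCE_RATIOS = {
    "compression": 183.8,
    "subject": 103.1,
    "density": 5.49,
    "area": 0.03,
}


def share_of_variance_ratios(factor: str, factors) -> float:
    """Share of `factor` in the summed variance ratios of `factors` (an assumed measure)."""
    total = sum(VARIANCE_RATIOS[name] for name in factors)
    return VARIANCE_RATIOS[factor] / total


all_factors = ("compression", "subject", "density", "area")
cue_factors = ("compression", "density", "area")

compression_share = share_of_variance_ratios("compression", all_factors)
subject_share = share_of_variance_ratios("subject", all_factors)
compression_share_of_cues = share_of_variance_ratios("compression", cue_factors)

print(f"compression: {100 * compression_share:.0f}% of the four main effects")
print(f"subject: {100 * subject_share:.0f}% of the four main effects")
print(f"compression: {100 * compression_share_of_cues:.0f}% of the three cue effects")

# Hypothetical depth/height ratios, for illustration only.
example = relative_effectiveness(d_i=1.9, d_full=1.6, d_flat=2.0)
print(f"example relative effectiveness: {example:.2f}")
```

The measure is normalised within each subject, so a stimulus that needs the same disparity as the full-cue stimulus scores 1.0 and one that behaves like a flat texture scores 0.0, whatever that subject's overall bias.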
In studies using only texture cues (Cutting & Millard, 1984; Todd & Akerstrom, 1987), it was found that changes in the shape of texture elements provided the strongest texture cue for producing sensations of surface curvature and that changes in size and density of texture elements have relatively little effect. Our results extend this finding to stimuli that contain binocular disparities. Also, like Cutting and Millard (1984) we found no
FIGURE 3. Example stimuli showing four further combinations of texture cues. (A) Density alone; (B) area alone; (C) density and compression without area; (D) area and compression without density. Note how the images in the bottom row, which contain the compression cue, produce a stronger sensation of depth than those in the top row.
FIGURE 4. Data for individual subjects (each shown by a different bar style), showing the effects of different texture cues on binocular shape judgements. The cues present are indicated by the crosses below, and illustrated schematically above the data points. The abscissa plots the ratio of the depth specified by stereo to the height of the cylinder, when it appeared circular. Since this ratio is inversely proportional to the extent of perceived depth, reciprocal axes are used.
measurable effect of area. On the other hand, Todd and Akerstrom (1987) found that images in which all elements had a constant area were less effective than those containing area and compression cues, although they did not find any effect of changes in area without compression gradients. We found that stimuli with texels of constant area were just as effective as those that had a gradient of area. This may be one feature of shape-from-texture which is altered by the presence of a surface defined by disparities. One explanation for the reduced depth percept in Todd and Akerstrom’s constant area stimuli is that in addition to removing the gradient of texel area they also removed any gradient of texel density (see their Fig. 8). This manipulation resulted in stimuli with very few texels on the steeply curved outer edges of the ellipsoid. Suppose that the impression of depth in a monocular task involves estimating surface slope at each
FIGURE 5. Effects of different texture patterns on binocular shape judgements. This shows the results averaged across subjects, expressed as a fraction of the effects of the full texture cue, as described by equation (2). A relative effectiveness of 1.0 is defined as the effectiveness of the full texture cue, while a relative effectiveness of 0.0 signifies the effect produced by a texture cue specifying a flat plane. The schematic across the top of the figure illustrates which cues were present.
texel, and then integrating this over the set of texels. Reducing the number of elements over which this integration is possible may then reduce the sensation of depth. In the presence of stereo disparities, however, each texel sits at a well defined position in depth, so its depth does not need to be estimated by integrating slope across surrounding texels. Todd and Akerstrom proposed an alternative metric, rather than simple compression, as a means of estimating shape-from-texture, which took into account texel area in addition to compression. Although their measure was similar to compression, Todd and Akerstrom were able to devise stimuli with no changes in compression (no compression gradient) which did have the appearance of depth. However, these stimuli did contain compressed texels, so that the compression cue alone still specified a surface with significant depth, but the surface corresponded to a cone-like object rather than an object like an ellipsoid, which is curved in depth. Since their task involved estimating depth, as opposed to curvature, the appearance of depth in these images may still be attributable to texel compression.
EXPERIMENT 2
In Expt 1, the compression cue always arose from the projection of a circular element on the cylinder’s surface, onto an ellipse in the image. If the subjects assumed that the surface was covered with circles, it would be possible to calculate local surface orientation at each texel, from the elongation of the ellipse in the image. However, when the surface texture itself contains ellipses of various aspect ratios, the shape of a single ellipse in the image does not allow the calculation of surface orientation. Instead, compression must be calculated from the image by looking at the global pattern of ellipses, or at least by averaging the compression over a local group of ellipses. How large a group would be appropriate would depend upon the range of elongations present in the unprojected texture, which could be deduced by the subjects since the plane behind the cylinder gives a sample of the unprojected texture. To examine whether local or global compression measures were used, Todd and Akerstrom (1987) compared stimuli with square surface texels to those with rectangular surface texels. They found very similar results with the two stimulus sets, suggesting that shape-from-texture operated on some “global level of image structure”. Three subjects participated in this experiment, which examined whether they used a global or local solution for shape-from-texture, using textures in which a range of random variations were introduced to the shapes of surface texels.
Method
The method for generating images was the same as that used in the previous experiment, but a different set of volume textures was used, filled with ellipsoids. This produces ellipses on the surface of the cylinders before projection. Since the ellipses in the image of a horizontal cylinder all have their long axes horizontal, the ellipsoids in these volumes all had their long axes horizontal. This ensures that the ellipticity of any single texel in the image is not a reliable gauge of surface orientation. An example image [Fig. 6(A, B)] illustrates the point: along any horizontal line there is no change in surface orientation, but a range of ellipticities appears in the image. When generating these blocks, each ellipsoidal element was assigned a variable aspect ratio (ratio of major axis to minor axis) drawn randomly from a uniform distribution. Four different blocks were generated, with different ranges of aspect ratios, producing different degrees of disruption to local surface slant measures. In each block, the maximum aspect ratio was different (1.5, 2.0, 2.5 and 3.0), while the minimum aspect ratio was always fixed at 1.0 (spherical). Since the smallest aspect ratio was fixed, stimuli with a larger range of ellipsoid elongations also had larger mean elongations. Consequently, in addition to producing random variations in texel compression, this method also produced surface textures that were not isotropic. In order to generate an isotropic surface texture containing ellipsoids, we used a fifth volume texture (also with a maximum aspect ratio of 3.0), in which the orientation of each ellipsoid was randomized. This texture still requires subjects to use global measures for extracting compression gradients, but the surface texture remains isotropic. Figure 6(C) shows an example stimulus made from this block. The same procedure, and the same three subjects, were used as in Expt 1.
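The sampling scheme described above can be illustrated with a short sketch. It is not the stimulus-generation code used in the study: it only draws aspect ratios from a uniform distribution between 1.0 and each reported maximum, and shows that a wider range also gives a larger mean elongation. The function names, the number of elements and the seed are hypothetical choices made for illustration.

```python
import random
import statistics

# Maximum aspect ratios reported in the text for the four blocks.
MAX_ASPECT_RATIOS = (1.5, 2.0, 2.5, 3.0)


def sample_aspect_ratios(max_ratio, n_elements, rng):
    """Draw aspect ratios (major/minor axis) uniformly from [1.0, max_ratio]."""
    return [rng.uniform(1.0, max_ratio) for _ in range(n_elements)]


def summarize_blocks(n_elements=10000, seed=0):
    """Return {max_ratio: (mean, standard deviation)} for each block.

    n_elements and seed are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    summary = {}
    for max_ratio in MAX_ASPECT_RATIOS:
        ratios = sample_aspect_ratios(max_ratio, n_elements, rng)
        summary[max_ratio] = (statistics.mean(ratios), statistics.pstdev(ratios))
    return summary


if __name__ == "__main__":
    for max_ratio, (mean, sd) in summarize_blocks().items():
        print(f"max {max_ratio:.1f}: mean {mean:.3f}, sd {sd:.3f}")
```

Because the minimum is fixed at 1.0, the expected mean of each block is (1 + maximum)/2, so the variability and the mean elongation cannot be varied independently with horizontally aligned elements.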
The depth/height of the apparently circular cylinder for each subject under each condition is shown in Fig. 7, and the average across subjects, using the relative effectiveness measure defined in equation (2), is plotted in Fig. 8. Although the overall gradient of texture element compression is the same in all five experimental conditions, there are clear differences in the effectiveness of the texture cue. As the elongation and variability of volume texture elements are increased, the contribution of shape-from-texture diminishes. The effect can be appreciated monocularly from the sample images shown in Fig. 6. With the least irregular of these blocks the mean effectiveness was 0.97, indicating that perceived shape was very similar to that produced by a regular texture cue [effectiveness of 1.00, defined by equation (2)]. In the case with the greatest elongation, the relative effectiveness (averaged across subjects) was 0.36. The differences in effectiveness between these stimuli suggest that curvature is not calculated simply from the mean global compression gradient, since this is the same in all cases. It may be that subjects are not able to extract the mean compression gradient in the presence of random variation. Alternatively, it may be the anisotropy of the surface texture (because of the increase in mean elongation) that has disrupted perception of shape-from-texture. The data collected with randomly oriented
FIGURE 6. Example stimuli in which random variation was added to texel compression, before projection. (A) Maximum elongation 1.5; (B) maximum elongation 3.0; (C) maximum elongation 3.0, random orientation.
[Figure 7 plot. Horizontal axis: elongation ratio (1.5, 2.0, 2.5, 3.0), with a final random-orientation condition at 3.0. Legend: subjects BGC, EBJ, JMH.]
FIGURE 7. Effect of random variation in texel shape on binocular shape-from-texture. Data are shown for individual subjects plotted as depth/height ratio of the apparently circular cylinder (as in Fig. 4). Since this ratio is inversely proportional to perceived depth, reciprocal axes are used. The stimuli were generated from blocks containing ellipsoidal texture elements, of random ellipticity, instead of spheres. Elongation ratio plots the maximum elongation of the ellipsoids in the block. The final data set shows the results of using ellipsoids which were randomly oriented; all other data used ellipsoids oriented horizontally. Each subject’s data is plotted with a different bar style.
ellipses on the cylinder’s surface help distinguish between these two possibilities: these stimuli still have random variation in texel ellipticity, but are derived from isotropic surface textures. Comparing stimuli with the same degree of random variation, the texture cue was more effective (relative effectiveness 0.92) when the texture was isotropic than when it was anisotropic (relative effectiveness 0.36). This suggests that the disruption is due largely to the anisotropy of the surface texture, rather than the random variation in compression. Taken together, these results suggest that curvature is calculated from the overall gradient of texel compression in the image, rather
[Figure 8 plot. Horizontal axis: elongation ratio (1.5, 2.0, 2.5, 3.0) for horizontal orientation, plus 3.0 for random orientation. Vertical axis ticks: 0.0, 0.5, 1.0.]
FIGURE 8. Effects of random variation in texel shape, data averaged across subjects (similar to Fig. 5). The hollow bars show data with horizontally oriented ellipsoids, plotted against the maximum degree of elongation. The solid bar shows the effect when the elongated ellipsoids were assigned a random orientation.
than by first extracting local surface orientation from the compression of single texels. However, as increasingly anisotropic surface textures are used, the compression gradient becomes less effective as a cue to surface curvature. At least part of the reduced effectiveness of the compression cue can be explained in terms of changes in the available information. In order to estimate surface slant in regions of the image, it is first necessary to obtain an estimate of the anisotropy of the unprojected texture. As the variation in ellipsoid elongation is increased, larger samples are required to estimate both the underlying distribution, and the mean compression over a local area. Thus the information available to calculate curvature from compression gradients is reduced by the random variation we introduced. Note that the density and area gradients are unaffected, so the reduced effectiveness of shape-from-texture with these stimuli is further evidence that density and size gradients are less important than compression gradients. Even with the most anisotropic stimulus, the relative effectiveness was greater than zero (it was 0.36), which at first sight suggests that there is still an effective texture cue present. However, our index of relative effectiveness compares depth/height ratios when there is a full texture cue with those when there is a texture cue specifying a flat plane. In the latter case, the texture cue may actually cause a reduction in perceived depth (relative to a notionally absent texture cue). Since a texture cue specifying a flat plane has a relative effectiveness of zero, a texture pattern that is ineffective in specifying a curved surface or a flat plane should produce a relative effectiveness greater than zero. Thus the mean relative effectiveness of 0.36 could conceivably correspond to a stimulus in which shape-from-texture reconstructs neither a flat plane nor a cylindrical surface, i.e.
it may indicate the lack of any effective stimulus for shape-from-texture. Further work will be necessary to verify this interpretation. The variations in texel area and texel density in these stimuli were all geometrically appropriate for our volumetric model of texture formation. The fact that some of these stimuli were ineffective for the perception of shape-from-texture lends further support to the conclusion drawn from Expt 1, that some components of the stimuli are not salient for human shape-from-texture. We did not, therefore, explore stimuli in which random variation was added to the density and area gradients. The two main features of these data (the disruption caused by large random horizontal elongations, and the effectiveness of randomly oriented ellipsoids) are best explained by considering anisotropy of the surface texture. Random horizontal elongation produces an anisotropic surface texture, and, as this anisotropy increases, shape-from-texture breaks down. With randomly oriented ellipsoids, the surface texture remains isotropic, in spite of increasing elongation. Thus, anisotropy seems to reduce the perceptual salience of compression gradients.
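The contrast between aligned and randomly oriented elements can be made concrete with a simple second-moment calculation. This is an illustrative sketch under stated assumptions, not the analysis used in the experiments and not a model of human performance: surface texels are treated as flat ellipses with unit minor axis, projection is parallel, the tilt direction is vertical, and an idealized observer estimates slant from the ratio of the mean vertical to mean horizontal second moments while assuming the surface texture is isotropic. All function names and parameter values are hypothetical.

```python
import math
import random


def make_texels(max_aspect, n, rng, random_orientation):
    """Return (aspect ratio, orientation) pairs for n surface ellipses.

    Orientation 0.0 means the long axis is horizontal.
    """
    texels = []
    for _ in range(n):
        aspect = rng.uniform(1.0, max_aspect)
        theta = rng.uniform(0.0, math.pi) if random_orientation else 0.0
        texels.append((aspect, theta))
    return texels


def mean_image_moments(texels, slant):
    """Mean horizontal and vertical second moments after foreshortening.

    Foreshortening scales vertical extent by cos(slant), so the vertical
    second moment is scaled by cos(slant) squared.
    """
    k2 = math.cos(slant) ** 2
    sum_xx = sum_yy = 0.0
    for aspect, theta in texels:
        a2 = aspect ** 2
        c2 = math.cos(theta) ** 2
        s2 = 1.0 - c2
        sum_xx += a2 * c2 + s2
        sum_yy += (a2 * s2 + c2) * k2
    return sum_xx / len(texels), sum_yy / len(texels)


def slant_assuming_isotropy(texels, slant):
    """Slant (radians) inferred by attributing all image anisotropy to slant."""
    xx, yy = mean_image_moments(texels, slant)
    compression = math.sqrt(yy / xx)
    return math.acos(min(1.0, compression))


if __name__ == "__main__":
    rng = random.Random(0)
    true_slant = math.radians(40.0)
    for label, random_orientation in (("horizontal", False), ("random", True)):
        texels = make_texels(3.0, 20000, rng, random_orientation)
        estimate = slant_assuming_isotropy(texels, true_slant)
        print(f"{label} orientation: inferred slant {math.degrees(estimate):.1f} deg")
```

In this toy calculation the randomly oriented texture yields an estimate close to the true slant, whereas the horizontally aligned texture yields a substantial overestimate, because the anisotropy of the surface texture is indistinguishable from foreshortening.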
DISCUSSION
The experiments described here were designed to examine some of the ways in which human subjects use variations in image texture to extract information about the three-dimensional shape of surfaces. They were all performed in binocular viewing with disparity fields depicting similar surfaces. The results are broadly in agreement with earlier monocular experiments. The most significant texture cue for the perception of curved surfaces is variation in the compression of texture elements, while changes in their area or density have little effect. These three texture properties are not independent, as changes in compression produce changes in texel area. If the visual system attempted to measure compression gradients by some simple metric based on texel area, then changes in texel area would be an effective stimulus. The lack of such an effect in these data suggests that human observers do not use any such simple metric. This study used stimuli in which shape-from-texture resulted almost exclusively from foreshortening, with negligible perspective gradients. This lack of any effect of texel size and density alone was also reported by Todd and Akerstrom (1987). In contrast to our results, they found that removing the gradient in texel area resulted in reduced depth perception, relative to stimuli with all three gradients. They concluded that the area cue is effective in specifying curvature, but only in the presence of compression. As mentioned above, this discrepancy may reflect how shape-from-texture is modified by information from stereopsis. Since we found no effect of removing only the area gradient, our data can be described by a model which simply calculates texel compression, and it is unnecessary to invoke the more complex metric proposed by Todd and Akerstrom. It is important to note that in Expt 1, and in the work by Cutting and Millard (1984) and Todd and Akerstrom (1987), the different cues were pitted against one another.
In a stimulus containing only changes in density, the uniform compression specifies a flat surface, conflicting with the surface specified by the density changes. Thus the finding that the compression gradient is sufficient to account for the effect of all three gradients does not completely rule out any contribution from the other cues; it just means that their contribution is very small when a compression cue is available. The second experiment throws light on the assumptions made by the visual system about the statistical properties of texture markings on surfaces. The texture composed of randomly oriented ellipses produced a substantial texture effect. Therefore it cannot be that subjects simply assume that the anisotropy of local image features (e.g. an ellipse) arises from projection of an isotropic surface marking (e.g. a circle). If each of the ellipses in the image had been assumed to be the projection of a circle on the surface, the surface reconstructed from this image would have been quite wrong. This does not support Stevens’ (1984) claim that subjects use purely local measures of compression to extract
surface orientation; rather, they obtain information from the global pattern of image texture (as was found by Todd & Akerstrom, 1987). This is not very surprising, as all of the methods of texture formation discussed in the Introduction are capable of generating anisotropic texture markings, so it is natural that we should have developed a strategy that does not make assumptions about the shape of individual surface markings. However, if anisotropic texture elements are oriented such that the global pattern of surface texture is anisotropic (i.e. when all ellipsoids are horizontally aligned) the texture cue is less effective. This suggests that human shape-from-texture works on the assumption that surfaces are covered with approximately isotropic textures. This is somewhat surprising, as a number of natural textures are anisotropic (such as marble or wood). It is important to note that the anisotropic surface textures used here were parallel to the cylindrical shape, specifically in order to confound the calculation of surface orientation from image compression. It may be that these difficulties would not arise with anisotropic textures which are not aligned with the orientation of the surface, but are in some accidental orientation. In the experiments reported here, a number of surfaces were generated from inhomogeneous volumes, producing inhomogeneous surface textures. Many of these stimuli were nonetheless effective in stimulating shape-from-texture. In the stimuli whose image texture had no density or area gradients, the corresponding surface textures were inhomogeneous: the surface texture elements increased in area and were sparser as slant increased [see Fig. 2(A)]. The effectiveness of these stimuli suggests that the assumption that surface textures are homogeneous is not central to the human use of shape-from-texture.
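The point made above about a purely local, circle-assuming strategy can be illustrated with a small geometric sketch. It is an illustration under simplifying assumptions (flat elliptical surface markings with unit minor axis, parallel projection, vertical tilt), not a reconstruction of any published model; the function name is hypothetical. For one surface ellipse viewed at a given slant, it computes the slant and tilt that would be inferred if the image ellipse were taken to be the projection of a circle.

```python
import math


def local_slant_tilt(aspect, theta, slant):
    """Slant and tilt (radians) inferred from one texel under a circle assumption.

    aspect: major/minor axis ratio of the surface ellipse (minor axis = 1).
    theta:  orientation of its long axis on the surface (0 = horizontal).
    slant:  true surface slant; tilt is assumed vertical.
    """
    a2 = aspect ** 2
    c, s = math.cos(theta), math.sin(theta)
    k = math.cos(slant)
    # Second-moment (shape) matrix of the surface ellipse, with the
    # vertical axis foreshortened by cos(slant).
    m_xx = a2 * c * c + s * s
    m_xy = (a2 - 1.0) * c * s * k
    m_yy = (a2 * s * s + c * c) * k * k
    # Eigenvalues of the 2x2 symmetric matrix give the squared image axes.
    half_trace = 0.5 * (m_xx + m_yy)
    root = math.hypot(0.5 * (m_xx - m_yy), m_xy)
    lam_max, lam_min = half_trace + root, max(0.0, half_trace - root)
    inferred_slant = math.acos(math.sqrt(lam_min / lam_max))
    # For a projected circle the tilt direction lies along the minor axis.
    major_angle = 0.5 * math.atan2(2.0 * m_xy, m_xx - m_yy)
    inferred_tilt = major_angle + math.pi / 2.0
    return inferred_slant, inferred_tilt


if __name__ == "__main__":
    true_slant = math.radians(40.0)
    for aspect, theta_deg in ((1.0, 0.0), (3.0, 0.0), (3.0, 45.0), (3.0, 90.0)):
        sl, ti = local_slant_tilt(aspect, math.radians(theta_deg), true_slant)
        print(f"aspect {aspect:.1f}, orientation {theta_deg:5.1f} deg -> "
              f"slant {math.degrees(sl):5.1f} deg, tilt {math.degrees(ti):6.1f} deg")
```

Only the circular element returns the true slant and a vertical tilt; elongated elements at different orientations return different slants and tilts for the same surface patch, which is why a single-texel measure cannot by itself recover the surface in these stimuli.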
Some recent theoretical work throws light on why the visual system analyses the shape of texture elements, rather than their numerical distribution, for calculating shape-from-texture. Blake and Marinos (1990) used maximum likelihood estimators to calculate surface slant from image textures. Their data show that, given a finite number of samples, calculations based on variations in texel density (under perspective projection) usually provide less precise estimates of surface slant than those derived from variations in texel orientation. This work has recently been extended to curved surfaces (Blake et al., 1993), with the same result holding for surfaces of the type used in the experiments reported here. Perhaps the human visual system neglects variations in texture density when estimating curvature because they are a less reliable source of information. Changes in texture element compression provide a more reliable source of shape-from-texture information. Thus, surfaces with inhomogeneous distributions of texture elements do not affect the important component for texture analysis: systematic variation in the compression of texels. Conversely, surfaces with anisotropic texture markings can disrupt this same cue which is essential to the analysis of shape-from-texture.
Although inhomogeneities in image texture are ineffective in specifying surface curvature, or local surface orientation, there is evidence that they are effective in specifying perspective. As our stimuli contained only very small perspective gradients, no effects of image texture inhomogeneity were observed. Cutting and Millard (1984) showed that gradients in texel size and density produced by perspective produce a strong impression of a receding surface (the property they call “flatness”). Buckley et al. (1991) studied slant judgements using planar surfaces (with substantial perspective gradients) specified by both stereopsis and texture. For their large, receding surfaces, they found that perspective cues did alter slant judgements, even in the presence of binocular disparities. Buckley et al. also reported that the effectiveness of the perspective cue depended upon the type of texture elements used: square elements were more effective than circles. This may be because single square texture elements project to trapezoidal image features, each of which defines local perspective quite well, since the two sides of the square point towards the vanishing point. With circular elements, it is difficult to extract the perspective deformation of a single element, so perspective is more easily extracted from the pattern of texel spacing and size. The strategy used for the perception of shape-from-texture may well depend upon the context in which texture variations appear. The extraction of three-dimensional shape-from-texture is only possible if assumptions are made about the surface texture. A given pattern of image texture is compatible with an infinite set of surfaces, each with a different surface texture. Most of these possible surfaces would have inhomogeneous or anisotropic surface textures, and hence could be discarded by analysis of texture alone.
In natural vision, there are further restrictions on the set of possible surfaces, and we have investigated the processing of shape-from-texture when such restrictions are imposed by stereopsis. Both of the experiments reported here suggest that for curved surfaces defined by stereopsis and texture, humans make the assumption that surface textures are isotropic, but they do not assume that they are homogeneous. The fact that human observers seem to rely on only one of these assumptions may be a consequence of the statistical properties of the calculation, or it may reflect which assumptions can safely be applied to natural curved surface textures.
REFERENCES
Bard, J. (1977). A unity underlying the different zebra striping patterns. Journal of Zoology, London, 183, 527-539.
Blake, A. & Marinos, C. (1990). Shape from texture: estimation, isotropy and moments. Artificial Intelligence, 45, 323-380.
Blake, A., Bülthoff, H. & Sheinberg, D. (1993). An ideal model for inference of shape from texture. In preparation.
Braunstein, M. L. (1976). Depth perception through motion. New York: Academic Press.
Buckley, D., Frisby, J. P. & Spivey, E. (1991). Stereo and texture cue integration in ground planes: An investigation using the table stereometer. Perception, 20, 91.
Bülthoff, H. & Mallot, H. (1988). Integration of depth modules: Stereo and shading. Journal of the Optical Society of America, 5, 1749-1758.
Cornsweet, T. N. (1962). The staircase method in psychophysics. American Journal of Psychology, 75, 485-491.
Cutting, J. E. & Millard, R. T. (1984). Three gradients and the perception of flat and curved surfaces. Journal of Experimental Psychology: General, 113, 198-216.
Gibson, J. J. (1950). The perception of the visual world. Boston, Mass.: Houghton Mifflin.
Johnston, E. B. (1991). Systematic distortions of shape from stereopsis. Vision Research, 31, 1351-1360.
Johnston, E. B., Cumming, B. G. & Parker, A. J. (1991). Stereo-texture interactions in 3-D shape perception. Investigative Ophthalmology and Visual Science, 31, 304.
Johnston, E. B., Cumming, B. G. & Parker, A. J. (1993). Integration of depth modules: Stereopsis and texture. Vision Research, 33, 813-826.
Maloney, L. T. & Koh, K. (1988). A method for calibrating the spatial coordinates of a visual display to high accuracy. Behavior Research Methods, Instruments and Computers, 20, 372-389.
Stevens, K. A. (1984). On gradients and texture gradients. Journal of Experimental Psychology: General, 113, 217-220.
Todd, J. T. & Akerstrom, R. A. (1987). Perception of three-dimensional form from patterns of optical texture. Perception and Psychophysics, 13, 242-255.
Watson, A. B., Nielson, K. R., Poirson, A., Fitzhugh, A., Bilson, A., Nguyen, K. & Ahumada, A. J. (1986). Use of a raster framebuffer in vision research. Behaviour Research Methods, Instruments and Computers, 18, 587-594.
Witkin, A. P. (1981). Recovering surface shape and orientation from texture. Artificial Intelligence, 17, 17-47.
Acknowledgements: This research was funded by the Wellcome Trust, the SERC and the MRC. Additional support was provided by the McDonnell-Pew Centre for Cognitive Neuroscience. We are grateful to Andrew Blake for numerous helpful discussions, and to Mike Landy for helpful comments on earlier versions of the manuscript. We thank Julie Harris and Carol Cumming for acting as observers.