Stereoscopic Vision
A J Parker, University of Oxford, Oxford, UK © 2009 Elsevier Ltd. All rights reserved.
Introduction
Taken literally, stereoscopic vision describes the ability of the visual brain to register a sense of three-dimensional shape and form from visual inputs. In current usage, stereoscopic vision often refers specifically to the sense of depth derived from the two eyes. This usage excludes a number of things that might be considered stereoscopic vision, such as the sense of depth arising from the motion parallax generated when subjects translate themselves through the visual environment. This article is primarily concerned with binocular stereoscopic vision. Current usage also means that the term ‘stereoscopic vision’ tends to include a number of issues relevant to binocular vision that are unrelated to the perception of three dimensions. One example is singleness of vision: namely, the generation of the perception of a single object by the two eyes by means of their coordinated use. Double vision is easily experienced for an individual object by raising a finger before the two eyes while continuing to look at a distant object, such as a church tower. Unusual circumstances, often a head injury or other neurological condition affecting the coordination of the two eyes, are required for all visual objects to look double.
Binocular Viewing Can Provide a Sense of Depth
When humans view a solid, three-dimensional object, each eye receives a slightly different view (see Figure 1) of the object. For the solid sphere (t) in Figure 1, the left eye (m) sees rather more of the left-hand side of the sphere before the surface of the sphere is occluded by its rim. Conversely, the right eye (n) sees a little more of the right-hand side of the sphere. Ptolemy, Leonardo da Vinci, and many in between understood such matters clearly. However, the ability of binocular geometry to provide a sense of depth was fundamentally misunderstood until the striking demonstrations due to Charles Wheatstone (1802–75; see Figure 2). Wheatstone arranged for each eye to be given a different view in the most direct way, by showing each eye a physically different picture to look at. When the pictures viewed by the left and right eyes
approximate the views that the eyes would receive when viewing a real object in natural binocular viewing, the visual brain responds to those views as if a real object in depth were being displayed. Wheatstone’s discovery generated intense scientific debate throughout the rest of the nineteenth century. For example, the need for real, physical movements of the eyes to gain a sense of depth was hotly contested until the critical experiment of Dove, who demonstrated that a sense of stereoscopic depth is produced by even the briefest flashed exposure that leaves the eyes no time to move. Integration of binocular stereo depth with other sources of information about three dimensions was intensively researched; for example, Helmholtz appreciated that there are strong similarities between depth from binocular stereopsis and depth from motion parallax but, critically, that one of these depth cues can supply information that is absent from the other. The link between binocular stereopsis and binocular eye movements was investigated, including the important links with changes in the focusing power of the eyes’ lenses, known as accommodation. All this work was reviewed by Helmholtz and his acolytes toward the end of the nineteenth century. Wheatstone’s discovery also generated huge popular interest. Coupled with the development of photography, the stereoscope became a source of home entertainment. Many Victorian parlors housed one, and large collections of stereo pictures were generated. It is alleged that the stereoscope was put aside only when its persistent use to view salacious images became so widely known that its presence in a living room became an immediate source of embarrassment. (The same images are often regarded nowadays as signifying the innocence of a bygone age, and stereoscopic photography now has its place among nostalgia technology.) 
Such a combination of high scientific and popular interest could not fail to challenge egos, and the discovery of the stereoscope was disputed hotly. There is no doubt that David Brewster’s version that used lenses for viewing was much more convenient for popular viewing and thus became the canonical form, but Brewster’s claims to priority were always regarded as misplaced.
Geometry of Binocular Viewing
The viewing geometry that Wheatstone had identified was subject to intensive analysis during the nineteenth and early twentieth centuries. This work established the concept of ‘binocular disparity’ as the fundamental measurement employed by the visual system to gain a sense of binocular stereoscopic depth. Binocular disparity is the angular difference between two landmarks viewed binocularly (see Figure 3).
Figure 1 Leonardo da Vinci (top left) understood that each eye obtains a slightly different view of the world. For objects close to the head, some features on the surface of a solid object are occluded from one eye’s view but are visible to the other eye. The diagram at the top right is derived from Leonardo’s own drawing, adapted by Wade NJ, Ono H, and Lillakas L (2001) Leonardo da Vinci’s struggles with representations of reality. Leonardo 34: 3, with permission from MIT Press Journals. The lower panel shows a figure from Descartes’s De Homine, in which the schematic of connecting the left and right eyes to a single binocular target in the central nervous system is proposed.
Corresponding Points
This phase of study also introduced the concept of ‘corresponding visual points’ between the left and right eyes. For example, points that are 3 mm horizontally to the left of the fovea in the left and right eyes are corresponding points, whereas points that are 3 mm to the left in one eye and 2.9 mm to the left in the other eye are noncorresponding. If both eyes are looking at a single object in the world, then corresponding points on the retina project out to a set of geometrically defined locations in three-dimensional space. For the case of a plane, defined by the line joining the two eyes and the binocular fixation point, the locations of corresponding points define a circle, the ‘Vieth-Muller circle,’ passing through the binocular fixation point and the optical centers of the two eyes.
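The geometry of this circle can be checked numerically. The sketch below is a minimal illustration under simplifying assumptions that are not part of the original text: each eye is reduced to a single point at its optical center, the eyes lie 6.5 cm apart, and fixation is straight ahead at 50 cm. The function names are hypothetical. It relies on the inscribed angle theorem: every point on the circle subtends the same angle at the two eyes as the fixation point does, so its angular offset from fixation is the same in both eyes.

```python
import math

def subtended_angle(point, left_eye, right_eye):
    """Angle (radians) that the two eyes subtend at `point`."""
    ax, ay = left_eye[0] - point[0], left_eye[1] - point[1]
    bx, by = right_eye[0] - point[0], right_eye[1] - point[1]
    return abs(math.atan2(ax * by - ay * bx, ax * bx + ay * by))

def circle_through_eyes_and_fixation(half_separation, fixation_distance):
    """Centre and radius of the circle through both eyes and a fixation
    point straight ahead at (0, fixation_distance)."""
    # The centre lies on the midline; equate its distance to an eye and to fixation.
    centre_y = (fixation_distance ** 2 - half_separation ** 2) / (2 * fixation_distance)
    return (0.0, centre_y), fixation_distance - centre_y

HALF_SEPARATION = 3.25      # cm, assumed for illustration
FIXATION_DISTANCE = 50.0    # cm, assumed for illustration
LEFT_EYE = (-HALF_SEPARATION, 0.0)
RIGHT_EYE = (HALF_SEPARATION, 0.0)

(centre_x, centre_y), radius = circle_through_eyes_and_fixation(
    HALF_SEPARATION, FIXATION_DISTANCE)
fixation_angle = subtended_angle((0.0, FIXATION_DISTANCE), LEFT_EYE, RIGHT_EYE)

# Points on the circle: the angle subtended by the eyes equals that at fixation.
for degrees in (-60, -30, 0, 30, 60):
    t = math.radians(degrees)
    on_circle = (centre_x + radius * math.sin(t), centre_y + radius * math.cos(t))
    difference = subtended_angle(on_circle, LEFT_EYE, RIGHT_EYE) - fixation_angle
    print(f"{degrees:+4d} deg on circle: angle difference = {difference:.2e} rad")

# A point nearer than fixation lies inside the circle and subtends a larger angle.
nearer = (0.0, 40.0)
print("nearer point:", subtended_angle(nearer, LEFT_EYE, RIGHT_EYE) - fixation_angle)
```

Points on the circle give differences at the level of floating-point rounding, whereas the nearer point gives a clearly nonzero difference. Real eyes do not rotate exactly about their optical centers, so this is a geometric idealization only.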
Much of this development was directed toward understanding the perception of three-dimensional space by human observers. Imagine a person is looking at an array of objects lying in different directions and at different distances with respect to the binocular fixation point. If the person closes first one eye and then the other, given that each eye is at a different position, do these objects appear to lie in the same direction? The geometry of the Vieth-Muller circle provides a simple hypothesis, which is that objects will appear to lie in the same visual direction when they lie on anatomically corresponding points. This hypothesis is close to correct, but there are distinct patterns of deviation from the Vieth-Muller circle, and formulations of other curves and surfaces have been developed to describe the locations of points that are in visual, as opposed to anatomical, correspondence.
The concept of corresponding points is also important in considering the binocular coordination of the movements of the eyes. If the eyes are fixating an object at a particular distance from the observer’s head, then the eyes will need to rotate inward with respect to each other if the object is near the head. Inspection of a new object at a different viewing distance therefore requires an adjustment, inward or outward, of the binocular alignment, if the new object is to fall on the foveae of both eyes. The size and direction of the adjustment is given by the binocular disparity between the currently viewed object and the new one.
Absolute and Relative Disparity
Figure 2 Charles Wheatstone and his mirror stereoscope. The drawing of the stereoscope shows drawings (E and E′) reflected by mirrors (A and A′) set at an angle so as to direct the reflected images toward a viewer. Wheatstone’s observations with this device led him to conclude that binocular disparity was a sufficient stimulus by itself to give rise to a sensation of stereoscopic depth.
One way of defining an origin for measurement of disparity is therefore to use the current binocular fixation point. Anatomically corresponding points have a disparity of zero, and noncorresponding points may have a positive or negative disparity, depending on whether they project to points farther than or nearer than the distance to the fixation point. Since this definition can be employed even when the fixation point is just notional (a location in space with no visible object, toward which the foveae project), it is useful to give this concept of disparity the label ‘absolute disparity.’ This definition of disparity may be contrasted with the concept of ‘relative disparity,’ which is defined as the disparity between two visible features. Here these two features are separated by one visual angle as seen from the left eye and a different visual angle as seen from the right eye: relative disparity is therefore the difference in these two angles. Note that, as the eyes move, the visual angle between the two features is not altered by rotations of the eye. Thus the relative disparity between two visible features is independent of the choice of binocular fixation point, which means that relative disparity is a useful description of the stimulus that is purely visual, not reliant on eye position. In the psychological literature, the term ‘disparity’ is often taken to mean relative disparity, but in optometry, physiology, and ophthalmology, disparity is more likely to refer to absolute disparity.
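The distinction between absolute and relative disparity can be illustrated with a short numerical sketch. It is only an illustration under stated assumptions: the eyes are treated as points 6.5 cm apart that rotate about their optical centers, the sign convention (right-eye angle minus left-eye angle) is chosen arbitrarily, and the point coordinates and function names are hypothetical.

```python
import math

LEFT_EYE = (-3.25, 0.0)    # cm, assumed eye positions
RIGHT_EYE = (3.25, 0.0)

def direction(eye, point):
    """Azimuth (radians) of `point` seen from `eye`, measured from straight ahead."""
    return math.atan2(point[0] - eye[0], point[1] - eye[1])

def absolute_disparity(point, fixation):
    """Interocular difference in the angle separating `point` from the fixation point."""
    left = direction(LEFT_EYE, point) - direction(LEFT_EYE, fixation)
    right = direction(RIGHT_EYE, point) - direction(RIGHT_EYE, fixation)
    return right - left

def relative_disparity(p, q, fixation):
    """Difference between the absolute disparities of two visible features."""
    return absolute_disparity(p, fixation) - absolute_disparity(q, fixation)

O = (0.0, 50.0)     # one visible feature
N = (5.0, 70.0)     # a second feature, farther away and to the right

for fixation in (O, (-10.0, 30.0)):
    print("fixation at", fixation)
    print("  absolute disparity of O:", absolute_disparity(O, fixation))
    print("  absolute disparity of N:", absolute_disparity(N, fixation))
    print("  relative disparity N vs O:", relative_disparity(N, O, fixation))
```

When fixation is on O, the absolute disparity of O is zero. When fixation shifts elsewhere, both absolute disparities change, but the relative disparity between N and O is the same for both fixation points (up to rounding), because the fixation terms cancel in the subtraction.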
First Steps to Neuronal Mechanisms
A major step forward in the understanding of binocular stereoscopic vision came in the 1960s. Two advances were especially significant. First, Bela Julesz, working at Bell Laboratories, invented the random dot stereogram (RDS; see Figure 4) and exploited this new paradigm to define and explore what he termed ‘cyclopean vision.’ Cyclopean refers to the mythical giant Cyclops, who had just one eye in the center of his forehead. Julesz’s RDS consisted of fields of randomly bright and dark pixels, which were correlated between the left and right eyes’ images. When placed in a stereoscope, a region of this figure was revealed as segregated from the background because it was at a different binocular depth (see Figure 4). Figures demonstrating stereoscopic segregation had been prepared before, but Julesz’s development of a computer-based method was truly original and profoundly influential. Julesz argued that the perception of the revealed figure reflects central processing by the brain, separate from the influence of either eye alone. He conducted a series of studies that investigated the classical visual illusions presented in their novel cyclopean form.
The other advance was the first identification of single neurons that are sensitive to the disparity of visual stimuli. In the late 1950s, Hubel and Wiesel had discovered that some neurons in the visual cortex often respond selectively to the orientation of visual contours and are sensitive to stimuli over a limited region of visual space (the receptive field) in both the left and right eyes. These neurons showed ‘binocular summation’: the response to the simultaneous stimulation of left and right eyes was stronger than the response to either eye alone. Two research groups, Nikara, Bishop, and Pettigrew, working in Australia, and Barlow, Blakemore, and Pettigrew, in the United States, presented new findings that some of these visual neurons are most sensitive when the independent stimulation of the left and right eyes falls on noncorresponding, as opposed to corresponding, points. Such neurons therefore give their greatest response to a nonzero disparity, and as the disparity is adjusted away from this optimal value, the response of the neuron declines. A neuron that responds optimally to stimulation of noncorresponding points is sensitive to the absolute disparity of the stimulus. This work established that there were neurons sensitive to different values within a range of nonzero disparities, implying a population of neurons that is capable of encoding a range of binocular depths with respect to the binocular fixation point (see Figure 5). Another consequence was to give an unambiguous physiological interpretation to the concept of binocular corresponding points developed in earlier work.
Figure 3 Diagram showing the definition of absolute and relative disparity. (a) The eyes are looking at point O. The angles α and β between the line of fixation to O and the projection of point N are different as viewed with the left (L) and right (R) eyes. N projects to a point in the right eye that is farther from the fovea (F) than is the point from the fovea in the left eye. N therefore has a nonzero absolute disparity whereas O has a zero absolute disparity. (b) The eyes have shifted to a new fixation point. Now, both O and N have nonzero absolute disparities. However, the relationship between O and N has not changed: Angles α and β still describe the angular separation of O and N. In both (a) and (b), the relative disparity between N and O is given by the quantity (β − α).
Figure 4 A random dot stereogram (RDS). When the left and right eye’s images are viewed in a stereoscope, a region of different depth in the center emerges. The principle was apparently first published by Cajal, who devised it as a means of secret writing. The upper panel shows a stereoscopic camera taking the image of written text on a plate B at a different distance from the camouflage background A. If either the left or right photographic plate is viewed alone, the message cannot be decoded. Presumably, the intent was to send each image via a different route. The lower panel shows a modern RDS of the type invented by Julesz. When the left and right eye’s images are fused binocularly, a circle segregates in depth from the background square.
Computational Stereo
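To make the computational problem concrete, the sketch below builds a small random dot stereogram and then recovers the hidden horizontal shift by a simple window-matching search. This is an illustration only, not Julesz’s procedure or any published stereo algorithm. The image size, patch size, shift of 4 pixels, and the function names `make_rds` and `best_shift` are all assumptions introduced here.

```python
import random

def make_rds(size=64, patch=24, shift=4, seed=0):
    """Return (left, right) images as lists of rows of 0/1 pixels.
    A central square is displaced horizontally by `shift` pixels in the right image."""
    rng = random.Random(seed)
    left = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
    right = [row[:] for row in left]            # the surround is identical in both eyes
    start = (size - patch) // 2
    end = start + patch
    for y in range(start, end):
        for x in range(start, end):
            right[y][x - shift] = left[y][x]    # displaced copy of the central square
        for x in range(end - shift, end):
            right[y][x] = rng.randint(0, 1)     # fresh dots fill the uncovered strip
    return left, right

def best_shift(left, right, x_range, y_range, max_shift=8):
    """Horizontal shift d that maximizes the number of matching pixels
    between left[y][x] and right[y][x - d] inside the given window."""
    def matches(d):
        return sum(left[y][x] == right[y][x - d] for y in y_range for x in x_range)
    return max(range(-max_shift, max_shift + 1), key=matches)

left, right = make_rds()
centre_shift = best_shift(left, right, range(26, 38), range(26, 38))
surround_shift = best_shift(left, right, range(8, 16), range(2, 14))
print("central square:", centre_shift, "pixels; surround:", surround_shift, "pixels")
```

Each image on its own is unstructured noise; the square is defined only by the relationship between the two images. The matching search finds one shift for the central window and a different one for the surround, which is the segregation that the stereogram is designed to reveal.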
These two developments set the scene for most modern experimental studies of stereoscopic vision. Julesz also identified a different conceptual aim; he sought to specify an automatic, computational method for processing a stereoscopic pair of images to extract the binocular depth. Marr developed an exceptionally broad view of the computational problem of early visual processing, which took in image processing, visual geometry, retinal and cortical physiology, artificial intelligence, and much more. Nonetheless, it
is significant that Marr, working with T Poggio, took the problem of stereoscopic vision as a vehicle for carrying forward his more general program. A particularly exciting aim was that of mapping components of the computation onto the actions of individual nerve cells.
Figure 5 A range of neurons in cat visual cortex that have their greatest response at different binocular depths. The numbered dots represent visual stimuli. The dotted lines show the lines of sight to the cat’s fixation point. The Vieth–Muller circle is the circle passing through the fixation point and the optic centers of the cat’s eyes. (Panel title: Cat 13 converged at 50 cm; distribution in depth for optimal stimulation of cortical neurones. Vertical axis: distance from the eyes, in cm.) From Barlow HB, Blakemore C, and Pettigrew JD (1967) The neural mechanism of binocular depth discrimination. Journal of Physiology (London) 193(2): 327–342, with permission from Wiley-Blackwell Publishing Ltd.
Development of Stereo Vision
An important element of Hubel and Wiesel’s neurophysiological studies revealed how the visual system develops during early life. They argued that there is a period of high plasticity during early development. During this period, the neural apparatus must receive appropriate visual stimulation; otherwise the neurons fail to connect appropriately to one another. Much of the investigation of this so-called critical period was based on the study of binocularity in the visual cortex. Interventions such as depriving one eye of vision result in that eye’s having a weaker, or even absent, influence on primary visual cortical (V1) neurons. The importance of concurrent input to both eyes was highlighted by the finding that alternating occlusion of each eye resulted in a visual cortex in which there were almost no binocular neurons but large numbers of neurons responsive exclusively to stimulation of either the left eye or the right eye. A similar pattern of cortical organization was found when a squint was induced by surgical intervention that resulted in the misalignment of the left and right eyes. Loss of binocularity is a central element of the human clinical syndrome of strabismus and associated amblyopia, in which the disruption of vision in childhood may lead in effect to blindness in one eye due to that eye’s functional disconnection from the visual cortex. Hubel and Wiesel emphasized the importance of visual stimulation in selectively retaining cortical connections that had already been set up by genetic instruction, while others, notably Blakemore, highlighted the possibility that environmental influences could induce the formation of new connections.
Current Progress in Understanding Binocular Stereopsis
The Binocular Energy Model
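As a concrete reference point for this section, the two-stage scheme it describes (addition of the excitation from the two eyes, followed by thresholding, squaring, and pooling) can be sketched in one spatial dimension. The code is a minimal illustration under assumptions that are not taken from the article: Gabor-shaped receptive fields with arbitrary parameters, the position-shift variant of the model, pooling over a quadrature pair of subunits rather than over a range of locations, and random rows of bright and dark dots as stimuli. All function names are hypothetical.

```python
import math
import random

def gabor(x, phase, sigma=4.0, freq=0.1):
    """1-D Gabor receptive-field profile (Gaussian envelope times cosine carrier)."""
    return math.exp(-x * x / (2 * sigma * sigma)) * math.cos(2 * math.pi * freq * x + phase)

def energy_response(left, right, preferred_disparity):
    """Response of one model cell centred on the middle of two 1-D images."""
    centre = len(left) // 2
    total = 0.0
    for phase in (0.0, math.pi / 2):                 # quadrature pair of subunits
        drive = 0.0
        for i in range(len(left)):
            x = i - centre
            drive += left[i] * gabor(x, phase)       # left-eye receptive field
            # Position-shift variant: the right-eye field is displaced
            # by the cell's preferred disparity; the two eyes are simply added.
            drive += right[i] * gabor(x + preferred_disparity, phase)
        # Half-wave rectify each sign of the summed drive, then square and pool.
        total += max(drive, 0.0) ** 2 + max(-drive, 0.0) ** 2
    return total

def tuning_curve(stimulus_disparity, preferences, trials=300, size=64, seed=1):
    """Mean response of cells with different preferred disparities to random-dot rows."""
    rng = random.Random(seed)
    sums = {p: 0.0 for p in preferences}
    for _ in range(trials):
        left = [rng.choice((-1.0, 1.0)) for _ in range(size)]
        # The right-eye row is the left-eye row displaced horizontally (with wrap-around).
        right = [left[(i + stimulus_disparity) % size] for i in range(size)]
        for p in preferences:
            sums[p] += energy_response(left, right, p)
    return {p: s / trials for p, s in sums.items()}

curve = tuning_curve(stimulus_disparity=3, preferences=range(-6, 7))
for p in sorted(curve):
    print(f"preferred disparity {p:+d}: mean response {curve[p]:.1f}")
```

Averaged over many random patterns, the model cell whose preferred disparity equals the stimulus disparity gives the largest mean response, and the response falls away as the preference moves from that value. Responses to individual patterns are noisy, which is why the sketch averages over trials.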
At the physiological level, an important modern development was to understand the way in which the initial stages of disparity detection proceed. Ohzawa and Freeman, later joined by DeAngelis, found that many disparity-sensitive neurons in V1 behave as if the initial stage of binocular combination is a simple addition of the excitation arising from each eye. The initial work on this problem used sinusoidal grating stimuli, in which the relative phases of the sinusoids in the left and right eye were adjusted parametrically to measure the binocular combination rule used by the neurons. A number of observations led them to propose that binocular disparity might be detected by means of either a mechanism that is specifically sensitive to interocular phase or (as originally implied) a mechanism that encodes a positional shift between the two eyes’ receptive fields. Overall, their experimental findings pointed to a two-stage model of disparity detection. Disparity detection arises by thresholding and squaring the output of an initial detector and then combining signals from multiple initial detectors, either phase- or position-sensitive; the combination of signals encompasses detectors with the same disparity sensitivity over a small range of locations in the visual field. This model was termed the ‘energy model’ since the operation of squaring that is embedded within it parallels the definition of, and method for calculating, the energy of an electrical signal. This model has been very successful in capturing a wide variety of findings about the functional properties of disparity-selective neurons in V1. Although a number of adjustments and elaborations to the model have been proposed, the substantial elements of the structure have remained.
Other Species
All these studies focused on the properties of neurons in V1, which is the earliest stage at which disparity-sensitive neurons are found. During the 1970s and 1980s, the study of the neural basis of stereopsis was extended considerably. First, other species were examined; for example, the owl’s visual wulst also contains neurons selective for binocular disparity. Notwithstanding the fact that the anatomical projections from retina to brain are organized differently in owls and mammals, the forward-facing eyes of the owl are associated with the capacity to detect stereoscopic depth, and there are strong similarities in the neural apparatus involved in all species. Second, many different visual cortical areas were explored, mostly in the macaque monkey. The multiplicity of visual areas, initially identified by Zeki, proved to have a variety of specializations for processing different aspects of the visual image, such as color and motion. For stereoscopic vision, the work of G Poggio established unambiguously the presence of disparity selectivity in V1, V2, and V3, while the work of Hubel and Livingstone suggested that binocular neurons (by implication responsible for the perception of stereo depth) were especially concentrated in compartments of V1 and V2, identifiable on the basis of histological staining for the mitochondrial enzyme cytochrome oxidase; these were the regions between cytochrome oxidase blobs in V1 and the thick stripes in V2.
From Disparity Detection to Depth Perception
Although neurons both inside and outside V1 were explored in detail, most experiments tended to carry over the paradigms established for V1 into other cortical areas. This ultimately proved to be a limitation. The reason was that the study of disparity selectivity at the physiological level was dominated by the conceptual framework set up by the earliest investigations, which grounded their understanding in the concept of retinal correspondence described earlier. Work in human psychophysics, particularly due to Westheimer, pointed out that relative disparity was a much more relevant parameter for describing the outcome of perceptual experiments. This highlighted a gap in the conceptual framework for understanding binocular stereoscopic vision at the neuronal level. Neuronal studies were working with a framework linked to retinal correspondence and hence to absolute disparity (defined earlier), whereas perceptual studies pointed to relative disparity. A number of other perceptual properties of stereoscopic vision are at variance with the simple detection mechanisms implied by studies in V1, even those motivated by computational principles implemented in the energy model. Cumming and Parker therefore initiated a program to compare the responses of single neurons at various levels in the visual system of nonhuman primates against the perceptual response to binocular stereo depth. The focus was to identify the extent to which neuronal responses in particular brain regions could explain the perceptual characteristics of binocular stereoscopic vision. This work demonstrated, for example, that neurons in V1 did not signal relative disparity whereas at least some neurons in cortical area V2 did indeed respond to relative disparity. Evidence from a number of laboratories has continued to develop the picture that stereoscopic depth processing is elaborated within the extrastriate visual areas beyond the early disparity-detection mechanisms in V1. This concept is also consistent with the conclusions from more recent developmental studies, in which visual processing has been disrupted experimentally or owing to human clinical conditions.
These studies have indicated that some of the deficits in visual performance arise from disruptions of neuronal processing in extrastriate visual areas, which bring additional losses in performance over and above those attributable to the effects observed in V1. Cortical area V5/MT, mostly studied in relation to the processing of visual motion, has proved to have an important role in binocular stereoscopic vision. In particular, DeAngelis, working initially in Newsome’s laboratory, has revealed a number of neuronal signals identifiable in V5/MT that are specific to the performance of certain stereo tasks. These signals are separate from signals that are generated simply by the presence of certain binocular disparities within the stimulus. Thus, V5/MT carries not just simple sensory signals but also task-related signals for certain perceptual judgments of stereo depth. V5/MT lies in the dorsal stream of extrastriate visual pathways. Meanwhile, in the ventral stream, Janssen, working in Orban’s group, has shown that brain area TEs, in the inferotemporal cortex, carries signals about the three-dimensional shape of surfaces, specified by binocular disparity. Current thinking suggests a broad division of functional involvement of dorsal and ventral extrastriate areas in different aspects of binocular stereoscopic vision. The ventral stream appears to be mainly concerned with exploiting stereoscopic depth for shape perception. The dorsal stream carries signals for controlling binocular eye movements, for orienting the observer in the visual environment, and for segregating different depth planes so that visuomotor tracking behavior can be sustained. The dorsal stream appears to make use of a simple correlation-based stereo algorithm, broadly similar to the energy model in V1, while the ventral stream may use a more sophisticated stereo algorithm that sustains detailed point-to-point matching of binocular features between the eyes.
See also: Representation of Reward; Shape Representation in Inferotemporal Cortex; Vision: Surface Segmentation; Vision: Light and Dark Adaptation.
Further Reading
Barlow HB, Blakemore C, and Pettigrew JD (1967) The neural mechanism of binocular depth discrimination. Journal of Physiology (London) 193(2): 327–342.
Bishop PO and Henry GH (1971) Spatial vision. Annual Review of Psychology 22: 119–160.
Bradley DC, Qian N, and Andersen RA (1995) Integration of motion and stereopsis in middle temporal cortical area of macaques. Nature 373(6515): 609–611.
Cumming BG and DeAngelis GC (2001) The physiology of stereopsis. Annual Review of Neuroscience 24: 203–238.
DeAngelis GC, Cumming BG, and Newsome WT (1998) Cortical area MT and the perception of stereoscopic depth. Nature 394(6694): 677–680.
Helmholtz HV (1962) Handbook of Physiological Optics. New York: Dover (originally published 1909).
Janssen P, Vogels R, and Orban GA (2000) Three-dimensional shape coding in inferior temporal cortex. Neuron 27: 385–397.
Julesz B (1971) Foundations of Cyclopean Perception. Chicago: University of Chicago Press.
Marr D and Poggio T (1979) A computational theory of human stereo vision. Proceedings of the Royal Society of London, Series B: Biological Sciences 204(1156): 301–328.
Ohzawa I, DeAngelis GC, and Freeman RD (1990) Stereoscopic depth discrimination in the visual cortex: Neurons ideally suited as disparity detectors. Science 249(4972): 1037–1041.
Orban GA, Janssen P, and Vogels R (2006) Extracting 3D structure from disparity. Trends in Neurosciences 29(8): 466–473.
Parker AJ (2004) From binocular disparity to the perception of stereoscopic depth. In: Chalupa L and Werner JS (eds.) The Visual Neurosciences, vol. 1, pp. 779–792. Cambridge, MA: MIT Press.
Parker AJ (2007) Binocular vision and the cerebral cortex. Nature Reviews Neuroscience 8: 379–391.
Wade NJ, Ono H, and Lillakas L (2001) Leonardo da Vinci’s struggles with representations of reality. Leonardo 34(3): 231–235.
Wheatstone C (1838) Contributions to the physiology of vision: I. On some remarkable, and hitherto unobserved, phenomena of binocular vision. Philosophical Transactions of the Royal Society of London 128: 371–394.