Acknowledgement
79-91 (1989) J. Neurosci. 9, 2197-2209 38 Stader, C. B. et al. (1983) Proc. Natl Acad. Sci. USA 80, 51 Korn, H. and Faber, D. S. (1987)in NewlnsightsintoSynaptic Function (Edelman, G. M., Gall, W. E. and Cowan, W. M., 1840-1844 eds), pp. 57-108, Neurosciences Research Foundation, Wiley 39 Dana, E. etal. (1989)J. Neurosci. 9, 2247-2257 & Sons 40 Hamel, E. and Beaudet, A. (1984) Nature 312, 155-157 41 Triller, A., Seintanidou, T., Franksson, O. and Korn, H. (1990) 52 Blankenship, J. E. and Kuno, M. (1968) J. Neurophysiol. 31, 195-209 The New Biologist 2,637-641 53 MacLachlan, E. M. (1978) Int. Rev. Physiol. Neurophysiol. 17, 42 Malinow, R. and Tsien, R. W. (1990) Nature 346, 859-861 49-117 43 Korn, H. and Faber, D. S. (1990) in Glycine Neurotransmission (Ottersen, O. P. and Storm-Mathisen, J., eds), pp. 54 Clements, J. D. (1990) J, Neurosci. Methods 31, 75-88 55 Korn, H., Fassnacht, C. and Faber, D. S. (1991) Nature 350, 139-170, Wiley & Sons 282 44 Bekkers, J. M. and Stevens, C. F. (1990) Nature 346, 56 Faber, D. S. and Korn, H. Biophys. J. (in press) 724-729 45 Neale, E. A., Nelson, P. G., MacDonald, R. L., Christian, C. N. 57 Korn, H., Faber, D. S., Burnod, Y. and Triller, A. (1984) J. Neurosci. 4, 125-130 and Bowers, C. M. (1983)J. Neurophysiol. 49, 1459-1468 46 Redman, S. and Walmsley, B. (1983) J. Physiol. 343, 58 Mintz, I., Gotow, T., Triller, A. and Korn, H. (1989) Science 245, 190-192 135-145 59 Dale, N. and Kandel, E. R. (1990) J. Physiol. 421,203-222 47 Wojtowicz, J. M. and Atwood, H. L. (1986) J. Neurophysiol. 60 Martin, A. R. (1977) in Handbook of Physiology: The 55, 484-498 Nervous System, (Sect. I, Vol. I, Part I), pp. 329-355, 48 Wojtowicz, J. M., Smith, B. R. and Atwood, H. L. Ann. N.Y. American Physiological Society Acad. Sci. (in press) 49 Sayer, R. J., Redman, S. J. and Andersen, P. (1989) 61 MacNaughton, B. L., Barnes, C. A. and Andersen, P. (1981) J. Neurophysiol. 46, 952-966 J. Neurosci. 9, 840-850 50 Foster, T. C. and MacNaughton, B. L. (1991) Hippocampus 1, 62 Malinow, R. (1991)Science 252, 722-724
. . . .
.......
......................................................
r
e
We thank R. Miles for critical reading of the manuscriptand help with Box2.
v
i
e
w
s
Neuralmodelsof stereoscopicvision Randolph
B l a k e a n d H u g h R. W i l s o n
Human stereopsis remains an enigma: how does the brain match features between the left and right eye images and compute disparity between these matched features? Developments in computational neuroscience and machine vision have led to several models of human stereopsis that provide insight into possible mechanisms underlying this phenomenon. These models, reviewed in this paper, adopt one of three general strategies. One class of models employs cooperative interactions, whereby a unique solution to the matching problem emerges from excitatory and inhibitory interactions among binocular neural elements. A second class of models implements matching and disparity computation serially over multiple spatial scales. A third class relies on local, non-interacting computations performed in parallel to overcome speed limitations inherent in the other models. Considered together, these theoretical developments offer fresh insights concerning the actual neural concomitants of binocular stereopsis. Looking about the visual world, we see objects from two slightly different perspectives owing to the lateral separation of the eyes in the head. Evidently the brain is remarkably adept at combining these two monocular views, for binocular single vision and stereopsis occur effortlessly even when the two monocular views contain no discernible form information to guide the matching of the left and right eye images 1. Moreover, the brain is incredibly accurate in registering slight positional differences of image features from the left and right eyes, as evidenced by our keen sense of stereoscopic depth perception. How does the binocular visual system identify corresponding monocular features, measure positional disparities between those features and transform those measurements into a description of the three-dimensional layout of objects in the visual scene? TINS, Vol. 14, No. 10, 1991
A first glimpse of the actual neural hardware underlying binocular vision was provided in the late 1960s when physiologists began recording from individual neurons in the visual cortex of mammals with frontally placed eyes. Shortly after Hubel and Wiesel2 described the existence of binocularly innervated cortical cells, several laboratories 3'4 reported that many of these binocular cells in fact possessed receptive fields on non-corresponding areas of the two eyes. In other words, the optimum stimulus for activating these neurons would be an edge or surface marking located somewhere in depth other than the plane of fixation. By virtue of their disparity selectivity, these binocular neurons provided a plausible neural basis for stereopsis. This idea received further support from behavioral experiments demonstrating deficits in stereopsis in animals that lacked the normal complement of binocular cortical neurons 5'6 Following these initial exciting discoveries, other laboratories concentrated on measuring the degree of disparity selectivity among single binocular neurons 7, correlating those measures with cortical cell types 8 and searching for disparity-selective cells in extrastriate areas 9. Much of that work is reviewed elsewhere 1°. Despite this enduring interest in disparity processing by the brain, however, real progress in our understanding of the neural bases of stereopsis has come slowly, partly because of the absence of a theoretical context in which to frame physiological questions. Recent developments in computational neuroscience and machine vision have provided new theoretical insights into the nature of stereoscopic vision, in turn leading to important psychophysical discoveries concerning human stereopsis. This article reviews those theoretical and empirical developments, in the hope of stimulating a refined assault on the actual neural concomitants of stereopsis.
© 1991, EPsevierScience Publishers Lid, (UK) 0166- 2236/91/$0200
Randolph Blakeis at the Dept of Psychology, Vanderbilt University, Nashville, TN.77240, USA, and Hugh R. Wilson is at the Dept of Ophthalmology and VisualScience, Universityof Chicago, 939 E. 57th Street, Chicago, IL 60637, USA.
445
Fig. 1. A random-dot stereogram depicting a square in depth relative to the background. Viewing the stereo pair by crossing the eyes places the square in crossed disparity; it should stand out in front of the background. Uncrossing the eyes (or viewing through a stereoscope) places the square in uncrossed disparity," it will appear behind the surround. Note, incidentally, how the perceived depth of the square relative to background increases with viewing distance, even though angular disparity is decreasing. Neural models of stereopsis can be grouped conveniently into three major categories: cooperative models, models incorporating interactions across multiple spatial scales ('coarse-to-fine constraints' models), and feedforward models. Since many of these models have taken as their chief challenge a solution to the so-called 'false-target problem', we begin by defining that problem. The f a l s e - t a r g e t problem As a first step in stereoscopic vision, the brain must determine which features contained in the left eye's image match up with features in the right eye's image. This determination poses a serious challenge when a complex scene is viewed binocularly. Consider, for instance, the formidable task posed by a random-dot stereogram (RDS) composed entirely of randomly placed black and white dots (Fig. 1): for any given horizontal line of the RDS, an individual black dot in the view of one eye could in principle be matched to any black dot in the view of the other eye. With 50 black dots per line, for example, the number of unique binocular pairings is 50! (i.e. 5 0 x 4 9 x 4 8 . . . ) , and all but 50 of this immense number are spurious! False targets arise from the inherent geometric ambiguity in binocular vision: a pair of two-dimensional images of a three-dimensional world contains insufficient information for the unique reconstruction of that world. How, then, can the false-target problem be surmounted? To eliminate false targets and accurately compute depth, additional constraints must be embodied within the neural substrate performing that computation. As summarized in the following sections, three general strategies have been proposed.
Cooperative models Cooperative models solve the false-target problem and extend the disparity range processed using variants of two complementary themes: facilitation in the two-dimensional (i.e. fronto-parallel) plane and inhibition in the three-dimensional (i.e. depth) plane. Various cooperative models implement these interactions differently, and Fig. 2 summarizes schematically those various implementations. In all models, projections from the two retinas innervate an array of 446
binocular cortical neurons sensitive to a range of crossed and uncrossed disparities. Without other connections, these binocular neurons would respond to correct binocular matches as well as to false targets. How do the models veto incorrect matches? According to Dev 11 and to Nelson ~2, each binocular cell inhibits all others tuned to different disparities at the same position in space (blue vertical line in lefthand portion of Fig. 2A), and excites binocular cells tuned to the same disparity at neighboring spatial locations (black horizontal line). Inhibition across disparities prevents the perception of transparent depth surfaces, while facilitation across space minimizes the disparity variation from point to point. Consequently, these two models will extract solid surfaces varying smoothly in depth. Although the Nelson model is purely qualitative, simulations show that the Dev model is capable of solving onedimensional analogs of RDSs. In a variant on this cooperative theme, Marr and Poggio ~3 proposed that the inhibition across disparities should run diagonally (blue diagonal lines in middle portion of Fig. 2A). This arrangement implements the constraint that a single monocular input can only generate one binocular response. In their model, incidentally, binocular units behave as logical AND gates, responding only to simultaneous stimulation of both eyes. This property further simplifies the false-target problem, although it renders the units in the model non-functional when one eye is closed. Simulations 14 show that this model can solve RDSs. Sperling 15 produced a qualitative neural model that differs from those described above in several interesting respects. While assuming the existence of inhibitory connections across disparities at each point in space, his model includes no direct excitatory connections among binocular units (which, incidentally, are AND gates). Instead, 'implicit positive feedback' occurs via disinhibition. For disinhibition to promote the spread of facilitation across space, however, the disparity-specific inhibition must spread spatially, as indicated by the blue regions in the right-hand portion of Fig. 2A. With this inhibitory spread, a binocular unit at one location will disinhibit not only itself (via inhibition of its potential inhibitors) but also its spatial neighbors tuned to the same disparity. Sperling's model also postulates two differently sized groups of monocular receptive fields, with binocular processing occurring in parallel in both groups of receptive fields, large and small. Rather than interacting, however, these two merely pool their depth estimates in an unspecified manner. Mayhew and Frisby ~6 have developed a model that uses two different forms of cooperativity. An initial filtering stage extracts the positions and local orientations of edges in the monocular images. This filtering occurs in parallel on three spatial scales associated with three different ranges of spatial frequency tuning. Next, disparities are calculated between all horizontally disparate edge segments having similar orientations and matching polarities (light-dark or dark-light) in each monocular view. False targets are eliminated from this disparity field by application of two cooperative rules. One is a feature-specific continuity rule: cooperative interactions occur between oriented binocular edges that are roughly aligned along the direction of their orientation and at TINS, VoL 14, No. 10, 1991
the same disparity. This orientation-specific binocular A 4 Horizontal Position = facilitation is unique to this particular model. The 000,00000.0000000 other rule specifies cooperativity across spatial scales: binocular matches occurring at approximately 000"0,000,0"0000060 the same orientation and position across all three Far o o o o "o, o ,o o o o o o o 0 o scales are favored. This facilitation across scales .~" works because an object contour will usually generate O-.
T
/
\
/
/
O"
/
/
/
/
/
/
Coarse-to-fine strategies
A key observation concerning the false-target problem, originally highlighted by Marr and Poggio 2z, links disparity processing to receptive-field size (Fig. 3). Extensive physiological and psychophysical evidence implicates spatial filtering by cortical receptive fields that are responsive to a limited range of spatial frequencies, with the peak frequency (i.e. the value yielding maximum response) varying among cells 23'24. Now, suppose a pattern of spatial frequency ~Op is presented to the two eyes. Left and right eye images of that pattern can be unambiguously matched so long as the disparity between the two images does not exceed 1/(2O)p), i.e. 1/2 the period of the grating. However, cortical spatial filters respond to spatial frequencies up to about twice their peak spatial frequency ~op. Therefore, to ensure that all matches are correct for any given cortical cell, the matching process must be restricted to disparities no greater than +1/(4¢op); i.e. +1/2 the period of the cell's highest resolvable spatial frequency. (Because the TINS, Vol. 14, No. 10, 1991
drawings represent monocular neurons with horizontally displaced receptive fields. Circles within the 'disparity' matrix depict an array of binocular units with receptive fields centered at different horizontal locations, each tuned to a different crossed (near) or uncrossed (far) disparity. The thin dashed lines show the diagonal rows of binocular units (representing stimufi at different depths along the same sight line) innervated by an image falling two cells to the left of the fovea of the left eye (fL) and two cells to the left of the fovea of the right eye (fR). AS each binocular unit in these models responds only when both monocular inputs are simultaneously active (AND units), only the crosshatched unit in the matrix would respond to stimulation of these two monocular cells. (A) In the Dev 11and Nelson 12models (far left) each unit has reciprocal excitatory connections with its nearest spatial neighbors, which are tuned to the same disparity. Units at the same horizontal position tuned to different disparities are aft mutually inhibitory, as iflustrated by the vertical blue line. The Marr and Poggio 13 model (middle of diagram) has excitatory connections identical to the Dev and Nelson models, but the inhibitory connections run diagonally through the network (blue fines). The 5perling 15 model (far right) employs only inhibitory connections, but these must be assumed to have some spatial spread (blue regions) in order to permit a spatial spread of facilitation via disinhibition. (B) The model of Pollard et al. 18extends the region of facilitation to include a range of units falling within a disparity gradient limit of unity (dark gray regions). Inhibitory connections then extend into the cones of cells (blue) representing steep disparity gradients beyond unity. 447
spatial frequency) processed small disparities, while larger receptive fields (lower spatial frequency) processed progressively larger le disparities. According to the model, low spatial frequencies in the image (registered by large receptive fields) controlled vergence eye movements, which, in turn, brought fine spatial detail re re (high spatial frequencies) within the quarter-cycle disparity limit of the smaller receptive fields. Disparities were then summed across g spatial scales to produce a net disparity estimate. Computer simule lation of this model 25 accurately Spatial fn~luency / extracted depth from both RDSs and natural images, although some cooperative processing was found to be necessary in the implementation. re re Despite its successful simulations, however, this model has three serious flaws. The first one is logical: any disparities greater than _+l/(&Op) will be missed d = 114o,, (where top refers to the peak frequency of the most coarse-scale cortical filters). Second, stereo Fig. 3. Diagram showing how false-target matches can be avoided by restricting disparity processing to values less than the quarter-cycle limit. The four vertical ellipses represent the targets composed entirely of high receptive field for a cortical neuron whose spatial-frequency tumng curve is shown schematically in spatial-frequency information can the middle. The peak frequency for this neuron is co, and it responds reliably to spatial frequencies trigger vergence eye movements, up to twice the peak (2co). The left-hand pair of plots depicts the situation where a pair of cosine contrary to the model's predicgratings of spatial frequency co are presented separately to the two eyes [left (le) and right (re)]; the tion 26. Third, stereo targets comright-hand pair of drawings depicts the presentation of spatial frequency 2co to the two eyes. For posed exclusively of high spatial the pair of drawings in the top of the figure, the patterns are presented to left and right eyes with frequencies can be combined binozero disparity - the peaks of the patterns coincide. For the drawings in the bottom half of the figure, the patterns a r e presented with a disparity equal to 1/2 the period (reciprocal of spatial cularly at disparities well beyond frequency) of the grating - shifts larger than this in the position of one grating (i.e. greater the quarter-cycle limit 2r'zs (Fig. disparities) create false matches, as non-corresponding peaks in the left and right eye images now 4); this should be impossible acmove into closer register. False matches a r e avoided only within this limited disparity range - the cording to the model. Hence, the upper value being 1/2 the period of the grating. Because the neuron responds to spatial frequencies Marr and Poggio assumption of a up to twice the peak value, the narrowest range of spatial frequencies within which false matches quarter-cycle limit is clearly vioa r e avoided equals 1/2 the period of the highest resolvable spatial frequency: 1/4(o. This upper lated at high spatial frequencies, disparity value could be either crossed or uncrossed (i.e. the shift in the peak could be either to the and the assumption of vergence left or to the right), so the disparity range over which matches may be sought must be restricted to control by coarse spatial filters is -+1/4(o. In this example, peaks in the grating patterns were treated as matching features, but the same reasoning applies for other matching features (e.g. zero crossings), Neurons preferring lower also contradicted. Other models involving coarsespatial frequencies (i.e. neurons with larger receptive fields) might process larger disparities without false matches, whereas neurons preferring higher spatial frequencies (i.e. smaller receptive fields) to-fine strategies minimize the role of eye movements and instead a r e restricted to smaller disparities. Coarse-to-fine models exploit this relationship between implement spatial-scale interreceptive field size and disparity. actions neurally. Models by Nishihara 2~~and Quam 3° incorporate range must encompass both crossed and uncrossed this interaction in the manner illustrated in Fig. 5. For disparities, it is expressed as plus or minus the value any given region of the visual field, units with large within which unambiguous matches are possible.) receptive fields (coarse spatial scale) are selective for Restriction of disparity processing to this range will one of three values: crossed disparity for coarseyield only correct matches: the false-target problem image (i.e. low spatial-frequency) features that are would be circumvented. We shall term this solution closer than the fixation plane, zero disparity for coarse the 'quarter-cycle limit'. The following paragraphs features on or near the fixation plane, or uncrossed describe several ways this constraint could be imple- disparity for coarse features behind the fixation plane. mented, starting with Marr and Poggio's 22 model. By virtue of their large receptive fields, these coarseTheir model incorporated multiple spatial filters scale units process large disparities, but only within differing in size (or, in other words, peak spatial the quarter-cycle limit, which avoids the false-target frequency). For a given sized spatial filter, disparity problem. For each given visual field location, the most processing was restricted to a range defined by the active coarse unit among these three (crossed, zero quarter-cycle limit. Thus, small receptive fields (high or uncrossed) then constrains processing at the next 2 x Peak Spatial Frequency
Peak Spatial Frequency
\
/
i
448
TINS, VoL 14, No. 10, 1991
30
vE
10
g U..
3
,
,
,
i
,
I
1
J
3
J
,
,
•
.
•
'
10
Spatial frequency (cycles per degree) Fig. 4. The ordinate plots fusion limit (maximum horizon-
tal disparity yielding binocular single vision) as a function of peak spatial frequency. Test stimufi were vertical bars whose luminance profile was defined either by a difference of two Gaussians of opposite signs (DOG) or by the sixth derivative of a Gaussian (D6). Both types of bars ha ve limited spa tial frequency bandwidth: the bandwidth at half ampfitude is 1.7octaves for a DOG and 1.0 octaves for a D6. Sofid and open squares are data obtained with DOGs by Schor et al. 27 (IW, I. Wood; CS, C. M. Schor), while circles are averaged data for the three subjects in the study by Wilson et al. 28 (WB & H, Wilson, H. R., Blake, R. and Halpem, D. L.), which used D6 bars. The single diagonal line plots the quarter-cycle limit, Le. the upper value of a range of disparities within which the probability of false matches is essentially zero. Measured fusion limits fall well above this limit at spatial frequencies greater than about 2.0 cycles per degree. In this high spatial-frequency range, the visual system can fuse disparities many times larger than the quarter-cycle limit; the false-target problem arises and must be solved by neural interactions.
finest spatial scale to a 3×3 array of units whose smaller receptive fields analyse the same region of three-dimensional visual space as the most active coarse unit (shown in gray in Fig. 5). Thus, for instance, if the 'crossed' disparity (i.e. near depth) coarse unit is most active, the 3 × 3 array of finer units will be those preferring disparities clustered around this crossed disparity. These finer units have receptive fields of half the width of the coarse unit (which, to cover the same region of three-dimensional space, requires doubling their sampling density in space and disparity). This activated subset of finer scale units will now process disparities within the quarter-cycle limit for that spatial scale, again avoiding the falsetarget problem. This processing scheme is repeated on successively finer spatial scales, each having units packed twice as densely in both space and disparity. Simulations show that this class of coarse-to-fine constraint models can solve RDSs and natural images. This coarse-to-fine constraint could be achieved in any of several ways. For instance, the most active coarse unit could facilitate activity in its associated array of fine-scale units. Alternatively, the most active coarse unit could exert spatially localized inhibition of all disparity-selective units tuned to the finer scale, except those centered on the region of three-dimensional space represented by that most TINS, Vo/. 14, No. 10, 1991
active coarse unit (boxed subsets in Fig. 5). Less plausibly, monocular inputs to successively finer scales could literally be shifted, dependent on coarsescale processing - a kind of neural 'shifter circuit '25'31. In fact, evidence exists for coarse-to-fine stereo constraints in human vision. As shown in Fig. 4, the fusion range for a band-limited bar asymptotes at about 15 rain at spatial frequencies equal to or greater than two cycles per degree. These values were obtained for a bar presented against a uniform background. However, when a high spatial-frequency bar appears superimposed on a textured background, the spatial frequency of which is two octaves lower than that of the bar, the fusion limit for the bar drops to about 3.5 rain ~s. Moreover, this constricted disparity range for fusion remains centered on the disparity of the background texture: when the texture is imaged with crossed disparity, the 3.5 rain disparity range of the bar is centered around that crossed disparity; but when the texture is imaged with uncrossed disparity, the bar's disparity range is shifted to that uncrossed value. The disparity of the coarse-scale image, in other words, constrains the range of disparities within which the fine-scale image can be fused binocularly. Now when the same bar is superimposed on a high spatial-frequency background texture, the bar's fusion limit is unaffected by the background - the fine-scale texture does not constrain binocular processing of the lower-scale image. Thus, coarse-scale disparities do reduce the fusion range for finer-scale disparities to approximately the quartercycle limit, consistent with the Nisnihara and Quam models. These models are not without their shortcomings, though. As currently formulated, the coarse-to-fine strategy depends on the presence of low-frequency information for processing of large disparities, yet we know that large disparities can be processed at the higher spatial scales alone (Fig. 4). Feedforward models
The previous models all require either cooperative feedback or sequential disparity processing from coarse-to-fine spatial scale. Yeshurun and Schwartz 32 have recently published a parallel, feedforward algorithm for disparity analysis, which has the advantage of speed. Their algorithm, while not touted as a complete model of human stereopsis, warrants mention for it was inspired by the existence of ocular dominance columns in the visual cortex. Based on a signal-processing strategy termed 'cepstral filtering' used to detect radar echoes ('cepstral' being derived from 'spectral'), the algorithm implements an operation resembling autocorrelation. In the case of stereopsis, cepstral filtering is performed on the left and right eye neural images represented in a pair of neighboring ocular dominance columns. Any given pair of columns registers information over a small, local region of an image, and the filtering process is implemented on each pair with the entire ensemble processed in parallel. The filtering procedure itself involves three sequential steps: (1) computation of the power spectrum of the pair of matched images (a computation that could be approximated by spatial-frequency filtering); (2) a logarithmic transformation of this power spectrum (which could be accomplished with a compressive non-linearity); and 449
00000 00000
I.L
00 00
O
!
!
o
!
!
!
z
C3 LL
0
z
0 0 0 0 0
oO000 oO000 *0000
0 0 0 0 0 0 Horizontal position
A
~
~
0 0 0 0 0
000000 0 ! ! O 00 0 ! O ! 00 0 ! O ! 00
o 01'oo o o
000 000 000 000 000
0 [0 0
0 0
0 0 0 0 0
0 0 0 0 0 0
Horizontal position
B
~
~
Horizontal position
C
Fig. 5. Neural connections that would implement coarse-to-fine constraints. In each of the three panels, the ellipses represent binocular cells with receptive fields (not drawn to scale) at different horizontal locations and responsive to stimufi located at different depths (i.e. responsive to disparities signalling near, zero or far depth). Large ellipses depict cells with large receptive fields (coarse spatial scale) and smaller ellipses depict cells with smaller receptive fields (finer spatial scales). Because the fine-scale units sample from smaller areas than the coarse-scale units, they must be more numerous (i.e. sampling density must be greater at the fine scale) to cover the same region of space sampled by the coarse-scale units. Each activated coarse-scale cell facilitates activity within that specific, small array of fine-scale units with receptive fields centered at the same horizontal location and signalling the same depth as the coarse ceil (shown by an arrow in the three diagrams). The three panels depict three examples of coarse-scale facilitation of fine-scale units. (A) When activated, the left-hand coarse-scale unit selective for near depth (shown in gray) facilitates those fine-scale units with receptive fields centered around this near depth (i.e. around this crossed disparity) at this horizontal location (also shown in gray). (B) The middle coarse-scale unit selective for zero disparity facilitates that array of fine-scale units with receptive fields centered around zero disparity at this horizontal location. (C) The right-hand coarse-scale unit selective for uncrossed disparity facilitates the fine-scale units with receptive fields centered about this location in threedimensional space. These diagrams depict only two spatial scales, but in the models 29's° the facilitation from one scale to the next highest occurs over all scales of spatial filtering. Imagine, incidentally, superimposing the three drawings so that the cells are all in register. The result would represent the patterns of activation produced by viewing a surface rotated in depth about its vertical axis, with the left-hand edge closer to the observer than the right-hand edge. Reversing the direction of slant (i.e. so that the left edge is farther away) would reverse the pattern of activation across the array.
Finally, Ohzawa et al. z:3 have developed a feed(3) computation of the power spectrum of this logtransformed spectrum. The disparity between the forward model of disparity extraction that employs monocular images will now be represented by the a hierarchical arrangement of neural elements. The location of a peak in the amplitude spectrum from the actual disparity-detecting elements are complex cells, third step (which could be accomplished by disparity- each of which receives input from four binocularly selective binocular neurons3'4). innervated simple cells, all with the same preferred As a model of human stereopsis, there are two contour orientation. For a given simple cell, the left problems with this algorithm. First, the bandpass and right eye receptive fields have matching spatial receptive fields in the visual cortex are too broadly profiles, meaning the cell has the same preferred tuned to provide a reasonable approximation to the stimulus (e.g. a light bar) for left and right eyes. power spectrum23'24; the use of broadband filters These four simple cells are grouped into two pairs, to approximate the power spectrum would blur with members of a pair having receptive fields that the disparity signal to a degree incompatible with differ in spatial phase by 90°; the two cells comprising measured human stereoacuity. Moreover, the cepstral a pair are mutually inhibitory. Thus, for instance, a algorithm can extract only one disparity value per simple cell maximally responsive to a light bar would ocular dominance pair, which themselves subtend inhibit, and be inhibited by, a simple cell responsive to roughly ten rain of arc at the fovea. Because of a dark bar; the two cells comprising the other pair this limitation, the cepstral algorithm cannot correctly would respond to edges (one cell being responsive to analyse stereo targets depicting transparent depth a light-to-dark edge and the other to a dark-to-light planes 2°'2s, as these targets would create multiple edge). The outputs of these four simple cells are disparity signals within a given local pair of ocular squared (to compensate for the absence of spondominance columns. In fairness, however, we should taneous activity in simple cells) and then summed by reiterate that Yeshurun and Schwartz offered this the complex cell. With this arrangement, the complex algorithm for solution of stereo problems in machine cell receiving inputs from these four simple cells vision, not biological vision. responds maximally when a bar or an edge of the 450
TINS, Vol. 14, No. 10, 1991
same contrast to the two eyes forms an image at a given disparity anywhere within the complex cell's receptive field. The model adequately predicts the measured disparity selectivity of a sample of phase-selective complex cells in cat visual cortex. The model predicts, however, that these disparity-specific complex cells should respond vigorously to opposite contrast bars presented to the two eyes at a disparity different from the preferred disparity measured with matched contrasts. Such behavior would imply that these complex cells cannot uniquely signal a given retinal disparity. Nor can these complex cells signal disparities beyond the quarter-cycle limit of the input simple cells. In its present form, the model makes no provision for interactions across spatial scale.
Concluding r e m a r k s The models outlined here focus primarily on two problems of stereopsis - solution of the false-target problem and disparity computation. Cooperativity and coarse-to-fine processing, neither of which has been studied neurophysiologically, are two computational strategies that have achieved some success in solving these problems. It should be possible, however, to discover whether either strategy is biologically implemented. For example, coarse-to-fine strategies predict that the size of the preferred disparity of a cell should be directly related to the size of its receptive field. Moreover, the response of such a disparityselective neuron should be altered by the presence of spatial frequencies below its passband. Cooperative interactions across space might be revealed by changes in the response to a given disparity as a function of the disparity of neighboring stimulus elements. There exist several problems as yet unsolved by models of human stereopsis. These include the derivation of accurate measures of three-dimensional depth among objects in the visual scene from retinal disparity information that is inherently ambiguous. A given disparity may be associated with a range of actual depth intervals between objects (see caption to Fig. 1), and in order to scale disparity for perceived depth, additional distance information is required. How various sources of visual information are combined to yield veridical depth perception remains unanswered, although suggestions have been offered34'as. In addition, most models of stereopsis fail to account for stereopsis from disparities too large to yield single vision, which is readily observed. Particularly, models in which binocular AND units register disparity simply cannot handle stereopsis from diplopic images. In fact, most binocular cortical neurons respond to monocular stimulation in a fashion inconsistent with a logical AND operation 2-4. Also largely ignored by these models is binocular rivalry, the visual alternation caused by dissimilar stimulation of correspending retinal areas. Because rivalry is a response of the stereo system to aberrant stimulation, biologically .plausible models should generate rivalry under those stimulus conditions 36. Finally, modellers and neurophysiologists have worked under the assumption that stereoscopic computations are based directly on positional disparities among local stimulus features. However, it has been claimed that pure temporal disparity information, in TINS, VoL 14, No. 10, 1991
the absence of positional disparity, creates a sensation of stereopsis 37. Moreover, some psychophysical evidence as suggests that disparity or depth curvature (i.e. the second spatial derivative of disparity, in contrast to the disparity gradient 18) might provide the basis for perception of three-dimensional surfaces. Although there would certainly be no advantage for visual processing to solve the false-target problem first and then differentiate the result (retaining only the second derivative), disparity curvature can be directly computed by comparing contour curvatures in the two monocular images. Disparity curvature and temporal disparity processing certainly warrant further exploration, both theoretically and physiologically. So, computational models have not completely solved the puzzle of human stereopsis. These models have, however, sharpened our thinking about the nature of the problem and about possible solutions. Specifically, they have introduced the notions of cooperativity, interactions across spatial scales and the false-target problem. With these concepts, neurophysiologists now have a framework to guide their explorations of the hardware responsible for stereopsis and binocular single vision.
Selected references 1 Julesz, B. (1971) Foundations of Cyclopean Perception, University of Chicago Press 2 Hubel, D. H. and Wiesel, T. N. (1962) J. Physiol. 160, 106-154 3 Barlow, H. B., Blakemore, C. and Pettigrew, J. D. (1967) J. PhysioL 193, 327-342 4 Pettigrew, J. D., Nikara, T. and Bishop, P. O. (1968) Exp. Brain Res. 6, 391-410 5 Blake, R. and Hirsch, H. V. B. (1975) Science 190, 1114-1116 6 Crawford, M. L. J., Smith, E. L., Harwerth, R. S. and von Noorden, G. K. (1984) Invest. Ophthalmol. Visual Sci. 25, 779-781 7 Poggio, G. F. and Talbot, W. H. (1981) J. PhysioL 315, 469-492 8 Ferster, D. (1981)J. Physiol. 311,623-655 9 Maunsell, J. H. R. and Van Essen, D. C. (1983) J. Neurophysiol. 49, 1148-1167 10 Poggio, G. F. and Poggio, T. (1984) Annu. Rev. Neurosci. 7, 379-412 11 Dev, P. (1975) InL J. Man-A4ach. Stud. 7, 511-528 12 Nelson, J. I. (1975) J. Theor. Biol. 49, 1-88 13 Mart, D. and Poggio, T. (1976) Science 194, 283-287 14 Marr, D., Palm, G. and Poggio, T. (1978) Biol. Cybern. 28, 223-239 15 Sperling, G. (1970) Am. ]. Psychol. 83,461-534 16 Mayhew, J. E. W. and Frisby, J. P. (1980) Perception 9, 69-86 17 Marr, D. and Hildreth, E. (1980) Proc. R. Soc. London Ser. B 207, 187-217 18 Pollard, S. B., Mayhew, J. E. W. and Frisby, J. P. (1985) Perception 14, 449-470 19 Grossberg, S. and Marshall, J. A. (1989) Neural Networks 2, 29-51 20 Akerstrom, R. A. and Todd, J. T. (1988) Percept. Psychophys. 44, 421-432 21 Parker, A. J. and Yang, Y. (1989) Vision Res. 29, 1525-1538 22 Marr, D. and Poggio, T. (1979) Proc. R. Soc. London Ser. B 204, 301-328 23 DeValois, R. and DeValois, K. (1988) Spatial Vision, Oxford University Press 24 Wilson, H. R., Levi, D., Maffei, L., Rovamo, J. a,~l DeValois, R. (1990) in Visual Perception: The Neon%physiological Foundations (Spillman, L. and Werner, J. S., eds), pp. 231-272, Academic Press 25 Grimson, W. E. L. (1981) From Images to Surfaces, MIT Press 26 Mowfor[h, P., Mayhew, J. E. W. and Frisby, J. P. (1981) Perception 10, 299-304 27 Schor, C. M., Wood, I. and Ogawa, J. (1984) Vision Res. 24, 661-665
Acknowledgements Supportedby grants from NIH (EY07750 and EY02158). This reviewis dedicatedto BelaJulesz,whose developmentof the random-dot stereogram profoundlychanged the way visual scientiststhink about stereopsis.Andrew Parkerfumished usefulcommentson an earlier draft. 451
28 Wilson, H. R., Blake, R. and Halpern, D. L. (1991) J. Opt, Soc. Am. 8, 229-236 29 Nishihara, H. K. (1984) Opt. Eng. 23, 536-545 30 Quam, L. H. (1984) in Readingsin Computer Vision (Fischler, M. A. and Firschein, 0., eds), pp. 80-86, Kauffman 31 Anderson, C. H. and van Essen, D. C. (1987) Proc. NatlAcad. Sci. USA 84, 6297-6301 32 Yeshurun, Y. and Schwartz, E. L. (1989) IEEETrans. Pattern Anal. Mach. Intell. 11,759-767
33 Ohzawa, I., DeAngelis, G. C. and Freeman, R. D. (1990) Science 249, 1037-1041 34 Mayhew, J. E. W. and Longuet-Higgins, H. C. (1982) Nature 297, 376-379 35 Pouget, A. and Sejnowski, T. J. (1990) Invest. Ophthalmol. Visual Sci. 31, 96 36 Blake, R. (1989) Psychol. Rev. 96, 145-167 37 Ross, J. (1974) Nature 248, 363-364 38 Rogers, B. and Cagenello, R. (1989) Nature 339, 135-137
Autoimmunityto glutamRaciddecarboxylase(GAD)in StiffMan syndromeand insulin-dependentdiabetesmellitus Michele Solimena and Pietro De Camilli
MicheleSolimenaand PietroDe Camil/iare at the Deptof Cell Biology, Yale UniversitySchoolof Medicine,333 Cedar Street, New Haven, CT06510, USA.
Stiff-Man syndrome (SMS) is a disorder of the CATS, characterized by rigidity of the body musculature, which has been hypothesized to result from an impairment of GABAergic neurotransmission. GABA is the main inhibitory neurotransmitter of the brain. It is also a putative signal molecule in the pancreas, where it is produced by ~ cells (insulin-secreting cells) - the autoimmune target in insulin-dependent diabetes mellitus (IDDM). Autoantibodies to the GABAsynthesizing enzyme glutamic acid decarboxylase (GAD) have been found in SMS and in IDDM. This review summarizes evidence suggesting that SMS may be an autoimmune disease and discusses the possible significance of the autoimmune response to GAD in SMS and IDDM. Until recently, it was thought that anatomical barriers and functional properties normally protect neurons of the CNS from becoming the target of autoimmunity. Now, however, increasing evidence suggests that this is not the case. Paraneoplastic neurological disorders were the first neuronal diseases of the CNS for which an autoimmune origin was proposed 1. Evidence supporting an autoimmune pathogenesis has now been provided for a rare and severe neuronal CNS disease, Stiff-Man syndrome (SMS).
contraction of antagonist muscles. It was proposed that an imbalance between excitatory (catecholaminergic), and inhibitory (GABAergic) pathways controlling a-motoneuron activity causes the manifestations of the disease 8. High doses of various agonists of GABAergic neurotransmission, such as baclofen, sodium valproate and primarily benzodiazepines, are generally effective in ameliorating rigidity by reducing a-motoneuron firing. These drugs, however, do not affect the course of the disease s. More recently, steroid treatment has been shown to be effective in some cases 3'9-u. The few autopsies carried out so far have failed to demonstrate pathological lesions of specific areas of the CNS. However, evidence for an inflammatory process in the spinal cord and brainstem has been reported in some cases (Ref. 6 and Refs listed there).
The autoimmune hypothesis of SMS An autoimmune pathogenesis of SMS was suggested by the observation of sporadic cases in which this condition was associated with autoimmune diseases including IDDM 12'13'~7. IDDM is due to an autoimmune destruction of insulin-secreting [~ cells of pancreatic islets 1< 15. The study of one SMS patient 13, and subsequently of an additional 32 patients 16 (who fitted the criteria for the diagnosis of the disease established by Gordon et al. 5,7) addressed directly the Stiff-Man syndrome SMS was originally described by Moersch and possibility that CNS autoimmunity was involved in Woltman in 1956, who reported 14 cases observed at SMS. In many of these patients, levels of IgG the Mayo Clinic2. Since then, more than a hundred antibodies in the cerebrospinal fluid (CSF) were cases of the disease have been reported in the elevated or had an oligoclonal pattern or both, literature. SMS, also recently referred to as Stiff- suggesting that IgG antibodies were produced locally Person syndrome3, is characterized by rigidity of within the blood-brain barrier. Furthermore, in 60% skeletal muscles, primarily of the trunk and limbs, of these patients, autoantibodies directed against with superimposed painful spasms 3-5. It usually GABAergic neurons, and primarily their nerve terappears in adulthood and generally has a fluctuating, minals, were identified by an immunocytochemical slowly progressive course. In a few cases, sudden assay in the serum and the available CSFs (Fig. 1). By death has been reported 6. western blotting13'16 and immunoprecipitation16"'1 7, SMS resembles a chronic form of tetanus in many the dominant autoantigen recognized by these autoways, although some differences, such as the absence antibodies in brain tissue was found to be the GABAof trismus in SMS, differentiate the two conditions4. synthesizing enzyme, glutamic acid decarboxylase Symptoms result from the simultaneous activation of (GAD) (Figs 2,3). Sera and CSFs found to be negative agonist and antagonist muscles. Several character- by immunocytochemistry were also found to be istics of the disease indicate its CNS origin5'7. negative for GAD antibodies by western blotting and Neurophysiological studies have demonstrated the immunoprecipitation. A few sera were found to be presence of a continuous discharge of motor-unit positive by immunocytochemistry and immunoprecipipotentials resembling a normal voluntary contraction. tation but negative by western blotting 16' 18. This can This activity cannot be inhibited by the voluntary probably be explained by the variable specificity of 452
© 1991.ElsevierSciencePublishersLtd,(UK) 0166-2236/91/$02.00
TINS, Vo/. 14, No. 10, 1991