Vision Res. Vol. 31, No. 6, pp. 1039-1051, 1991 Printed in Great Britain. All rights reserved
0042-6989/91 $3.00 + 0.00 Copyright © 1991 Pergamon Press plc
CONDITIONS FOR THE DETECTION OF COHERENT MOTION
A. V. VAN DEN BERG and W. A. VAN DE GRIND
Laboratory of Comparative Physiology, State University of Utrecht, Padualaan 8, 3584 CH Utrecht, The Netherlands
(Received 5 December 1989; in revised form 15 April 1990)
Abstract--Previous studies have suggested that human motion perception involves at least two different detection stages: an orientation selective or component motion stage and a combination stage where selectivity to the coherent motion of a pattern, e.g. a plaid, is established. These studies are inconclusive as to the motion detection process per se. Here we provide evidence that the motion detectors involved are of the correlation type. We determined critical values for the temporal and spatial modulation of the stimulus structure where the motion of the stimulus is no longer visible. Our results indicate that the spatial parameters of plaid- and component-motion detectors are identical but that the critical temporal modulation period is shorter for plaid-motion detectors. The results are discussed with respect to recent neurophysiological and psychophysical evidence for the two-stage motion detection model.

Motion perception    Plaid motion    Component motion    Motion detectors    Spatiotemporal correlation

INTRODUCTION

Primate vision appears to encode motion in at least two stages. An orientation selective local motion detection stage feeds a combination stage, where sensitivity to the coherent motion of a multi-oriented pattern is established (Adelson & Movshon, 1982). This paper addresses two questions that have received little attention. What motion detection mechanism is involved, and are the spatio-temporal properties of the two detection stages identical?

In the context of local motion detection the "aperture problem" (Ullman, 1983; Hildreth, 1984) asserts that a local motion detector cannot fully specify the local velocity of the image but merely signals the magnitude of the motion perpendicular to the local contour (the "component velocity"). As pointed out by Reichardt and Egelhaaf (1988) and by Uras, Girosi, Verri and Torre (1989), the problem arises because the information in the time varying image is only partially taken into account. The aperture problem arises because the moving image is conceptually replaced by straight local line-segments or local edges. Thus, all information in the higher (than first and second) order and mixed spatial derivatives is neglected from the outset. When the full information in the image is taken into account the aperture "problem" vanishes except for some special conditions, e.g. when the image only contains a set of parallel lines. The structure of the front-end of the visual system, as described in the neurophysiological literature, might tempt one to regard a model consisting of a set of relatively independent sampling apertures as a valid approximation. However, in contradistinction to such a model it has been shown that suitable combinations of "edge" and "line" detectors may perfectly signal the local line or edge curvature in an image (Koenderink & van Doorn, 1987).

Investigation of the ways by which the visual system could "solve" the aperture problem has provided some useful techniques to probe the functional structure of the motion detection system. A well known solution is the velocity-space construction rule. This scheme, originated by Fennema and Thompson (1979), requires two local readings of the velocity where the pattern contours are of different orientation. A single reading in one aperture constrains only the motion component perpendicular to the local orientation whereas the component parallel to the contour may be of any magnitude. Such a reading provides the component motion perpendicular to the local contour and creates a "constraint line" in velocity space parallel to the orientation of the local contour. A second reading creates a differently directed
constraint line if the contours are oriented differently. The intersection of the two constraint lines provides the velocity of the pattern. The resultant pattern motion is at least as large as the component motion. This solution appears to give an apt description of the direction of the coherent motion of a plaid which is composed of two superimposed gratings of different orientation moving in a direction perpendicular to their line-orientations (Adelson & Movshon, 1982). Subjects preferentially perceive the motion of the plaid rather than the independent motions of the component gratings which compose the plaid when they look at such a stimulus through a circular aperture, provided that the components are of similar contrast and spatial frequency. Apparently, the visual system combines the component-motion signals as in the Fennema and Thompson scheme. Alternatively, the motion of pattern features such as the line intersections determines the perceived motion. Adelson and Movshon (1982) and Welch (1989) have argued in favour of the first possibility. They inferred the existence of two motion processing stages in human vision. The first stage is orientation selective and it encodes the component motion. The second stage combines the outputs of the first stage and is responsible for selective sensitivity to the plaid motion direction. Subsequent neurophysiological studies in the monkey cortical area MT have provided evidence for at least two different classes of motion sensitive cells: component motion selective and pattern (or plaid) motion selective units (Movshon, Adelson, Gizzi & Newsome, 1985; Rodman & Albright, 1989). None of these studies permits definite conclusions as to the motion detection mechanism per se of either the component- or the pattern-motion stage. Our first aim is then to provide data relevant to this question.
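The velocity-space construction described above can be sketched numerically. The following is a minimal illustration, not code from the original study: the function name `pattern_velocity` and its arguments are hypothetical, and each component is assumed to be specified by its speed and by the direction (in degrees) of its motion perpendicular to the contours.

```python
import math

def pattern_velocity(speed1, angle1_deg, speed2, angle2_deg):
    """Intersect two constraint lines in velocity space.

    Component i moves with speed_i along the unit normal n_i of its
    contours (direction angle_i, in degrees), so the pattern velocity v
    must satisfy v . n_i = speed_i. Two differently oriented components
    give a 2x2 linear system, solved here with Cramer's rule.
    """
    a1, a2 = math.radians(angle1_deg), math.radians(angle2_deg)
    n1x, n1y = math.cos(a1), math.sin(a1)
    n2x, n2y = math.cos(a2), math.sin(a2)
    det = n1x * n2y - n1y * n2x
    if abs(det) < 1e-12:
        # Parallel contours: the constraint lines never meet in one point.
        raise ValueError("components must differ in orientation")
    vx = (speed1 * n2y - speed2 * n1y) / det
    vy = (n1x * speed2 - n2x * speed1) / det
    return vx, vy

# Two orthogonal components, each moving at 0.25 deg/sec.
vx, vy = pattern_velocity(0.25, 0.0, 0.25, 90.0)
pattern_speed = math.hypot(vx, vy)  # 0.25 * sqrt(2), about 0.35 deg/sec
```

For orthogonal components of equal speed the pattern speed exceeds the component speed by a factor of the square root of 2, in line with the statement that the resultant pattern motion is at least as large as the component motion.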
Several studies indicate that human motion detection is mediated by spatio-temporal cross-correlation (van Doorn & Koenderink, 1982a, b; van Santen & Sperling, 1984; van de Grind, Koenderink & van Doorn, 1986). In this scheme motion is detected when the correlation between the activity induced by the time-varying illuminance at two nearby locations in the visual field exceeds some fiducial level. The output of one of these so-called subunits is delayed prior to the correlation. The correlation is maximum if the stimulus moves from the first subunit (the one with the delay) to the second subunit during
the delay time. Hence, there exists a simple relation between the preferred velocity and the structure of the detector. The detector is tuned to a velocity which equals the ratio of the span (S; the centre-to-centre distance between the subunits) and the delay (τ). Such a bilocal detector does not signal the magnitude of the velocity but the probability that its preferred stimulus velocity has occurred. Consequently, the range of locally detectable motions is encoded by a set of differently tuned motion detectors (labelled-line code). Conversely, a particular stimulus motion results in a characteristic activation pattern in the local set of motion detectors. Notice that the tuning velocity constrains neither the span nor the delay but only their ratio. In principle a large set of detectors with different delays and spans can be tuned to the same stimulus velocity. However, prior work has established that the visual system appears to favour particular combinations of span and delay. The relations between the detector parameters (S, τ) and the preferred velocity for moving Julesz patterns, both at the fovea and at several eccentricities (van de Grind et al., 1986), have been described previously. Using a different paradigm van den Berg, van de Grind and van Doorn (1990) confirmed these relations for the fovea and found that bilocal motion detectors are sensitive to changes in orientation of the target during motion. Combining these different findings it seems plausible that the orientation selective component motion detectors proposed by Adelson and Movshon are bilocal. If so, it seems of interest to ask whether the bilocality of the component motion detectors carries over to the pattern motion stage and whether the spatio-temporal properties of the pattern and the component motion detectors are identical.
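The tuning rule of the bilocal detector (preferred velocity equals span divided by delay) can be illustrated with a toy simulation. This is a sketch under simplifying assumptions that are not taken from the source: a one-dimensional circular "retina", a random pattern of +1/-1 values, integer velocities in pixels per frame, and a plain mean product as the correlation. All function names are hypothetical.

```python
import random

def make_pattern(n, seed):
    """Random sequence of +1/-1 luminance values on a circular 1-D retina."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(n)]

def frame(pattern, shift):
    """The pattern displaced rightward by `shift` pixels (with wrap-around)."""
    n = len(pattern)
    return [pattern[(x - shift) % n] for x in range(n)]

def bilocal_response(pattern, velocity, span, delay):
    """Mean product of the delayed and the direct subunit signals.

    The first subunit samples position x at time 0 and its signal is
    delayed by `delay` frames; the second subunit samples position
    x + span at time `delay`. `velocity` is in pixels per frame.
    """
    n = len(pattern)
    early = frame(pattern, 0)
    late = frame(pattern, velocity * delay)
    return sum(early[x] * late[(x + span) % n] for x in range(n)) / n

pattern = make_pattern(2000, seed=1)
matched = bilocal_response(pattern, velocity=2, span=4, delay=2)     # span/delay equals velocity
same_ratio = bilocal_response(pattern, velocity=2, span=8, delay=4)  # other span, same ratio
mismatched = bilocal_response(pattern, velocity=1, span=4, delay=2)  # wrong velocity
```

In this sketch the response is maximal whenever the stimulus velocity equals span/delay, whatever the individual values of span and delay, and is close to zero for a mismatched velocity. That is the sense in which the tuning velocity fixes only the ratio of the two parameters.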
For single line patterns which stimulate component motion detectors tuned to the velocity component perpendicular to the lines we expect to find the same relation between the (bilocal) detector parameters and stimulus velocity as previously described by van Doorn and Koenderink (1982). For plaid motion, however, some of these relations may be shifted when bilocal detectors tuned to the component motion are involved rather than detectors tuned to the plaid motion per se (e.g. to the motion of the intersections). This is a consequence of the aforesaid difference between the motion of the plaid as a whole and the motion of its component line patterns. In this study we test this
idea and establish that the component detectors might be of the bilocal type. The findings are compatible with the idea of a two-stage analysis and will be related to previous psychophysical and electrophysiological results bearing on the model.

METHODS
General
For a bilocal detector to signal motion the correlation between the signals from its two subunits must exceed some threshold. Suppose we manipulate the spatio-temporal properties of our stimuli in such a way as to decorrelate the direct and delayed inputs of the bilocal correlator. We may then present the preferred motion to either subunit and nevertheless preclude an appreciable signal of the bilocal motion detector. This condition will result in poor motion detection if any. Van Doorn and Koenderink (1982) designed two elegant methods to achieve decorrelation. They presented two uncorrelated random-dot patterns moving at equal speed in opposite directions. On the assumption that the detector's structure does not depend on its preferred direction of motion such a stimulus may be used to reveal the properties of the detector tuned to the speed of the patterns. To investigate the span of the bilocal detector tuned to this speed the patterns appear simultaneously on a display in alternate bar-shaped windows. Thus, pixels which move rightward out of, say, the first bar disappear from the display to reappear in the third bar after they have crossed the width of the second bar where the motion of the alternate pattern is shown. In the sequel we will use "bar" as a shorthand for bar-shaped windows to indicate the spatial arrangement of the display. We can describe such a display therefore as showing two oppositely moving patterns which occlude each other in alternate bars (spatial alternation; see Fig. 1). The bars are always oriented perpendicular to the direction of motion. The value of the span can be deduced from the barwidth at the limit between motion detection and its disappearance. To see this, consider first a condition where the barwidth exceeds the span considerably. In this case, each bar will cover several detectors. Hence, the opposite motion in adjacent bars will be encoded by the local motion detectors. The bar-structure of the display is visible.
Fig. 1. A stimulus consists of a pair of simple line-patterns (upper part) or plaids (lower part). The elements of the pair always move in opposite directions. The pattern structures of the pair (A, B) differ, i.e. the line intervals are randomized differently. Plaids are composed of a superposition of two simple line-patterns oriented orthogonally. Each component of the plaid moves in a direction perpendicular to the line orientation. The elements of the pair are shown simultaneously but in alternate bars when the span is determined (spatial alternation). The bars are oriented perpendicular to the motion of either pattern. For measurement of the delays of the motion detectors the stimulus pair is shown in temporal alternation.

At the other extreme, when the bars are very small
with respect to the span, the receptive fields of most subunits will include one or more borders between the bars. Therefore, most subunits will receive a mixture of inputs from both oppositely moving patterns. But this implies that, irrespective of the preferred horizontal motion direction of the correlator, its inputs will be partially correlated. Hence, opposite motion is signalled simultaneously at one location: transparent motion. For intermediate barwidth, i.e. about equal to the span, an increasing number of detectors will receive inputs from subunits which are directed at adjacent bars. Because these bars contain uncorrelated patterns the signals of these subunit pairs are uncorrelated, irrespective of the preferred motion direction of the bilocal detector. Hence, weak or no motion is signalled in this condition. A completely analogous "alternation" is possible in the time-domain. It exploits the temporal asymmetry of the bilocal cooperation. When the two patterns are presented in temporal alternation with a presentation time slightly smaller than the delay, the momentary inputs of the correlator stage of the detector are always derived from different patterns and
hence are decorrelated. On the other hand, when the alternation period is much longer than the delay, different detectors tuned to opposite motion will be stimulated in successive intervals. For very brief alternation periods each subunit signal is again a mixture of the input of both patterns. Hence, its output is partially correlated with the output of any other subunit which samples the motion one delay period later at a location one span ahead in either motion direction: transparent motion. Van Doorn and Koenderink (1982a, b) applied these methods to determine the span and the delay of human motion detectors. They quantified the detectability of the motion with a mixture of coherently moving Julesz patterns (signals) and a non-moving dynamic Julesz pattern (noise) which was refreshed every 10 msec. The contrast ratio of signal and noise (SNR) was reduced to find the threshold for motion detection while mean contrast was held constant. Threshold SNR showed similar behaviour for variations of the spatial (W) and temporal (T) alternation period. For an intermediate value of T or W, threshold SNR reached a peak level with a steep decline at both lower and higher values of T or W. The values where SNR peaked as a function of T (W) were interpreted as the delay (span) of the detectors tuned to the stimulus speed. Different percepts were reported for three different regions of the SNR(W) and SNR(T) functions. For values of T (W) far in excess of the delay (span) a clear rocking motion (bars) was seen (alternation). Coherent motion was difficult or impossible to see at the critical values (Tc or Wc) where the SNR(T) or the SNR(W) function peaked (non-coherence). For lower values of T and W the two patterns were seen as spatially and temporally coincident but moving in opposite directions (transparency). We used the above paradigm to investigate whether motion detectors involved in the perception of coherent plaid motion are of the bilocal type.
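The decorrelation argument can be made concrete with a small counting sketch. It computes, for two patterns alternating at a fixed interval, the fraction of sample pairs separated by the detector's delay (or span) that are drawn from the same pattern. The function name and the use of integer units (frames or pixels) are assumptions made for illustration. The sketch treats the subunits as point samplers, so it deliberately omits the mixing of both patterns within a subunit that the text invokes to explain transparency at very small intervals.

```python
def same_pattern_fraction(interval, offset):
    """Fraction of sample pairs, `offset` apart, drawn from the same pattern.

    Two patterns alternate every `interval` units (frames for temporal
    alternation, pixels for spatial alternation). `offset` is the
    detector's delay or span in the same units. Both are positive integers.
    """
    cycle = 2 * interval  # one full A-then-B cycle
    same = sum(
        (t // interval) % 2 == ((t + offset) // interval) % 2
        for t in range(cycle)
    )
    return same / cycle

# Delay of 5 frames (100 msec at 20 msec per frame):
at_delay = same_pattern_fraction(interval=5, offset=5)     # every pair straddles two patterns
long_period = same_pattern_fraction(interval=50, offset=5)  # most pairs share a pattern
```

When the alternation interval equals the delay, no pair of direct and delayed samples comes from the same pattern, so a correlator tuned to that delay receives uncorrelated inputs. For intervals much longer than the delay almost all pairs come from the same pattern and the detector responds normally.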
To this end we used line patterns with randomized line-intervals (see Fig. 1) rather than Julesz patterns. Span and delay of motion detectors were measured with single line patterns (vertical lines) and with plaids (a superposition of two perpendicular line patterns). In either case two patterns with uncorrelated randomized line-intervals were alternated either within the bars of the spatial windows or temporally. Pattern motion (i.e. coherent motion of the plaid or the motion of the single line pattern) was always perpendicular to the bars if the
spatial alternation paradigm was used. The alternating patterns moved with velocities of identical magnitude but in opposite directions. It was not sensible to devise an analog of the SNR method. One reason is that the contrast ratio of the dynamic noise pattern and the moving line pattern is not linearly related to the ratio of the detectability of either pattern in isolation, because the signal and the noise differ in structure. A second reason is that our microcomputer is not able to generate the required variably mixed patterns at sufficient speed. Instead we asked the subjects to classify the perceived motion according to one of the following three categories: (1) transparency, (2) alternation or (3) a "jumble" if the pattern resembled a quivering bunch of sticks and no coherent motion was seen (neither transparency nor alternation).
Stimuli

We varied the temporal and the spatial interval of alternation over more than a decade for a range of velocities from 0.25 to 4 deg/sec. The spatial interval (window bar-width) was varied in steps of 1.2 min arc and ranged from 0.6 to 17.4 min arc. The temporal alternation interval was varied in 20 msec steps from 20 to 200 msec. All stimuli were generated and displayed on an Amiga 1000 microcomputer with a framerate of 50 Hz. With the exception of the highest velocity the pattern-lines were 1.8 min arc wide. The line-interval was randomized with a uniform distribution between 1.8 and 9 min arc. For the highest velocities we used a line pattern with identical line width but with intervals randomized between 1.8 and 13.8 min arc. Unless stated otherwise the stimuli were viewed through a circular aperture of 1.2 deg diameter. To study the spatio-temporal properties of plaid motion detectors, plaids with perpendicular line patterns were used. The line width and intervals were chosen as for the single line patterns. The components of the plaids were oriented horizontally and vertically with the monitor in the upright position. Hence, plaid motion was oriented along a diagonal. During the spatial alternation the bar-shaped windows were oriented perpendicular to this diagonal. In some sessions (as shown at the bottom of Fig. 1) the monitor was tilted by 45 deg to create horizontal plaid motion. No systematic differences were found related to the direction of the plaid motion.
At the intersections of the plaid luminance was unchanged. During the spatial alternation the bars break up the plaid in line segments and crosses. As a result energy is found at orientations other than the line orientations which comprise the plaid. A 2D Fourier transform (Press, Flannery, Teukolsky & Vetterling, 1988) of this stimulus revealed that more than 50% of the total power in the spectrum is found at the orientations of the components, irrespective of the width of the occluding bars. The spectrum always peaked at one of the principal orientations. At other orientations the power of any spectral component was no more than 25% of this maximum, but usually much less. Apparent motion of a stimulus component resulted from jumps of n pixels every kth frame (frame duration: 20 msec). Pixels were rectangular with dimensions 0.6 x 1.2 min arc (hor. x ver.) at the viewing distance of 2.3 m. Thus, identical speeds in pixels per frame resulted in twice as large an angular velocity for vertical as for horizontal motion. Table 1 summarizes the stimulus velocities used. The following gives a brief description of how the stimuli were generated. The stimulus generation is depicted schematically in Fig. 1. All stimulus components (line patterns) exceed the dimensions of the circular aperture and are stored in memory. Each stimulus component is displaced and wrapped around in the background and subsequently copied to one part of a twin buffer area. In the case of the spatial alternation paradigm the components of the pattern are copied through bar patterns (masks) which were also stored in memory. In this particular mode the copy affects the display only where the mask is true (transparent). All components of the stimulus are combined in the buffer area before the result is finally displayed. The twin buffer area not on display is then used to build the next frame.
The computer performs the sequence of displacements, copies and buffer switching within 20 msec.
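The conversion from jump size and jump interval to angular velocity follows directly from the pixel dimensions and the frame duration given above. The sketch below uses hypothetical names and assumes a frame duration of exactly 20 msec. Under that assumption a horizontal jump of one pixel per frame gives a nominal 0.50 deg/sec, slightly above the 0.49 deg/sec listed in Table 1; the source does not explain the small difference.

```python
FRAME_SEC = 0.020  # frame duration stated in the text (50 Hz framerate)
PIXEL_ARCMIN = {"horizontal": 0.6, "vertical": 1.2}  # pixel size at 2.3 m

def angular_velocity(jump_pixels, frames_per_jump, direction):
    """Nominal velocity in deg/sec for a jump of `jump_pixels` pixels
    every `frames_per_jump` frames along the given direction."""
    arcmin_per_sec = (
        jump_pixels * PIXEL_ARCMIN[direction] / (frames_per_jump * FRAME_SEC)
    )
    return arcmin_per_sec / 60.0  # 60 min arc per degree

u = angular_velocity(1, 1, "horizontal")  # vertical lines moving horizontally
v = angular_velocity(1, 2, "vertical")    # horizontal lines moving vertically
```

Because a pixel is twice as tall as it is wide, the vertical component needs half the jump rate of the horizontal component to reach the same angular speed, which is why the plaid rows of Table 1 pair 1/1 with 1/2, 2/1 with 1/1, and so on.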
Table 1. Stimulus velocities of the line patterns

              Component velocity                   Pattern
          u                       v                velocity
  Pix/frame   Deg/sec     Pix/frame   Deg/sec     (deg/sec)
     1/2       0.25          --         --          0.25
     1/1       0.49          --         --          0.49
     2/1       0.98          --         --          0.98
     4/1       1.95          --         --          1.95
     8/1       3.95          --         --          3.95
     1/2       0.25          1/4        0.25        0.35
     1/1       0.49          1/2        0.50        0.70
     2/1       0.98          1/1        1.0         1.4
     4/1       1.95          2/1        2.0         2.8

Vertical lines moved horizontally (u), horizontal lines moved vertically (v). Motion is specified by the jump size (pixels) and the temporal interval between jumps in frames, or as deg/sec. Pattern velocity equals the velocity of the component for single line-patterns (upper part of the table). For plaids, pattern motion is computed from the component velocities according to the velocity-space construction rule.

Subjects and procedures

A full data set was obtained from two subjects and a partial data set was obtained from a third subject. The main subjects were one of the authors (AB) and one naive subject (ML). AB, a male myope (-3 D), and ML, a male emmetrope, had normal or corrected to normal visual acuity. Experiments were done in a dark room where only the glare of the monitor was visible. The head was supported by a chinrest and the left eye was covered. A 19 min arc fixation annulus was provided at the centre of the display. Presentation time was determined by the subject. The stimulus presentation was terminated when the subject classified the stimulus in one of the aforementioned categories. In the first set of experiments the temporal properties of component and plaid motion detectors were investigated by measuring the probability of the "jumble" report as a function of the alternation period. Probabilities are based on 48 trials. The weighted average of these data was used as an estimate of the critical alternation period (Tc) or the delay:

$$T_c = \frac{\sum_i n_i T_i}{\sum_i n_i}$$
Here T_i is the alternation period and n_i is the number of trials with a "jumble" report at that period. In a second series of experiments a modified staircase method is used. Whenever the subject reports "alternation" (or "bars" in the spatial case) the alternation period (barwidth) is decreased. On report of "transparency" the alternation period (barwidth) is increased. When "jumble" is reported the change of the alternation period (barwidth) is identical to the change during the previous step. Thus "jumble" reports following a "transparency" report result in an increase of the parameter. However, following "alternation" the parameter is decreased on subsequent "jumble" reports. Hence, the parameter domain where a jumble is reported is sampled more efficiently. The mean of 16 reversal levels is used as an estimate of the critical barwidth (Wc) or span. In several sessions on different days either plaids or single lines were shown. Stimuli were presented in blocks with constant velocity. The order of the blocks was randomly varied from day to day. In the first set of experiments the alternation period was presented in random order within a block.

Fig. 2. (a) Percentage "jumble" reports (abscissa) at various temporal alternation periods (T), for two subjects (AB, ML). The peak of the distribution shifts to lower alternation periods for faster stimulus motion. (b) The critical alternation period (Tc) as a function of the pattern velocity. For plaids the pattern velocity exceeds the component velocity by a factor of √2. For the simple line-stimuli, pattern velocity equals the component velocity. Open symbols represent Tc for simple line-stimuli. Solid symbols indicate Tc for plaids. The dotted lines with crosses indicate Tc as a function of the velocity of the simple line-pattern using the staircase method.

RESULTS
Temporal alternation of line-patterns

At every velocity there exists a range of alternation periods where no coherent motion of single lines is seen. Clear transparency occurs at a short alternation period, whereas at very long alternation periods a line-pattern seems to rock to and fro. Figure 2a shows the frequency of incoherence (jumble report) as a function of the temporal alternation period for a range of velocities. Especially at the lower velocities no coherent motion is seen for a rather broad range of alternation periods. Notice that percentage "jumble" values do not reach 100% in most histograms. Thus there was nearly always some probability that the subject reported coherent motion at any particular alternation period. For example, at an intermediate velocity of 1 deg/sec and an alternation period of 80 msec, the frequency of incoherence is about 0.6 for subject ML. This means that the percept of coherent motion disappeared in slightly more than half of the sessions where this particular velocity was investigated. In the other sessions a sharp transition occurred from transparency reports to alternation reports at 80 or 100 msec alternation periods. Unfortunately we could not probe this transition region with a higher temporal resolution. Due to the fixed framerate of the computer the minimum stepsize for the alternation period was limited to 20 msec. The data of sessions where the transition from transparency to alternation or vice versa occurs within one step of the alternation period are essentially neglected when only the percentage "jumble" reports are analyzed. This neglect of useful data was the main reason for the use of a staircase method in subsequent experiments. Figure 2b shows the critical alternation periods for the same subjects as a function of pattern velocity. For the single line-patterns pattern velocity equals component velocity. Tc decreases from about 130 msec at a velocity of 0.25 deg/sec to 40 msec at 2 deg/sec. The results for the staircase method are identical, as shown for subject AB (dotted line Fig. 2b).
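The two estimators of the critical alternation period compared here, the jumble-weighted mean and the modified staircase described in Methods, can be sketched as follows. This is an illustration under stated assumptions rather than the original procedure: the function names, the clamping of the parameter to a fixed range, the `max_trials` guard, and the reading of a "reversal level" as the parameter value at which the step direction flips are all hypothetical choices.

```python
def weighted_critical_period(periods, jumble_counts):
    """Mean alternation period weighted by the number of "jumble" reports."""
    total = sum(jumble_counts)
    if total == 0:
        raise ValueError("no jumble reports: estimate undefined")
    return sum(t * n for t, n in zip(periods, jumble_counts)) / total

def staircase(respond, start, step, lo, hi, n_reversals=16, max_trials=1000):
    """Modified staircase on the alternation period (or barwidth).

    `respond(value)` returns "alternation", "transparency" or "jumble".
    "alternation" steps down, "transparency" steps up, and "jumble"
    repeats the previous step direction. Returns the mean of the values
    at which the step direction reversed.
    """
    value, direction, reversals = start, 0, []
    for _ in range(max_trials):
        report = respond(value)
        if report == "alternation":
            new_direction = -1
        elif report == "transparency":
            new_direction = +1
        else:  # "jumble": keep moving as in the previous step
            new_direction = direction
        if direction != 0 and new_direction == -direction:
            reversals.append(value)
            if len(reversals) == n_reversals:
                return sum(reversals) / len(reversals)
        direction = new_direction
        value = min(hi, max(lo, value + direction * step))
    raise RuntimeError("staircase did not reach the requested reversals")
```

With an idealized observer who reports "jumble" only between 80 and 100 msec, a staircase with 20 msec steps oscillates between 60 and 120 msec and returns 90 msec, the centre of the jumble region. Because "jumble" reports do not stop the run, the staircase also yields an estimate in sessions where the transition from transparency to alternation occurs within a single step.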
Temporal alternation of plaids

For slow temporal alternations of oppositely moving plaids a single plaid is seen to rock to and fro. This is remarkable because when the plaids are shown side by side on the display the difference between the patterns is readily seen. Only on close inspection of the moving plaids (making pursuit eye movements) can the difference between the patterns be noticed. For very brief alternation periods transparent motion of two plaids is seen in the two opposite plaid motion directions. When the alternation period increases either one of the component motions dominates, or a jumble of horizontal and vertical lines is perceived. Both percepts are classified as "jumble". For larger alternation periods the rocking motion occurs. Figure 2b shows Tc for the moving plaids as a function of the plaid velocity (solid circles). Interestingly, the critical alternation period is lower for the plaid than for motion of single line patterns. The difference would be even larger if Tc were plotted as a function of the velocity of the line components of the plaids. When plotted as a function of component motion the curves for the plaids shift leftward because component motion is lower than the pattern motion by a factor 1/√2. The curves for the motion of the single line pattern are unaltered because component motion equals pattern motion for single line patterns.
Spatial alternation of line-patterns and plaids

For moving plaids the critical width measurement was complicated by an unexpected effect. When the barwidth was increased from its lowest values the percept of transparent motion in a direction perpendicular to the bars (of the mask) gave way to a percept of transparent motion parallel to the bars. For larger barwidth "jumble" or opposite motion in alternating bars (perpendicular to the bars) was seen. The change in the perceived direction of motion may be related to Wallach's "barberpole" effect. When observed through a rectangular aperture oblique lines which move perpendicular to their orientation are preferentially seen to move parallel to the long side of the aperture. For small barwidth of the
mask the ratio of the diameter of the circular aperture to the barwidth could exceed 10-20. The consequent large length-to-width ratio of the occluding bars may have biased the perceived direction to motion parallel to the long side of the bars. To counter this effect we reduced the diameter of the aperture for lower velocities. Previous studies have shown that the span for motion detectors tuned to lower velocities is smaller. Thus the expected critical barwidth is smaller for lower velocities. The diameter was reduced in such a way (Table 2) that at the expected critical barwidth the ratio of barwidth to aperture diameter would be constant for all velocities. Subjects were instructed to classify transparent motion parallel to the bars as a "jumble". Subject ML could not make stable judgements of the perceived motion for the highest plaid velocity (2.8 deg/sec). Use of a larger aperture might have remedied this inability. However, the size of the stimulus could only be increased at the expense of the refresh rate. We did not want to compromise the 20 msec refresh rate as used in the rest of our study. Figure 3 shows the critical barwidth (Wc) of the mask where coherent motion was no longer seen, as a function of the pattern velocity. For both subjects the Wc-functions for motion of line-patterns and of plaids coincide.
Table 2. The aperture size (D) used for plaids in the spatial alternation experiment for different plaid velocities

  Pattern velocity (deg/sec)     D (deg)
           0.35                    0.6
           0.70                    0.6
           1.4                     0.85
           2.8                     1.2
DISCUSSION
We have investigated the mechanism of motion detection involved in the perception of coherently moving plaids using a paradigm designed by van Doorn and Koenderink (1982). Prior studies have suggested that plaid motion is encoded by a non-linear combination of the activity of motion detectors which respond only to the motion component perpendicular to the local lines or edges (Movshon et al., 1985; Ferrera & Wilson, 1988; Welch, 1989). Thus two levels of organization are distinguished in the motion detection system: (1) an orientation selective "component motion" stage; (2) a combination stage where "plaid motion" is encoded. Here, we show that the component motion detectors are bilocal and of the correlation type. The predictions based on the bilocality assumption were corroborated. The results argue against alternative local models like gradient schemes (Ullman, 1983). In speedometer-type movement detectors, like the gradient model, the signal strength of the detector encodes the speed of the motion. Such models cannot cope with the "transparency" phenomenon in our experiment, because the opposite motions in the receptive field of the detector would tend to cancel out. The essential requirement for any model of transparency phenomena would seem to be that the motion encoding scheme includes separate local channels for each component of the transparent motion. Thus, bidirectional correlation schemes like the original model of Reichardt (1961) and its elaborations (van Santen & Sperling, 1984; Dawson & DiLollo, 1990), in which the output of a pair of oppositely directed correlators is subtracted, similarly fail to account for perception of transparent, opposite motion. The correlation-detectors proposed here are selectively sensitive to a narrow range of motions in one direction. Opposite motion directions activate different detectors. Hence, transparency phenomena are accounted for in this scheme as simultaneous, about equal activation of two differently tuned correlators.

Fig. 3. Critical barwidth (Wc) as a function of the pattern velocity v (deg/sec) for two subjects (AB, ML). Symbols as in Fig. 2(b).

Simultaneous excitation of two channels may not be sufficient for understanding transparency phenomena. We would like to keep an open mind to the possibility that competition between differently tuned channels results in a single perceived motion direction if the ratio of the channel activity is large. We think that the unexpectedly perceived motion parallel to the bars in the spatial alternation experiment is consistent with such an interpretation. If the barwidth is on the order of the span, the motion perpendicular to the bars evokes a weak response in its detectors. However, the motions of the components of each plaid are also compatible with motion in opposite directions along the bar. Hence, if the number of borders is large, i.e. for alternation-bar width somewhat less than the span, a comparatively large number of detectors, signalling opposite motion in directions parallel to the bars, may be activated. On the other hand, the motion perpendicular to the bars provides a weak input because of the spatial arrangement of the stimulus. This may have caused a percept of transparent motion along the borders of the bars.

Why then the reduction of the stimulus size for lower stimulus velocities? Previous studies have provided evidence for recruitment phenomena in motion perception (van Doorn & Koenderink, 1984; McKee & Welch, 1985). Essentially, these studies showed that the increase of the stimulus detectability is larger when the stimulus size is increased in the motion direction than in a perpendicular direction. Hence, we expected the competing motion percepts along the bars and perpendicular to the bars to be influenced by the width-height ratio of the bars.
To compare the span estimates across velocities it therefore seemed wise to choose an approximately constant width-height ratio at the bar width that equals the expected span for each velocity.

When two random line patterns of identical orientation but of different structure are shown in temporal or spatial alternation there exists a range of alternation periods in the spatial as well as the temporal domain where the percept of coherent motion disappears. The critical width and critical alternation period are mechanistically interpreted as the span and the delay of the
bilocal motion detector tuned to the stimulus velocity. Thus, component motion detectors are characterized by their joint orientation and velocity (direction and speed) selectivity. The "bilocality" of the component motion detectors appears to carry over to the detectors involved in plaid motion perception as might be expected from the two stage model. For higher speeds the delay of the component motion detectors decreases and the span increases. The data are well described by a power law with respect to the velocity $v$ (deg/sec):

AB: $T = 73\,v^{-0.35}$ msec ($r^2 = 0.82$); $S = 4.7\,v^{0.36}$ min arc ($r^2 = 0.95$)
ML: $T = 77\,v^{-0.4}$ msec ($r^2 = 0.9$); $S = 4.3\,v^{0.35}$ min arc ($r^2 = 0.92$).
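As a rough illustration of the fitted power laws, the following sketch evaluates the delay, the span and their ratio (the tuning velocity) for both subjects. The coefficients are the ones quoted above; the dictionary layout, the function names and the choice of example velocities (borrowed from Table 2) are assumptions made here for illustration only.

```python
# Power-law fits quoted in the text: delay T in msec, span S in min arc,
# stimulus velocity v in deg/sec. Names and layout are illustrative assumptions.
FITS = {
    "AB": {"T": (73.0, -0.35), "S": (4.7, 0.36)},
    "ML": {"T": (77.0, -0.40), "S": (4.3, 0.35)},
}


def delay_msec(subject, v):
    """Fitted delay T = a * v**b in msec."""
    a, b = FITS[subject]["T"]
    return a * v ** b


def span_arcmin(subject, v):
    """Fitted span S = a * v**b in min arc."""
    a, b = FITS[subject]["S"]
    return a * v ** b


def tuning_velocity(subject, v):
    """Ratio of span and delay, converted to deg/sec."""
    span_deg = span_arcmin(subject, v) / 60.0    # min arc -> deg
    delay_sec = delay_msec(subject, v) / 1000.0  # msec -> sec
    return span_deg / delay_sec


if __name__ == "__main__":
    # Example velocities only; they are not the velocities of the fitted data.
    for subject in FITS:
        for v in (0.35, 0.70, 1.4, 2.8):
            print(subject, v,
                  round(delay_msec(subject, v), 1),
                  round(span_arcmin(subject, v), 2),
                  round(tuning_velocity(subject, v), 2))
```

At v = 1 deg/sec the fitted span/delay ratio comes out close to 1 deg/sec for both subjects, which is what one would expect if the critical width and period reflect detectors tuned to the stimulus velocity.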
These results are in qualitative agreement with the data of van Doorn and Koenderink (1982). Our results are even in close quantitative agreement with their data as far as the delays are concerned. van Doorn and Koenderink (1982a, b): $T = 89\,v^{-0.4}$ msec; $S = 4.2\,v^{0.6}$ min arc.

We have no definite explanation for the nearly twofold lower power of the span as a function of the velocity in our data as compared to those of van Doorn and Koenderink. Possibly, it results from the higher probability of "false matches" for two different randomized line patterns as compared to a pair of uncorrelated Julesz patterns. It may be useful to consider this possibility in more detail. For a bilocal detector with span $S$ the probability $P$ that it is activated because both subunits are looking at the same moving pattern can be estimated. For a barwidth $w$ it is approximately given by

$P(w;S) = (w - S)/w$, $w > S$.
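The probability estimate above can be sketched numerically. The function names are hypothetical. The second function uses only elementary calculus on the stated formula: the derivative of (w - S)/w with respect to w is S/w², evaluated at the barwidth where P reaches a chosen level p.

```python
def match_probability(w, span):
    """P(w;S) = (w - S)/w, the approximation given in the text for w > S."""
    if w <= span:
        raise ValueError("the approximation is only stated for w > S")
    return (w - span) / w


def slope_at_level(p, span):
    """dP/dw at the barwidth where P(w;S) equals p (0 <= p < 1).

    P = p is reached at w = S/(1 - p); there dP/dw = S/w**2 = (1 - p)**2 / S.
    """
    w = span / (1.0 - p)
    return span / w ** 2


if __name__ == "__main__":
    for span in (1.0, 2.0, 4.0):  # arbitrary example spans
        print(span, match_probability(2.0 * span, span), slope_at_level(0.5, span))
```

At any fixed probability level the slope is inversely proportional to S, so P rises more slowly with barwidth for detectors with larger spans.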
Let us assume a simple detection requirement for movement in a bar-shaped region. Say, the activity of the population of motion detectors with at least one subunit in that region must exceed some fiducial level. False matches are possible for the detectors with one subunit directed at the bar with the first moving pattern and the other subunit directed at the neighbouring bar, which contains the other moving pattern. These false matches also contribute to
the activity. This effect results in an apparent increase of P(w;S) for a given barwidth. Hence, the barwidth where the percept of bars with opposite motion gives way to a percept of incoherent motion is lowered by the false matches and the span is underestimated. This effect must be stronger when P(w;S) is a shallower function of w, because the additional activity due to false matches then results in a larger apparent shift of S. Hence, the effect will be larger for detectors with larger spans; i.e. for higher stimulus velocities. We would thus like to suggest that an increase of the probability of false matches may have caused an underestimation of the span of the detectors tuned to the higher speeds.

For moving plaids the delays of the motion detectors as a function of the pattern velocity were smaller than when a single bar pattern was used. However, the spans were identical in both conditions. Because the tuning velocity of the detectors is determined by the ratio of the span and the delay, plaid motion in our experiment was apparently detected through channels tuned to slightly higher velocities. This is shown in Fig. 4. The plaids consisted of lines at ±45 deg orientation with respect to the direction of the plaid motion. This result suggests that the tuning velocity of the bilocal component motion detectors which feed into the pattern motion detection stage depends on the orientation preference of the component detector. They are tuned to the velocity v when their preferred orientation is perpendicular to the direction of the pattern motion. However, when their orientation preference deviates from perpendicularity they appear to be tuned to a higher plaid velocity. This result nicely corresponds to neurophysiological data of Rodman and Albright (1987, 1989).
These authors showed that a population of neurons in area MT of the monkey which were shown to be selectively sensitive to the direction of plaid motion possessed a characteristic speed/orientation tuning curve for bar motion. The optimum orientation for the bar was independent of its speed perpendicular to the bar and vice versa. Such a tuning curve can be understood in terms of a bundle of orientation selective bilocal motion detectors sensitive to the same direction of motion but with tuning speed dependent on the orientation preference. To clarify this point imagine a bar which moves perpendicular to its long side. Suppose,
its motion direction is parallel to the axis of a bilocal detector which is not orientation selective. The preferred bar speed will then equal the ratio of the detector's span ($S$) and delay ($\tau$). When the orientation of the bar is changed over an angle $\varphi$ and again moved in a direction perpendicular to its orientation the subunits of the bilocal detector will now be successively stimulated by different portions of the bar; to the moving bar the span of the detector seems to be "contracted" to a fraction of its original length ($S'$, see Fig. 5): $S' = S\cos\varphi$. So, for optimal stimulation of the above bilocal detector the bar should move with a lower velocity ($S'/\tau$) perpendicular to its orientation. However, in Rodman and Albright's study the preferred speed perpendicular to the bar remained the same irrespective of its orientation.

Fig. 4. Tuning velocity of the bilocal detectors as a function of the stimulus velocity v (deg/sec) for two subjects (AB, ML). Tuning velocity is computed from the ratio of the critical width (Fig. 3; span) and the critical temporal alternation period (Fig. 2; delay) at each pattern velocity. Open symbols indicate motion of the line-pattern, solid symbols refer to plaid motion. The tuning velocity equals the stimulus motion for line-patterns, but for the plaids the detectors are tuned to slightly higher stimulus speeds.

Fig. 5. For the bilocal detector (span: $S$, delay: $\tau$) the optimal speed of line motion depends on the orientation ($\varphi$) of the line when its subunits are rotationally symmetrical. For optimal correlation, a line which crosses the first subunit at time $t_0$ must cross the second subunit a delay time later. Thus, the displacement in a direction perpendicular to the line during this period must have equalled $S\cos\varphi$. To the line the detector's span appears contracted to $S'$. This implies that for a line oriented at an angle $\varphi$ the tuning velocity of the detector is lowered by a factor $\cos\varphi$.

To model this result, the span of the plaid motion detector when stimulated by a bar at an angle $\varphi$ with respect to the detector's preferred direction of motion must be larger by a factor $1/\cos\varphi$ or its delay must be smaller by $\cos\varphi$ and the bilocal detector must be tuned to a higher velocity. The tuning curve cannot reflect the output of a single bilocal detector without orientation selectivity. An alternative model could include a bundle of orientation selective bilocal detectors ("component motion detectors") with identical preferred direction of motion (Fig. 6). Component detectors in the bundle are tuned to increasingly higher speeds when the orientation preference of the subunits deviates more from the perpendicular to the preferred direction of motion. The response of the bundle must be dominated by the element with the best matching orientation preference.
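The geometric argument above can be made concrete with a small sketch. It contrasts a single detector without orientation selectivity, whose preferred speed perpendicular to the bar drops with $\cos\varphi$, with a bundle element whose delay is shortened by $\cos\varphi$ (one of the two options named in the text). The function names and the numerical values of span and delay are hypothetical.

```python
import math


def contracted_span(span, phi_deg):
    """Effective span S' = S cos(phi) seen by a line tilted by phi."""
    return span * math.cos(math.radians(phi_deg))


def perp_speed_unoriented(span, delay, phi_deg):
    """Preferred speed perpendicular to the line for one non-oriented detector."""
    return contracted_span(span, phi_deg) / delay


def bundle_delay(delay0, phi_deg):
    """Delay of a bundle element, assumed shortened by cos(phi)."""
    return delay0 * math.cos(math.radians(phi_deg))


def perp_speed_bundle(span, delay0, phi_deg):
    """Preferred speed perpendicular to the line for the matching bundle element."""
    return contracted_span(span, phi_deg) / bundle_delay(delay0, phi_deg)


def axis_speed_bundle(span, delay0, phi_deg):
    """Preferred speed along the detector axis for the matching bundle element."""
    return span / bundle_delay(delay0, phi_deg)


if __name__ == "__main__":
    span, delay0 = 0.1, 0.1  # hypothetical values (deg, sec)
    for phi in (0, 30, 45, 60):
        print(phi,
              round(perp_speed_unoriented(span, delay0, phi), 3),
              round(perp_speed_bundle(span, delay0, phi), 3),
              round(axis_speed_bundle(span, delay0, phi), 3))
```

Under these assumptions the perpendicular speed preferred by the bundle stays at S/τ for every orientation, whereas the preferred speed along the detector axis grows as 1/cos φ.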
Formulated differently, the bundle consists of component motion detectors all tuned to the same speed in a direction perpendicular to their preferred orientation. The combination stage ("plaid motion detector") would then correspond to the weighted sum over the elements of the bundle. Our present findings suggest that the spans of the different components are identical but that the delays differ depending on the preferred orientation of the component detector. The dominance of the component-detector with the best matching orientation preference requires some form of orientation tuning of the subunits. Evidence for orientation selectivity of bilocal motion detectors was given by van den Berg et al. (1990). Coherent motion was detectable in a moving field of rotating lines when the orientation change of the line elements during the traverse of the span was less than about 30 deg. Thus the bilocal detectors appear to correlate contours moving over their subunits only if they are well matched with respect to orientation.

Fig. 6. A hypothetical scheme for a plaid motion detector. The plaid motion detector shown signals the probability that rightward pattern motion has occurred. Plaid motion detectors tuned to the same speed but other motion directions are simply rotated versions of this scheme. Plaid motion detectors tuned to the same direction but other speeds differ with respect to the span ($S$) and the delay ($\tau_0$). Plaid motion is detected by a weighted sum (weights: $w_i$) of orientation selective bilocal detectors (or component motion detectors). The subunits of the component motion detectors in the set are thought to overlap completely but are drawn beneath each other for clarity. Thus the plaid motion detector is bilocal like the component motion detectors. A local competition (LC) occurs between the output of the subunits prior to the correlation stage. Consequently, motion is signalled only when the local orientation of the stimulus is similar at the two subunit locations. Also, for a moving line the response of the detector is then determined by the component motion detector with the best matching orientation preference. The preferred speed of each component motion detector in the direction perpendicular to its preferred orientation is equal. Thus, the preferred line speed in the direction parallel to the axis of the component detector increases when the line-orientation deviates more from perpendicular to this axis. As the spans of the component motion detectors are identical this can only be accomplished by a decrease of the component motion detector's delay when its preferred orientation deviates from perpendicular to the detector's axis: $\tau_\varphi = \tau_0\cos\varphi$.

The proposed scheme for plaid motion detectors implies a relation between the preferred speed of the plaid motion detector and
the orientation of the components with respect to the plaid motion. If plaids are built from components with identical speed the plaid motion is always directed along the bisectrix of the component motions. When the angle between the component motions is increased the same bundle is stimulated because all elements of the bundle are tuned to the same component speed. However, the plaid motion increases as a result of the increase in the angle between the components. Hence, if the scheme is correct the preferred speed of plaid motion selective cells should increase when the angle between the components which constitute the plaid increases. To our knowledge this relation has not yet been investigated. Another implication of the proposed scheme for the relation between component motion detectors and plaid motion detectors would be that bar patterns moving perpendicular to their orientation would be masked most effectively by (e.g. preceding) plaid motion when the masking plaid moves faster than the test stimulus. This conclusion hinges on the assumption that adaptation by plaid motion is related to the level of activity induced by the masking plaid at the combination stage. As the combination stage combines component motion detectors tuned to the same orthogonal speed, the component motions of the masking plaid must be identical to the motion of the test stimulus for optimal masking. Indications for such a relation were found by Ferrera and Wilson (1987) (their Fig. 4) but definite conclusions in this respect seem premature. Finally, it seems relevant to compare our data to those of Welch (1989). She investigated motion discrimination for moving plaids and for single gratings. The Weber-fraction for motion discrimination for both types of stimuli appeared to coincide when plotted as a function of the component motion but not as a function of the pattern motion. 
She concluded that motion discrimination is limited by noise in the component motion stage rather than in the combination stage. To relate these data to the scheme presented above we must speculate somewhat on the relation between motion detection and motion discrimination. It seems appropriate to relate motion discrimination to differential activation thresholds of motion detectors (McKee, Silverman & Nakayama, 1986) instead of to purely temporal or spatial mechanisms. What are the relevant properties of the bilocal detector in this respect? We have
argued elsewhere (van den Berg & van de Grind, 1989) that the activity of a bilocal detector can be equivalent to the overlap relation of its subfields after displacement of the delayed subfield over the distance traversed by the stimulus during the delay. This suggests that the motion discrimination threshold may be related to the just noticeable difference in the overlap relation. The overlap is maximal, obviously, when the stimulus moves at the tuning velocity of the detector. Without detailed knowledge of the subfield structure quantitative analysis of the connection between motion discrimination and the overlap relation is not possible. Nevertheless, the "travelling subfield" idea suggests that speed discrimination is better for detectors with smaller subfields. Van de Grind et al. (1986) have shown that the span and the subfield diameter are proportional and increase with an increase of the tuning velocity. In the scheme proposed above all component motion detectors which belong to one bundle have identical spans. If motion discrimination is based on the output of the bundle (the pattern motion detector) it does not matter which component in the bundle contributes most because the just discriminable change in the overlap relation is identical for all components in the bundle. Thus, if the bundle activity determines the discrimination threshold the Weber-fraction must be identical for stimulation of the bundle with a single grating or with a plaid. As argued above the preferred pattern speed of the bundle depends on the pattern and is higher for plaids than for a single grating. Hence, the same Weber-fraction for speed discrimination is found for plaids and single gratings but shifted to a higher speed for the plaid because the bundle which determines the discrimination threshold is tuned to higher speed in the case of a moving plaid. This is exactly the result described by Welch (1989).

There may be a problem here.
The bilocal correlation model for motion detection implies a kind of channel theory for motion perception. The proposed grouping of component motion detectors to provide plaid motion selectivity results in a plaid channel which is tuned to a higher velocity if a moving plaid is shown than if a single grating is shown. This would suggest that speed matching of dissimilar patterns (e.g. a reference-grating and a test-plaid) would be biased; at the matching speed the plaid's component motion would equal that of the reference grating. Hence, the pattern speed of the plaid would be underestimated. Indeed, indications for such a bias were given by Ferrera and Wilson (1989).

To conclude, component and plaid motion appear to be detected through bilocal mechanisms. The smaller delay found for motion of a plaid compared to motion of single line patterns suggests that plaid motion detectors are composed of groups of bilocal detectors tuned to the same speed perpendicular to their preferred orientation.
REFERENCES
Adelson, E. H. & Movshon, J. A. (1982). Phenomenal coherence of moving visual patterns. Nature, London 300, 523-525.
Albright, T. D. (1984). Direction and orientation selectivity of neurons in visual area MT of the macaque. Journal of Neurophysiology, 52, 1106-1130.
van den Berg, A. V. & van de Grind, W. A. (1989). Reaction times to motion onset and motion detection thresholds reflect the properties of bilocal motion detectors. Vision Research, 29, 1261-1266.
van den Berg, A. V., van de Grind, W. A. & van Doorn, A. (1990). Bilocal motion detectors are selective for orientation. Journal of the Optical Society of America, A7, 933-939.
Dawson, M. & DiLollo, V. (1990). Effects of adapting luminance and stimulus contrast on the temporal and spatial limits of short range motion. Vision Research, 30, 415-429.
van Doorn, A. J. & Koenderink, J. J. (1982a). Temporal properties of the visual detectability of moving spatial white noise. Experimental Brain Research, 45, 179-188.
van Doorn, A. J. & Koenderink, J. J. (1982b). Spatial properties of the visual detectability of moving spatial white noise. Experimental Brain Research, 45, 189-195.
van Doorn, A. J. & Koenderink, J. J. (1984). Spatiotemporal integration in the detection of coherent motion. Vision Research, 24, 47-53.
Fennema, C. L. & Thompson, W. B. (1979). Velocity determination in scenes containing several moving objects. Computer Graphics and Image Processing, 9, 301-315.
Ferrera, V. P. & Wilson, H. R. (1987). Direction specific masking and the analysis of motion into dimensions. Vision Research, 27, 1783-1796.
Ferrera, V. P. & Wilson, H. R. (1989). Perceived speed of moving 2D patterns. Investigative Ophthalmology and Visual Science (Suppl.), 30, 75.
van de Grind, W. A., Koenderink, J. J. & van Doorn, J. A. (1986). The distribution of human motion detector properties in the monocular visual field. Vision Research, 26, 797-810.
Hildreth, E. C. (1984). The computation of the velocity field. Proceedings of the Royal Society B, 221, 189-220.
Koenderink, J. J. & van Doorn, A. J. (1987). Representation of local geometry in the visual system. Biological Cybernetics, 57, 367-375.
McKee, S. P. & Welch, L. (1985). Sequential recruitment in the discrimination of velocity. Journal of the Optical Society of America, A2, 243-251.
McKee, S. P., Silverman, G. H. & Nakayama, K. (1986). Precise velocity discrimination despite random variations in temporal frequency and contrast. Vision Research, 26, 609-619.
Movshon, J. A., Adelson, E. H., Gizzi, M. S. & Newsome, W. T. (1985). The analysis of moving visual patterns. In Gatass, C. C. & Gross, C. G. (Eds.), Study group on pattern recognition mechanisms (pp. 117-151). Vatican City: Pontificiae Academiae Scientiarum Scripta Varia 54.
Press, W. H., Flannery, B. P., Teukolsky, S. A. & Vetterling, W. T. (1988). Numerical recipes. Cambridge: Cambridge Univ. Press.
Reichardt, W. (1961). Autocorrelation, a principle for the evaluation of sensory information by the central nervous system. In Rosenblith, W. A. (Ed.), Sensory communication. New York: Wiley.
Reichardt, W. & Egelhaaf, M. (1988). Movement detectors provide sufficient information for local computation of the 2D-velocity field. Naturwissenschaften, 75, 313-315.
Rodman, H. R. & Albright, T. D. (1987). Coding of visual stimulus velocity in area MT of the macaque. Vision Research, 27, 2035-2048.
Rodman, H. R. & Albright, T. D. (1989). Single-unit analysis of pattern motion selective properties in the middle temporal visual area MT. Experimental Brain Research, 75, 53-64.
van Santen, J. P. H. & Sperling, H. G. (1984). Temporal covariance model of human motion perception. Journal of the Optical Society of America, A1, 451-473.
Ullman, S. (1983). The measurement of visual motion. Trends in Neuroscience, 6, 177-179.
Uras, S., Girosi, F., Verri, A. & Torre, V. (1989). A computational approach to motion perception. Biological Cybernetics, 60, 79-87.
Welch, L. (1989). The perception of moving plaids reveals two motion-processing stages. Nature, London 337, 734-736.