Motion analysis with orientational filtering


INFORMATION SCIENCES 62, 251-269 (1992)

GEORGE MEGHABGHAB

Department of Mathematics and Computer Science, Valdosta State College, Valdosta, Georgia 31698

ABRAHAM KANDEL

Department of Computer Science and Engineering, University of South Florida, Tampa, Florida 33620

ABSTRACT

The primate visual system recognizes the true direction of pattern motion using local detectors only capable of detecting the component of motion perpendicular to the orientation of the moving edge. A multilayered model is presented with input patterns (binary images), each consisting of rigid geometrical forms moving in a particular direction. Input layers are given component orientation and frequency selectivities similar to those recorded in visual area V1, which projects to area MT. The interaction between two consecutive layers of the multilayered model seems to play an important role in solving the aperture problem.

©Elsevier Science Publishing Co., Inc. 1992, 655 Avenue of the Americas, New York, NY 10010. 0020-0255/92/$05.00

1. INTRODUCTION

Moving visual stimuli are processed in several layers in the primate visual system. The first cortical layer is layer 4Cα of area V1, which receives its main ascending input from the magnocellular layers of the LGN. Layer 4Cα projects to layer 4B, which contains many highly tuned, direction-selective neurons [1]. The neurons belonging to these layers respond to moving contours as if these contours were moving perpendicular to their local orientation. In a recent article, Nakayama [2] reviewed major motion analysis models and classified them as linear and nonlinear models. These biological models can also fit into a different classification scheme:

(1) Motion analysis without orientational filtering: In this case the image is passed through a set of nonoriented bandpass filters. The output of these


filters is sent to a motion analysis system, which might track features or might perform a cross-correlation between successive views (Reichardt [3], Van Santen [4]). A cross-correlator will unambiguously assign a single motion to the whole pattern if the components of the moving objects are of similar speed (or frequency for a given unit of length). However, if the two gratings are of different speeds (or frequencies for a given unit of length), they will not pass the same band, and the results of the cross-correlation will not produce a maximum. The familiar aperture problem of motion has to be solved in order to find the direction of motion.

(2) Motion analysis with orientational filtering: The image is first passed through orientational filters similar to those encountered at the cortical level (Marr and Ullman [5], Poggio [6], Watson and Ahumada [7], Adelson and Movshon [8]). The output of these mechanisms will provide information only about the motion normal to their own orientation. But in this case, the motion ambiguity problem arises. There are several ways of solving this problem. Hildreth [9] suggested a minimization process if the motion is smooth. Another solution is the intersection of constraints scheme (see Gizzi [10]). Psychophysics and physiology suggest the prevalence of oriented filtering in the early stages of the primate visual system (see [11-13]). This paper will investigate the consequences of a multilayered pyramidal model for visual motion and how the interaction between consecutive layers might play an important role in learning to solve the aperture problem.
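The delay-and-correlate scheme attributed to Reichardt [3] can be illustrated with a minimal sketch. This is an illustrative opponent detector, not the paper's model; the two input signals and the one-step delay are made-up assumptions:

```python
import numpy as np

def reichardt_response(signal_a, signal_b, delay=1):
    """Opponent delay-and-correlate detector: correlate input A with a
    delayed copy of input B, minus the mirror term, so a static or
    flickering stimulus yields zero net response."""
    a = np.asarray(signal_a, dtype=float)
    b = np.asarray(signal_b, dtype=float)
    forward = np.sum(a[:-delay] * b[delay:])   # pattern moving A -> B
    backward = np.sum(b[:-delay] * a[delay:])  # pattern moving B -> A
    return forward - backward

# A bright spot drifting past two detectors one time step apart:
t = np.arange(20)
left = (t == 5).astype(float)    # spot reaches the left detector at t = 5
right = (t == 6).astype(float)   # spot reaches the right detector at t = 6
print(reichardt_response(left, right))   # positive: left-to-right motion
print(reichardt_response(right, left))   # negative: opposite direction
```

The opponent subtraction is what makes the detector direction selective: either correlation term alone also responds to flicker.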

2. HIERARCHICAL ANALYSIS OF VISUAL MOTION AND ITS IMPLICATIONS

There have been no studies, in the case of motion analysis with orientational filtering, that investigated a multilayered hierarchical model for the analysis of the motion of objects. In a recent study, Meghabghab [14] proposed a pyramidal model for motion analysis that uses the Walsh transform as the two-dimensional transform that realizes the conservation of information projected from the retina to area 17. The simple cells located in area 17 are supposed to be responsible for the orientation selectivity of the cortical cells. As specified in [14], the support for using the Walsh transform stems from the following points:

(1) Propagation of information corresponding to the Walsh transform would mean the distribution of each bit of information over all visual pathways. In this case, by the loss of one storage location, all information would be


degraded, but no information would be completely destroyed. In theory, such a storage system would remain operable as long as more than half the visual pathways were operating. This corroborates the idea of locality suggested in [15].

(2) Neurobiological data [2] yield the observation that simple systems of functions participate in the analysis and processing of discrete images from the retina to area 17. This corresponds exactly to the case of the Walsh transform, where very simple systems of functions are needed to generate Walsh functions. This in turn corresponds to the simplicity hypothesis for the network [15].

The main characteristic of the Walsh transform is the sequency concept it embodies. The sequency concept is concerned with whether there is a change in the sign of the pattern:

For f₁, the change is from +1 to −1, while for f₂, it is from −1 to +1. However, the templates are applied with different sizes on the images, so the sequency property of the information contained in the components of the image transform must be scaled. The templates are applied to images at different resolution levels, analogous to the different definitions of the neurons encountered through the visual pathways. Because of this characteristic, the Walsh transform is useful for elaborating size changes at all layers of the pyramid. At each level of the pyramid a specific number of operators extract local features from the image. These operators correspond to the neurons of the given order of level, and to the rows of the Walsh transform at a given level of the pyramid. Let W_N denote the Walsh transform at level N of the pyramid. W_1 is an 8 x 8 matrix at level 1 of the pyramid. Thus, eight different operators can be used at this level. These operators (also called sequencies) have different configurations (see Figures 1 to 5) and consequently extract different features from the object. Some of the sequencies detect edges, some lines, and some contours. The sequencies at higher levels detect textures and blobs of images (see Figures 6 to 10). Following Daugman [16], the observations of spatial frequency and orientation selectivity have to be considered in relation to each other. A rather


Fig. 1. Seq(1)(2). Sequency at level 1 of the pyramid.

coarse spatial frequency selectivity can be formed at early stages of visual processing, namely at the level of the retinal ganglion cells. But in general, orientation is considered a tuning variable when it reaches the cortical level, where the simple cells fire when a line or edge corresponding to the particular orientation of the neuron enters that area of the visual field. In an actual cortex, all possible orientations are represented. In addition, the transition from one orientation to the next is gradual. A 2D Fourier analysis of cortical receptive fields captures their spatial frequency as well as their orientation properties. According to Campbell et al. [17], each receptive field should show a single peak in the Fourier space, and the frequencies that are occupied by the RFs are all different. A pyramidal Fourier analysis of the sequencies is investigated

Fig. 2. Seq(2)(19) = Seq(1)(2) Seq(1)(3). Sequency at level 2 of the pyramid.

Fig. 3. Sequency at level 3 of the pyramid.

here in order to verify Campbell's hypothesis. At the first level of the pyramid, one can count up to six different frequencies and thirteen different orientations, which represents an orientation every 15 degrees. At the second level of the pyramid, one can count fourteen spatial frequencies and approximately seventy orientations. Thus, a Fourier analysis of the sequencies at all levels of the pyramid allows a regular tessellation of the frequency-orientation domain.
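Reading a filter's dominant frequency and orientation off its Fourier peak can be sketched as follows. This is an illustrative procedure, not the paper's; the test grating and the cycles-per-pixel normalization are assumptions:

```python
import numpy as np

def peak_frequency_orientation(filt):
    """Locate the dominant spatial frequency and orientation of a 2-D
    operator from the peak of its Fourier magnitude spectrum (the DC
    term is removed before searching)."""
    f = np.fft.fftshift(np.fft.fft2(filt - filt.mean()))
    ky, kx = np.unravel_index(np.argmax(np.abs(f)), f.shape)
    cy, cx = f.shape[0] // 2, f.shape[1] // 2
    u, v = kx - cx, ky - cy                     # cycles per image
    freq = np.hypot(u, v) / filt.shape[0]       # cycles per pixel
    angle = np.degrees(np.arctan2(v, u)) % 180  # orientation in [0, 180)
    return freq, round(angle, 6) % 180          # fold 180 back onto 0

# A vertical +-1 grating with one sign change per row peaks at a purely
# horizontal frequency, i.e. orientation 0 degrees:
grating = np.tile([1, 1, 1, 1, -1, -1, -1, -1], (8, 1))
print(peak_frequency_orientation(grating))
```

Applying this to every sequency at a level would produce one (frequency, orientation) point per operator, i.e. the tessellation of the dual domain described above.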

3. APPLICATION OF THE SEQUENCIES TO THE DETECTION OF MOTION

In this section, the sequencies found at each level of the pyramid will be applied to the detection of movement of objects. The dual space (frequency,


Fig. 4. Seq(3)(101) = Seq(1)(1) Seq(1)(4) Seq(1)(6). Sequency at level 3 of the pyramid.

orientation) detailed in [14] allowed the study of the receptive fields of the neurons of area 17. The main function of visual perception is to acquire a maximum of information in a minimum amount of time. The eye does not apprehend the content of an image in one single pass. It does not operate, like most artificial sensors, in a continuous scan, but by a number of focuses separated by saccades, which quickly change the position of focus. For each focus it is normal to distinguish neurophysiologically two zones in the visual field: (1) a central zone, corresponding to the fovea or zone of analysis; (2) a peripheral zone of integration and detection, where the appearance of a pertinent detail triggers a reflex of focusing on the detail. In reality, according to the data observed by Hughes [18], there is no sharp


Fig. 5. Seq(3)(102) = Seq(1)(1) Seq(1)(4) Seq(1)(7). Sequency at level 3 of the pyramid.

Fig. 6. Two dimensional Fourier transform of Figure 1.

Fig. 7. Two dimensional Fourier transform of Figure 2.


Fig. 8. Two dimensional Fourier transform of Figure 3.

boundary between the two zones. The curve of acuity decreases uniformly with eccentricity.

3.1. STUDY OF THE MOTION OF AN OBJECT: AN EXAMPLE

Table 1 describes the frequencies and orientations of the sequencies at level 2 of the pyramid. In Table 1, it can be seen that some sequencies have the same response but differ in phase. They match the same point of the visual field. The set of sequencies at level 2 of the pyramid can be divided into two sets: (1) a set that has only one configuration for a given frequency and orientation; (2) a set that has two configurations for a given frequency and orientation. If a specific spatial frequency is chosen (5/2Δ), ten coupled filters respond to this frequency. As mentioned in Section 1, if two components of the same object do not have the same spatial frequency, they do not fuse to produce a homogeneous motion percept. Thus, in order to recover the orthogonal direction of motion, a spatial frequency is chosen and the orientation is the only variable left to be determined. For example, consider two sequencies and their respective responses (o₁, o₂) to a given stimulus. We can determine a response o for the couple of sequencies, independent of their phase, by considering the quadratic mean of the responses o₁ and o₂:

o = [(o₁² + o₂²)/2]^(1/2).   (3)
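The phase-discarding combination can be sketched in a few lines, assuming the usual definition of the quadratic mean; the sample responses are hypothetical:

```python
import math

def coupled_response(o1, o2):
    """Quadratic (root-mean-square) response of a pair of coupled
    sequencies, which discards their relative phase."""
    return math.sqrt((o1 ** 2 + o2 ** 2) / 2)

# Two phase-shifted filters seeing the same stimulus: swapping their
# individual responses leaves the coupled response unchanged.
print(coupled_response(3.0, 4.0))
print(coupled_response(4.0, 3.0))
```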

3.2. A SIMPLIFIED VERSION OF THE PERCEPTION OF MOTION

In order to analyze the movements of objects, a simple procedure was adopted that significantly simplifies the perception of movements. To take the

Fig. 9. Two dimensional Fourier transform of Figure 4.

Fig. 10. Two dimensional Fourier transform of Figure 5.

oculometric activity into account, the results of three successive focuses, considered in a random fashion in the plane of the object analyzed, are integrated. Although Blackburn et al. [19, 20], in their model of motion perception, take into consideration the reflex that switches the fixation from the periphery to the fovea and vice versa, they do not consider how this translates into the actual measurement of motion. Thus, in our case, three successive focuses are integrated into the actual response for the perception of motion. The results of each focus are obtained from the responses of the set of sequencies of order 2 located in the multilayered model, which consists of only two overlapping layers L₁ and L₂. Figures 11 and 12 represent the two layers L₁ and L₂, respectively. Layer L₁, the smaller layer, has seven hexagonal fields of seven pixels each. It corresponds to the fovea. Layer L₂ has seven hexagonal fields of 7 x 7 pixels each. It corresponds to the periphery. The bigger layer, L₂, has fields whose elements integrate the activity of seven pixels. It represents the integration effect realized by cells of the

TABLE 1
Frequency and Orientation of Some Sequencies at Level 2 of the Pyramid
(coupled sequencies, with spatial frequencies f₀ ranging from 0.144 to 1.025 and orientations ranging from 9.83 to 360 degrees)


periphery of the retina. The extreme simplification of the model does not allow an apparent continuity between layers. The responses of each one of these filters associated with its field are considered. The response of the multilayered model is then the result of the responses of both layers L₁ and L₂. However, because the fields of layer L₂ have an area 7 times the area of those of layer L₁, there exist between the two layers seven pairs of filters that have the same angle-frequency response, but with an angle of rotation of −19°10′. This allows the model to respond to these seven filters with better precision at layer L₁ than at layer L₂. This interaction among layers must have an important role in learning how to solve the aperture problem. Although Sereno [21] shows that two layers in a neural network are enough to learn the solution to the aperture problem, he does not specify the interaction among elements at a given layer. Although he does specify that the projection from a unit in the first layer to a unit in the second layer falls off as a Gaussian centered on the retinotopically equivalent point on the second layer, he does not specify the effect of different projections on the interactions of responses between layers. Thus, for each point of focus, our model gives a certain number of responses characterizing orientation and frequency, and most importantly delivers the direction of motion of the perpendicular component. Thus, to analyze the motion of an object, a set of operators approximating the visual receptive fields is applied to the abovementioned object, taking into account the oculometric activity, and a vector of (2 × 7) components is obtained. Except for the calculation of the quadratic mean in Equation (3), these operators necessitate a moderate amount of computation, given that they are based on the Walsh transform.
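The pooling performed by the coarse peripheral layer can be sketched as block integration. Square blocks stand in here for the paper's hexagonal fields, and the image and field sizes are illustrative:

```python
import numpy as np

def integrate_fields(image, field=7):
    """Coarse layer: each output unit pools one field x field block of
    pixels, mimicking the integration performed by peripheral cells."""
    h, w = image.shape
    trimmed = image[: h - h % field, : w - w % field]
    blocks = trimmed.reshape(trimmed.shape[0] // field, field,
                             trimmed.shape[1] // field, field)
    return blocks.mean(axis=(1, 3))

fine = np.zeros((21, 21))
fine[7:14, 7:14] = 1.0           # a bright patch under the centre field
coarse = integrate_fields(fine)
print(coarse)                    # 3 x 3 map; only the centre unit is active
```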
They possess an advantage over the multichannel model with several frequency filters in that they do not use any convolution by Fourier transformation.
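The no-convolution point can be sketched with Walsh-ordered templates: each response is a plain sum of added and subtracted pixels. The 8 x 8 bank and the vertically tiled templates are illustrative assumptions, not the paper's exact operators:

```python
import numpy as np

def sequency_responses(patch, operators):
    """Response of a bank of Walsh sequency operators to one fixation:
    each response is the inner product of an operator template with the
    image patch -- additions and subtractions only, no convolution."""
    return np.array([np.sum(op * patch) for op in operators])

# Build an 8x8 Hadamard matrix (Sylvester construction) and sort its
# rows by sequency, i.e. by number of sign changes along each row.
h = np.array([[1]])
while h.shape[0] < 8:
    h = np.block([[h, h], [h, -h]])
order = np.argsort((np.diff(h, axis=1) != 0).sum(axis=1))
bank = [np.tile(row, (8, 1)) for row in h[order]]   # vertical templates

edge = np.ones((8, 8))
edge[:, 4:] = -1          # a vertical contrast edge
print(sequency_responses(edge, bank))
```

By orthogonality of the Walsh rows, only the sequency-1 operator responds to this edge; all others give zero.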

3.3. APPLICATION OF THE ANALYSIS OF MOTION TO THREE DIFFERENT OBJECTS
The objects studied correspond to well-known geometric forms: a rectangle, a triangle, and a polygon (Figures 11 to 16) seen in the layers L, and L,. The same objects were subject to movement in both layers (Figures 17 to 22). Table 2 shows the responses of twelve different sequencies on the objects. Given that the objects considered were binary images, there is no problem with high frequencies as far as the fixation point is concerned. This would be a problem if gray level moving objects were considered. The results in Table 2 show the detection of the different components of the objects. As mentioned in the introduction, a disambiguation stage is needed at this level to combine the

GEORGE

262

MEGHABGHAB

AND ABRAHAM

KANDEL

0 1 0 0 0 0

0 0 0

1 1 1 1 0

1 1 1 I 0

0 1 1 1 0 0

0 0

Fig. 13.

0 0

0

0 0 0

0

1 1 1

10 1 1

1 0 1 0100011001 0 0 0 0 0 0

0 1 1 1

0 0 0 0

0 0

0 0 0

0

1 1 0 0 0 1 1 1 1 1 0 0

0 0 0 0 0 1 0

0 0 1 1 1 1 0 0 0

1 1 0 0

0 0 1 0 0 0

0 0

0 0 0

0 0

0 0

0

TrianO2: a triangle at level 2 of the hierarchy.

1 0

0

0

1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0

1 1 1 1 0 0 0 0 0

0 0 0 1 1 0 0 0

0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0

0 0 1 1 0 0 0 1 1 1

1 0 0 0 0 0 0 0

0 0 1 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0

0 0 0 0 0 0 1 1

0 0 0 0 0 0 0 0

0 0 0 0 0 0 1 1 0 0 0 1 1 1 1 1 0 0

0 0 0 0 0 0 0 1

0 0 0 1 1 1 1 0 0 0

1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0

1 1 1 1 0 0 0 0

0 0 0 0 1 1 0 0 0

0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0

0 0 1

10 0 0 0 1 1

1 1 0 0 41 0 0

0 0 1 0 0 0 0 1 1 1 1 0 0 0

0

0

0 0

0 0 0 1 1 1 0 0 0

0 0 1 1 0 0 0

0 0 1 0 0 0 0

0 0 0

0 0

0 0

0 0

0 0

Fig. 14. TrianO3: a triangle at level 3 of the hierarchy. The same remarks as in Figure 12 apply here.


Fig. 15. Polyg02: a polygon considered at level 2 of the hierarchy.

Fig. 16. Polyg03: a polygon considered at level 3 of the hierarchy.


Fig. 17. Rorec02: the translation of Recta02 in Figure 13.

Fig. 18. Rorec03: the translation corresponding to Rorec02 but seen at level 3 of the hierarchy. Note the overlapping of information in the middle and nonoverlapping at the periphery.


Fig. 19. Rotri02: translation of the triangle in Figure 13.

Fig. 20. Rotri03: translation of the triangle in Figure 14, but seen at level 3 of the hierarchy.


Fig. 21. Ropol02: the polygon of Figure 15 translated.

Fig. 22. Ropol03: the polygon of Figure 16 translated.


TABLE 2
Values of the Different Sequencies for the Three Objects

Operators (coupled level-2 sequencies)   Rectangle   Triangle   Polygon
...                                        7.4         7.28       7.32
...                                        0.72        0.69       0.63
...                                        1.15        1.12       ...

different components and assign a direction of motion to the whole pattern using the intersection of constraints model [10].
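The intersection-of-constraints computation can be sketched in least-squares form. Each oriented measurement only constrains the velocity component along its normal; the normals and speeds below are a made-up example, not data from the paper:

```python
import numpy as np

def intersection_of_constraints(normals, speeds):
    """Recover the full pattern velocity v from component measurements:
    each oriented filter only constrains n_i . v = s_i, and the pattern
    velocity is the (least-squares) intersection of those constraint
    lines in velocity space."""
    n = np.asarray(normals, dtype=float)
    s = np.asarray(speeds, dtype=float)
    v, *_ = np.linalg.lstsq(n, s, rcond=None)
    return v

# Two gratings moving normal to their own orientations, both consistent
# with a single rightward pattern velocity (2, 0):
normals = [[1.0, 0.0], [np.cos(np.pi / 4), np.sin(np.pi / 4)]]
speeds = [2.0, 2.0 * np.cos(np.pi / 4)]
print(intersection_of_constraints(normals, speeds))
```

With two or more non-parallel normals the system is determined and the aperture ambiguity of each component disappears.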

4. DISCUSSION AND FURTHER CONSIDERATIONS

The hierarchical model suggests that it may be possible to study motion using only structured motion fields and biologically realistic operators at each level of the structure. The model needs to be applied to more complex motions of objects in the input: rotation, dilation, shear, multiple objects, flexible objects. Biological estimates of rotation and dilation appear to be made in two stages: rotation and dilation are not detected locally, but are perhaps built from estimates of local translation. The more layers are added to the model, the more insight into rotation, dilation, and segmentation of moving objects can be gained. Making a biologically realistic model of visual motion is difficult, since it requires a biologically realistic model of complex stations in the visual system: the retina, the LGN, and the primate visual cortex. We did take into consideration the responses from all these visual stations in order to analyze motion. However, in order to improve any visual model, more cooperation between physiologists and modelers is needed to generate more useful libraries of response profiles to arbitrary stimuli. Following Sereno [21], multilayered feedforward models suggest that learning the solution to the aperture problem


is possible in a two-layered network, provided we achieve a better understanding of the set of stimuli of the visual system. The very large set of stimuli on which the real visual system is trained (hundreds of millions of views) is still poorly characterized. It is believed that adding more layers to such networks would improve learning of how complex moving objects, as mentioned earlier, are processed in the visual system.
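Hebbian learning in a two-layer linear network can be sketched as an outer-product update. This is a toy in the spirit of, but not reproducing, Sereno's model [21]; the input and target patterns are hypothetical:

```python
import numpy as np

def hebb_train(inputs, targets, lr=0.1):
    """Plain Hebbian outer-product learning for a two-layer linear
    network: each presentation adds lr * pre * post to the weights."""
    n_in, n_out = inputs.shape[1], targets.shape[1]
    w = np.zeros((n_in, n_out))
    for x_i, y_i in zip(inputs, targets):
        w += lr * np.outer(x_i, y_i)   # dw = lr * pre * post
    return w

# Toy task: map two orthogonal "component" response vectors onto the
# pattern-motion directions they signal.
x = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([[1.0, 0.0], [0.0, 1.0]])   # mapping to be learned
w = hebb_train(x, y, lr=1.0)
print(x @ w)                             # recovers the targets
```

With orthogonal inputs the outer-product rule stores the mapping exactly; correlated inputs would require normalization or an error-correcting rule.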

The authors thank Dr. R. Lamprey and Dr. S. Goei for comments and suggestions.

REFERENCES

1. J. A. Movshon, E. H. Adelson, M. S. Gizzi, and W. T. Newsome, Analysis of moving visual patterns, in Pattern Recognition Mechanisms, Springer-Verlag, 1985, pp. 117-151.
2. K. Nakayama, Biological motion processing: A review, Vision Res. 25(5):625-660 (1985).
3. W. Reichardt, Auto-correlation, a principle for the evaluation of the sensory information by the central nervous system, in Sensory Communications, Wiley, 1961, pp. 307-311.
4. J. P. H. Van Santen and G. Sperling, Temporal covariance model of human motion perception, J. Opt. Soc. Amer. A 1:451-473 (1984).
5. D. Marr and S. Ullman, Directional selectivity and its use in early visual processing, Proc. Roy. Soc. London Ser. B 211:151-180 (1981).
6. G. F. Poggio, Visual algorithms, in Physical and Biological Processing of Images (O. J. Braddick and A. C. Sleigh, Eds.), Springer-Verlag, 1983.
7. A. B. Watson and A. J. Ahumada, Model of human visual motion sensing, J. Opt. Soc. Amer. A 2(2):322-341 (1985).
8. E. H. Adelson and J. A. Movshon, The perception of coherent motion in two dimensional patterns, presented at ACM Workshop on Motion: Perception and Representation, 4-6 Apr. 1983.
9. E. C. Hildreth, The Measurement of Visual Motion, MIT Press, 1984.
10. M. Gizzi, The Processing of Visual Motion in Cat and Monkey Central Nervous System, Ph.D. Dissertation, New York Univ., 1983.
11. K. Toyama, M. Kimura, and K. Tanaka, Cross correlation analysis of interneuronal connectivity in cat visual cortex, J. Neurophysiol. 46:191-200 (1981).
12. H. Wassle, L. Peichl, and B. B. Boycott, Morphology and topography of ON and OFF alpha cells in the cat's retina, Proc. Roy. Soc. London Ser. B 212:157-175 (1981).
13. H. Wassle, B. B. Boycott, and R. B. Illing, Morphology and mosaic of ON and OFF beta cells in the cat's retina and some functional considerations, Proc. Roy. Soc. London Ser. B 212:177-195 (1981).
14. G. V. Meghabghab, Hierarchical Analysis of Visual Motion, Ph.D. Dissertation, Florida State Univ., 1988.
15. S. Ullman, The Interpretation of Visual Motion, MIT Press, 1979.
16. J. G. Daugman, Two dimensional spectral analysis of cortical receptive field profiles, Vision Res. 20:847-856 (1980).
17. F. W. Campbell, B. Cleland, and C. Enroth-Cugell, The angular selectivity of visual cortex cells, J. Physiol. 198:237-250 (1968).

18. A. Hughes, Population magnitudes and distribution of the major modal classes of cat retinal ganglion cells as estimated from HRP filling and a systematic survey of the soma diameter spectra for classical neurons, J. Comp. Neurol. 197:303-339 (1981).
19. M. R. Blackburn, H. G. Nguyen, and P. K. Kaomea, Machine visual motion detection modeled on vertebrate retina (presented at Underwater Imaging Conference), SPIE Proc. 980:90-98 (1988).
20. M. R. Blackburn and H. G. Nguyen, Biological model of vision for an artificial system that learns to perceive its environment, in IJCNN, Washington, 1989, Vol. II, pp. 219-226.
21. M. I. Sereno, Learning the solution to the aperture problem for pattern motion with a Hebb rule, in Advances in Neural Information Processing Systems, Morgan Kaufmann, 1989, pp. 468-476.

Received 10 January 1990; revised 14 February 1990