Spatial visual channels in the Fourier plane


Vision Res. Vol. 24, No. 9, pp. 891-910, 1984. Printed in Great Britain. All rights reserved.
0042-6989/84 $3.00 + 0.00. Copyright © 1984 Pergamon Press Ltd

JOHN G. DAUGMAN
Harvard University, Division of Applied Sciences, Cambridge, MA 02138, U.S.A.

(Received 11 May 1983; in revised form 29 February 1984)

Abstract: Properties of human spatial visual channels were studied in two-dimensional form by a signal detection masking paradigm. Tuning surfaces of contrast threshold elevation induced by a sinusoidal mask were generated for four subjects, interpolated from an 11 × 11 Cartesian grid over the Fourier plane, and numerically Fourier transformed in two dimensions to infer putative filter profiles in the 2D space domain. Among the main findings in the 2D frequency domain were: (1) Threshold elevation surfaces are highly polar nonseparable: they cannot be described as the product of a spatial frequency tuning curve times an orientation tuning curve. (2) Iso-half-amplitude contours of the spectral tuning surfaces have a length/width elongation ratio of about 2:1. (3) Necessarily, resolution for spatial frequency and for orientation are in fundamental competition with 2D spatial resolution. By calculating the occupied area of the inferred filters both in the 2D space domain and in the 2D frequency domain, it was estimated that these mechanisms approach within a factor of 2.5 of the theoretical limit of joint resolution in the two 2D domains that can be derived by 2D generalization of Gabor’s famous Theory of Communication (1946). Other classes of 2D filters, such as an ideal 2D bandpass filter, have joint 2D entropies which are suboptimal by a factor of 13 or more. Subject to the inherent constraints on inference from these 2D masking experiments, the evidence suggests that 2D spatial frequency channels can be described as elongated 2D spatial wave-packets which crudely resemble optimal forms for joint information resolution in the 2D spatial and 2D frequency domains.

Fourier plane; Polar nonseparability; Tuning surface; Masking; 2D Gabor filters; 2D spatial frequency channels

INTRODUCTION

The thing that apparently constitutes the key to this mystery is an observation that is at once both trite and obvious, namely, that the retinal image is a surface. By the same token the visual manifold, i.e. the representation of the field of view within the visual cortex of the brain, carries an intrinsic two-dimensional geometry.
- W. C. Hoffman, The Lie Algebra of Visual Perception (1966).

As Luneburg (1947), Hoffman (1966), and Schwartz (1977) have pointed out in their analyses of the geometry of visual space, the underlying dimensionality of a sensory mode imposes far-reaching constraints on theories about the nature of the sensory representation. The two-dimensionality of the spatial variables in a retinal image is a property that, although obvious, has tended to be rather inexplicit in the contemporary notion that the internal spatial visual representation may involve some processes resembling crude spatial frequency analysis. Indeed, the articulation and development of this theory ever since its inception 15 years ago has proceeded primarily through one-dimensional constructs, experiments, and models. The chief aim of the research reported here is to generalize to two-dimensional form, both in the space domain and in the spatial frequency domain, the empirical characterization of “spatial frequency channels” as inferred by the classical experimental paradigms.

The theory of a substratum of spatial frequency channels underlying some aspects of visual perception has emerged from a broad variety of observations. The notion originated with the discovery that detection and discrimination thresholds for one-dimensional periodic luminance patterns may be predicted from contrast thresholds of the individual Fourier components in their waveforms (Campbell and Robson, 1968). Additional early support for the channels concept arose from masking and adaptation experiments revealing the orientation specificity (Campbell and Kulikowski, 1966) and spatial frequency specificity (Blakemore and Campbell, 1969) of threshold elevation, followed by demonstrations of the statistical independence of detection thresholds for individual spatial frequency components summated together (Sachs et al., 1971; Graham and Nachmias, 1971). Parallel to these mutually reinforcing psychophysical results, neurophysiological recordings from single cells in the visual cortex of cat and monkey showed bandpass spatial frequency tuning properties similar to those observed psychophysically (Campbell et al., 1969; Maffei and Fiorentini, 1973; Ikeda and Wright, 1975). Lurking always in the background, indeed perhaps as the principal inspiration, of these research paradigms was the seductive idea that a process resembling Fourier analysis might subserve our internal representation of the spatial visual world; the early explicit proponents of this view included Kabrisky, 1966; Blakemore and Campbell, 1969; Ginsburg, 1971; Pollen et al., 1971, 1974; and Maffei and Fiorentini, 1971, 1973. Although the boldest version of the filtering theory (i.e. more-or-less strict Fourier analysis) is usually not taken seriously today, more modest versions advocating a coarse, local frequency analysis embedded within the global space-domain mapping enjoy considerable current attention and favor (e.g. Graham et al., 1978; DeValois et al., 1982; Pollen and Ronner, 1982; Watson, 1982; Sakitt and Barlow, 1982).

To whatever extent such a representation might in fact be involved in vision, even if only in a local and coarse sense, the implied spatial frequency domain must be two-dimensional in order to be isomorphic to the spatial manifold of the retinal image. (The relationship between the 2D Fourier plane and the 2D space domain is described in Appendix A.) An instructive example of a pattern defined in these two domains is given in Fig. 1, whose left-hand panel shows a circular distribution of punctate spectral loci in the (u, v) Fourier plane, corresponding to the space-domain image seen in the right-hand panel. In a one-dimensional (scalar) sense, this space-domain image contains only a single spatial frequency, namely a periodicity of 8 c/deg if the circular window is 2° in diameter. In this 2D image, however, the 8 c/deg component exists in each of six orientations. In order to synthesize any 2D image from Fourier components or to represent it thereby, an amplitude and phase must be specified for every point (u, v) in the 2D Fourier plane. Every conjugate pair of points (±u, ±v) together represents a single sinusoidal wave-vector having spatial frequency $\sqrt{u^2 + v^2}$ (the distance from the origin of the Fourier plane) and orientation $\arctan(v/u)$ (the angular coordinate around the origin).
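As a small illustration of this polar reading of the Fourier plane, the following sketch converts a wave-vector (u, v) to its spatial frequency and orientation and checks that the two members of a conjugate pair describe one and the same grating. The function name `to_polar`, the use of `atan2` to resolve the quadrant, the folding of orientation into the range 0 to 180 deg, and the six evenly spaced orientations are all assumptions made for illustration; the text states only that the pattern of Fig. 1 contains 8 c/deg in six orientations.

```python
import math

def to_polar(u, v):
    """Return (spatial frequency, orientation in degrees) for the wave-vector (u, v)."""
    freq = math.hypot(u, v)                          # distance from the origin
    theta = math.degrees(math.atan2(v, u)) % 180.0   # conjugate points share one orientation
    return freq, theta

# Hypothetical spectrum: an 8 c/deg component in six evenly spaced orientations.
components = [
    (8.0 * math.cos(math.radians(a)), 8.0 * math.sin(math.radians(a)))
    for a in range(0, 180, 30)
]

for u, v in components:
    freq, theta = to_polar(u, v)
    freq_conj, theta_conj = to_polar(-u, -v)         # the reflected pair member
    print(f"({u:6.2f}, {v:6.2f}): {freq:.1f} c/deg at {theta:5.1f} deg; "
          f"conjugate: {freq_conj:.1f} c/deg at {theta_conj:5.1f} deg")
```

Folding the angle modulo 180 deg is one possible convention; a signed angle over the full circle is equally valid when the two members of each pair are to be listed separately.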
In adherence with the requirement that a theory of image representation must preserve dimension, many important papers in the vision research literature have been explicitly 2D in the spatial frequency characterization at least of stimuli if not of visual mechanisms (e.g. Kelly, 1950; Kelly et al., 1975a, b, 1976; Mostafavi and Sakrison, 1976; Carlson et al., 1977; Weisstein et al., 1977; Ginsburg, 1978; Switkes et al., 1978; DeValois and DeValois, 1979, 1982; Wright, 1982; Caelli et al., 1983). But the major psychophysical paradigms on which the evidence for spatial frequency channels is primarily based, such as adaptation, masking, and threshold summation paradigms, have never been generalized to yield a truly 2D characterization of the putative channels. Even the major review articles surveying and evaluating spatial frequency ideas (e.g. Sekuler, 1974) have neglected the second dimension. Indeed, the great bulk of vision research in the spatial frequency genre has been explicitly one-dimensional, both in experimental protocol and in theoretical constructs.

CRITIQUE OF THE ONE-DIMENSIONAL SPATIAL FREQUENCY PARADIGM

In part this reduction in dimension serves an analytic convenience and perhaps reflects the greater familiarity of one-dimensional Fourier theory, or is a vestige of the analogous auditory theory which to some extent inspired the visual theory; and it is also due partly to the greater ease of generating one-dimensional stimulus displays on CRTs. But the simplification is not without conceptual casualty. For example, ever since the initial formulation of the channels concept, the phraseology has often tended to imply the existence of (one-dimensional) spatial frequency channels on the one hand and orientation channels on the other. Misleading inferences result from the failure to integrate these concepts fully. For example, in their seminal paper Blakemore and Campbell (1969) wrote that, in addition to the spatial frequency channels, “it may be significant that the visual system also transmits the input signal through a number of separate orientationally selective channels...” (p. 258). Supporting this statement with a well-known illustration at the end of the paper, they demonstrate the difficulty of reading upside-down text and conclude that the spatial frequency analysis must be carried out (and learned) in specific orientations, failing to realize that inverting any image has no effect at all on the distribution of spatial frequencies present in different orientations. (It only shifts the phase of the sine components by π radians, and does not affect the cosine components.) Even the space-domain empirical characterizations of detection mechanisms have generally neglected the second dimension. Thus, psychophysical measurements of the “neural line-spread function”, and models derived therefrom (e.g. Kulikowski and King-Smith, 1973; Legge, 1978; Wilson, 1979), have treated receptive fields as though they have meaningful structure in only one dimension and just integrate in an inconsequential fashion in the perpendicular direction.
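The remark above about upside-down text can be made explicit with a short derivation. It is a sketch under two assumptions: that the image f(x, y) is real-valued, and that inverting it means rotating it by 180° about the origin. The symbols f, g, F and G are introduced here for illustration only.

```latex
\begin{aligned}
F(u,v) &= \iint f(x,y)\, e^{-2\pi i (ux + vy)}\, dx\, dy, \\
g(x,y) &= f(-x,-y) \;\Longrightarrow\; G(u,v) = F(-u,-v) = F^{*}(u,v), \\
|G(u,v)| &= |F(u,v)|, \qquad \operatorname{Re} G = \operatorname{Re} F, \qquad \operatorname{Im} G = -\operatorname{Im} F .
\end{aligned}
```

The amplitude at every spatial frequency and orientation is therefore unchanged. Only the sine (imaginary) coefficients change sign, which is a phase shift of π, while the cosine (real) coefficients are untouched.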
Like the 1D spatial frequency characterizations, the use of line-spread (rather than point-spread) constructs treats the visual system as though it had cylindrical lenses and strip retinae. Similarly, the rapidly growing interest in applications of Gabor theory to spatial vision has stressed the necessary trade-off between a filter’s spatial frequency bandwidth and spatial resolution in one dimension, while generally ignoring the consequences of the equally fundamental trade-off between the filter’s spatial resolution in the perpendicular direction and its orientation bandwidth.

A revealing example of how important it is for the spatial frequency paradigm in vision research to be two-dimensional is the case of the simple “elongation” model of orientation-selective receptive fields. Retinal (bipolar or ganglion cell) receptive fields are usually described as center-surround mechanisms which are isotropic, having radial symmetry; one popular model, for example, is a difference-of-Gaussians function, rotated about its axis of symmetry. Orientation-selective cortical receptive fields, on the other hand, are sometimes described as elongated (“stretched”) versions of such fields, thus having a strong orientation preference for a luminance bar oriented in the direction in which the receptive field is stretched. The elongation is presumed to impart the same orientation preference for sinewave gratings as it does for single bars. The fatal flaw (at least in the simplest version of this idea) is that despite its success with single bars, such a mechanism in fact cannot have any invariable preferred orientation for sinewave gratings; rather, there would be different preferred orientations for different spatial frequencies. (The converse is also true.)

This property of the model follows from the 2D similarity theorem of frequency analysis. The original center-surround structure which lacks orientation tuning would have a 2D frequency spectrum whose maximum must be a continuous circle, centered on the origin of the Fourier plane. (The radius of the circle corresponds to the preferred spatial frequency, and this must be the same for all orientations because of the radial symmetry.) The 2D similarity theorem implies that stretching a function (the receptive field) in one direction by some factor simply compresses its 2D frequency spectrum in the same direction by the same factor. Thus, the simple elongation of the originally isotropic receptive field would just change its loci of maximum sensitivity in the Fourier plane from a circle to an ellipse, still centered on the origin. Therefore the model predicts no uniquely preferred orientation for sinewave gratings, in spite of its unambiguous elongation. For each spatial frequency there would be a different preferred orientation, and for different orientations there would be different preferred spatial frequencies. To give a concrete example: if the receptive field’s elongation ratio is 2:1 in favor of the vertical orientation, and the best vertical grating has a spatial frequency of 4 c/deg, then its response must be exactly the same (i.e. maximal) for a horizontal grating of 2 c/deg. This thought experiment, which demonstrates that orientation-selective cortical receptive fields (or visual filters) cannot simply be stretched versions of non-orientation-tuned ones, is somewhat surprising at first, and it illustrates how important it is for 2D concepts of visual filters or receptive fields to be given an explicitly 2D frequency analysis.

Fig. 1. Illustration of a 2D signal defined in both 2D domains. In a 1D (scalar) sense the right-hand pattern contains only a single spatial frequency, namely 8 c/deg if the circular window is 2 deg in diameter. The 2D Fourier spectrum shows that this spatial frequency is present in each of six orientations.

TWO-DIMENSIONAL SPECTRAL POLAR SEPARABILITY
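The elongation thought experiment that closes the preceding section can be checked numerically. The sketch below assumes a difference-of-Gaussians receptive field with unit-volume Gaussians and a surround twice as wide as the center, stretched by a factor of 2 along y (taken here as the vertical direction), so that a grating with vertical bars has its wave-vector on the u axis. The function names, the parameter values and these conventions are illustrative assumptions, not taken from the paper.

```python
import math

def dog_spectrum(u, v, sigma_c, sigma_s, stretch):
    """2D spectrum of a difference-of-Gaussians field stretched along y by `stretch`."""
    rho2 = u * u + (stretch * v) ** 2      # similarity theorem: the v axis is compressed
    k = 2.0 * math.pi ** 2
    center = math.exp(-k * sigma_c ** 2 * rho2)
    surround = math.exp(-k * sigma_s ** 2 * rho2)
    return stretch * (center - surround)

def peak_along(axis, sigma_c, sigma_s, stretch, f_max=10.0, step=0.001):
    """Grid-search the frequency (c/deg) of maximum response along the u or v axis."""
    best_f, best_amp = 0.0, float("-inf")
    for i in range(int(f_max / step) + 1):
        f = i * step
        if axis == "u":
            amp = dog_spectrum(f, 0.0, sigma_c, sigma_s, stretch)
        else:
            amp = dog_spectrum(0.0, f, sigma_c, sigma_s, stretch)
        if amp > best_amp:
            best_f, best_amp = f, amp
    return best_f, best_amp

PEAK = 4.0  # preferred frequency of the unstretched field, in c/deg
# With sigma_s = 2 * sigma_c the isotropic spectrum peaks where
# rho^2 = ln(4) / (3 * 2 * pi^2 * sigma_c^2); solve that for sigma_c.
sigma_c = math.sqrt(math.log(4.0) / (3.0 * 2.0 * math.pi ** 2 * PEAK ** 2))
sigma_s = 2.0 * sigma_c

f_u, amp_u = peak_along("u", sigma_c, sigma_s, stretch=2.0)
f_v, amp_v = peak_along("v", sigma_c, sigma_s, stretch=2.0)
print(f"best frequency on the u axis: {f_u:.2f} c/deg, response {amp_u:.4f}")
print(f"best frequency on the v axis: {f_v:.2f} c/deg, response {amp_v:.4f}")
```

Under these assumptions the two axes give the same maximal response at frequencies in a 2:1 ratio, so the locus of maximum sensitivity is an ellipse centered on the origin and no single orientation is preferred.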

A critical issue raised by the foregoing example is the question of polar separability of the frequency spectrum of a 2D filter: that is, whether or not the filter’s tuning surface over the Fourier plane can be described as the product of a tuning curve for spatial frequency times a tuning curve for orientation. If it can, then all radial cross-sections through the spectral surface are amplitude-scaled versions of each other, and so also are all concentric cylindrical cross-sections. This is usually implicitly assumed to be the case, although the question has never been explicitly addressed in either the neurophysiological or psychophysical literature, apart from a cursory study by Glezer et al. (1982); the issue was first raised in theoretical form in Daugman (1980). The previous filter example illustrates a severe kind of nonseparability, in which the preferred orientation changes radically with spatial frequency and vice versa, even though the tuning for either variable may be very sharp once the other variable is specified. A less radical form of polar nonseparability is one in which only the filter’s spatial frequency bandwidth depends on the orientation, and vice versa. In general, the “bandwidth” of an anisotropic 2D filter will depend on the cross-section chosen, as illustrated in Fig. 2. The stippled area represents the area in the Fourier plane over which a putative channel or neuron might respond (above some criterion level), and the three intersecting lines indicate possible measures of the mechanism’s “bandwidth”, each giving a different result. Clearly, the same ambiguity would arise for cross-sections chosen along a polar grid, unless the particular filter happened to have a polar separable spectrum. (It is important to note that, whereas Cartesian separability in one domain implies the same in the other domain, no such mutual relationship exists for polar separability.) Given the arbitrariness of choosing any particular 1D cross-section in Fig. 2, it is far more appropriate to describe visual filters in terms of their “occupied area” in the 2D Fourier plane, rather than in terms of a 1D bandwidth.

Chief among the empirical results to be reported here is the strong polar nonseparability of masking tuning surfaces obtained psychophysically. In a sense this is not surprising, because the vast majority of plausible 2D filters constructed from “intuitive” organizational principles in the space domain yield, in fact, polar nonseparable spectra. [Polar nonseparability is demonstrated for several such space-domain organizational principles, regardless of functional form, in Daugman (1983); the same paper also finds one 2D space-domain organizational principle, based on differential operators, that always yields filters with functional forms having polar separable 2D spectra.]

The most extensive empirical paper to date concerned with the 2D properties of spatial visual channels was that of Mostafavi and Sakrison (1976), who used signal detection methods with bandlimited noise stimuli to infer 2D channel bandwidths and summation exponents. Their measurements cannot directly address the issue of 2D polar separability, but in fitting functional forms they arbitrarily assumed spectral polar separability. [They assumed that the 2D channel spectral response profile could be described as a Butterworth frequency filter times a Gaussian function of angle; note their equations (13'), (12), and (14).] The present work suggests strongly that their assumption of spectral polar separability is wrong.

Fig. 2. Illustration of the ambiguity of the “bandwidth” concept for characterizing 2D filter mechanisms. In general, the bandwidth of an anisotropic filter will depend on the spectral cross-section chosen; radial and angular cross-sections would suffice only for polar separable 2D filters. A more appropriate descriptor would be “occupied band-area” enclosed by a half-amplitude contour.

EXPERIMENTAL METHODS

All empirical results reported here arise from measurements of contrast threshold in two-alternative forced-choice (2AFC) signal detection tasks. After measuring absolute contrast sensitivity surfaces over the 2D Fourier plane, differential threshold elevation surfaces were obtained in a masking paradigm employing a “test” sinewave grating superimposed upon a high contrast “mask” sinewave grating of different orientation and spatial frequency. Five subjects including the author were used, all in their 20’s and all with normal vision corrected for myopia except for D.G., who was found in preliminary experiments to be an anastigmatic meridional amblyope and was not used in subsequent experiments, although his anomalous contrast sensitivity surface will be presented.

A digital CRT image synthesizer was designed which permitted the display of combinations of oriented spatio-temporal sinewave luminance modulations, added or partitioned into separate image regions defined by an electronic image window with circular or rectilinear 2D apertures. The image synthesizer incorporated a raster scan of 60 kHz and a frame rate of 120 frames/sec. Sinusoidal waveforms were digitally synthesized from trigonometric look-up tables stored in memory and addressed with a 2 MHz clock. The orientation of any given sinusoidal component was specified with 10-bit resolution, allowing wave-vector modulation in 1024 discrete directions. In these experiments, at most two Fourier components with different orientations were displayed simultaneously, without temporal periodicity (drift or counterphase), and they were confined electronically to a circular image window subtending 2° of visual angle and surrounded by a uniform field of matched mean luminance. The display CRT was a Tektronix 608 with P31 phosphor and Z-axis amplifier γ-corrected for negligible second harmonic distortion up to contrast levels of 50%. The mean luminance was 22 cd/m² within the 2.5° × 3.1° phosphor area and also in a side-illuminated 10° × 10° surround. Subjects viewed the display binocularly at a distance of 2.44 m, fixating voluntarily on a small spot for confining the circular field to central vision.

Contrast sensitivity measurements with and without the mask grating were made by standard signal detection methods, incorporating randomly interleaved 2AFC staircases. Each contrast staircase obeyed the 70.7% Wetherill staircase rule (Wetherill and Levitt, 1964), in which each incorrect response results in an increment in test contrast while two consecutive correct responses result in a decrement; such a rule converges to a point on the contrast psychometric function at which $1/\sqrt{2}$ = 70.7% of the signals are correctly detected. Any given block of 100 trials tested only a single spatial frequency, which the subject could briefly inspect at moderate contrast before the start of each block of test trials. The starting contrast on any given staircase was taken from the current best estimate of threshold for that condition, and the contrast step size during each block began at 1 dB and shrank after initial staircase reversals to 0.3 dB. A computer controlled the digital image synthesizer; monitored the subject’s 2AFC responses via a two-button response box; provided two-tone feedback for correct and incorrect responses and produced a tone demarcating trial intervals; executed the 2AFC logic for each of the independent interleaved staircases; numerically computed 2D Fourier transforms; and plotted the data surfaces in 3D perspective with hidden-surface suppression. A run consisted of 100 2AFC trials on each of 20 different staircases and lasted about one hour.
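The staircase rule described above can be sketched as a short simulation. Everything about the simulated observer is an assumption made for illustration: a Weibull psychometric function with a 50% guessing floor, a slope of 3.5, contrast expressed in dB as 20·log10 of the ratio to a nominal threshold, and a step size that shrinks after the second reversal (the text says only “after initial staircase reversals”). The names `p_correct` and `run_staircase` are hypothetical.

```python
import math
import random

def p_correct(contrast_db, threshold_db=0.0, slope=3.5):
    """Hypothetical 2AFC observer: Weibull psychometric function with a 50% floor."""
    ratio = 10.0 ** ((contrast_db - threshold_db) / 20.0)
    return 1.0 - 0.5 * math.exp(-(ratio ** slope))

def run_staircase(n_trials=100, start_db=0.0, seed=0):
    """One error raises the contrast; two consecutive correct responses lower it."""
    rng = random.Random(seed)
    level, step = start_db, 1.0          # the step starts at 1 dB
    streak = 0                           # consecutive correct responses so far
    last_direction = 0                   # +1 after a step up, -1 after a step down
    reversals, outcomes = [], []
    for _ in range(n_trials):
        correct = rng.random() < p_correct(level)
        outcomes.append(correct)
        direction = 0
        if not correct:
            direction, streak = +1, 0
        else:
            streak += 1
            if streak == 2:
                direction, streak = -1, 0
        if direction:
            if last_direction and direction != last_direction:
                reversals.append(level)  # contrast at which the direction changed
                if len(reversals) >= 2:  # assumed point at which the step shrinks
                    step = 0.3
            level += direction * step
            last_direction = direction
    return reversals, outcomes

# A long run, used here only to show where the rule settles.
reversals, outcomes = run_staircase(n_trials=20000, seed=1)
print("proportion correct:", round(sum(outcomes) / len(outcomes), 3))
print("target proportion :", round(1.0 / math.sqrt(2.0), 3))
```

The target follows from the rule itself: the contrast steps down with probability p² and up otherwise, so the staircase is in balance where p² = 1/2, i.e. p = 1/√2. An actual block in the experiments comprised only 100 trials per staircase; the long run serves only to make the convergence visible.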
A given trial consisted of two 500 msec test intervals, separated by 200 msec, initiated by the subject’s previous response and announced by a tone during each test interval. During only one of each pair of intervals in a trial, selected by a random number generator, the test grating was presented with its contrast rising from zero according to an 80 msec Gaussian onset envelope up to a 300 msec plateau of contrast determined by the appropriate Wetherill staircase, followed by an 80 msec Gaussian offset contrast envelope. Thus the bulk of energy in the temporal frequency spectrum of this trial presentation contrast waveform lay between 0 and 2 Hz. The masking grating was present continuously with a contrast of 32% when threshold elevation measurements were being made; the mask was not gated on and off because of the confounding forward and backward masking effects that result from such transients. Although some asymptotic adaptation level would be built up by the continuously present mask, this was not of concern because the adaptation phenomenon typically involves only one-tenth as much threshold elevation as does masking, and in any case these two effects appear to have the same spatial frequency and orientation tuning properties. After the presentation of both test intervals the subject responded by pressing a button to indicate his guess about which interval contained the test grating, and the Wetherill decision rule then determined the contrast for the next trial on that staircase. After all independent staircases in the sample set over the Fourier plane were complete in a given run, the threshold for each sampled point was estimated by the geometric mean of all reversal points after the first 10 trials on the associated contrast staircase. The standard error for this distribution of reversal points in a given run was typically 15% of the mean and was computed for each independent staircase.

Several different sampling lattices were used for spanning the 2D Fourier plane with threshold measurements, including pure polar (1D) cross-sections and a full 2D Cartesian sampling lattice. Because the 2D Fourier transform kernel is Cartesian separable (while not polar separable), it was desirable to base the numerical transforms of the 2D data on a Cartesian sampling lattice. Table 1 shows the 11 × 11 grid which defined the sampling lattice over the Fourier plane, with the locus of each cell expressed in the associated polar variables of spatial frequency and orientation. In later experiments, this measurement grid was extended by two additional rows at the top and at the bottom when samples at higher spatial frequencies were required to fit the skirts of the threshold elevation tuning surfaces. It should be noted that because of symmetry in the Fourier plane, it is only necessary to make measurements in any two adjacent quadrants; each point in the plane is associated with a redundant pair member reflected through the origin, jointly representing phase by their complex coefficients.
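The polar labelling of the Cartesian lattice can be regenerated with a few lines of code. The sketch assumes a lattice spacing of 8/3 c/deg, which is inferred from the entries of Table 1 (2.6, 5.3, 8, 10.7, 13.3 c/deg along the axes) rather than stated in the text, and it assumes that orientation is reported as a signed angle with u = r cos θ and v = r sin θ. The names `STEP`, `HALF` and `lattice` are hypothetical.

```python
import math

STEP = 8.0 / 3.0   # assumed lattice spacing in c/deg, inferred from Table 1
HALF = 5           # lattice points on each side of the origin (11 x 11 grid)

def lattice():
    """Yield (u, v, spatial frequency, orientation in deg) for every cell but the origin."""
    for j in range(-HALF, HALF + 1):              # v index, most negative row first
        for i in range(HALF, -HALF - 1, -1):      # u index, from +u down to -u
            if i == 0 and j == 0:
                continue                          # zero frequency has no orientation
            u, v = i * STEP, j * STEP
            yield u, v, math.hypot(u, v), math.degrees(math.atan2(v, u))

cells = list(lattice())
print(len(cells), "cells besides the origin")
for u, v, freq, theta in cells[:3]:
    print(f"u = {u:6.2f}, v = {v:6.2f}  ->  {freq:4.1f} c/deg at {theta:6.1f} deg")
```

Values computed this way agree with the tabulated ones to within about 0.1 c/deg, the residual differences being attributable to rounding in the table.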
In 2D experiments such as these, relative phase is irrelevant except in the special case in which both components have the same orientation and a harmonic frequency ratio. (When one changes the relative phase of any two wave-vectors combined in different orientations, the only effect is a rigid translation of the original 2D luminance waveform.) Therefore the irrelevance of phase permits the full 2D lattice in Table 1 to be filled up with only half as many measurements. In a full Cartesian grid sampling, four independent staircases of 100 trials each were devoted to each unique cell in the grid: the first two to estimate contrast threshold for the associated test wave-vector alone, and two more staircases to measure threshold elevation for this wave-vector when a vertical 8 c/deg mask grating of 32% contrast was simultaneously present. In summary, the total number of 2AFC trials used to obtain threshold elevation surfaces over the Fourier plane, for each of the four subjects, was: (100 trials per staircase) × (2 independent staircases per condition) × (65 unique points in the 2D Fourier plane) × (2 full passes, with and without the mask) = 26,000 trials. Thus, for each subject, excluding training and practice sessions, some 15 hr of threshold measurements were required to generate the full 2D threshold elevation surfaces presented here. The full regimen was somewhat shortened for two subjects by eliminating the sidelobe regions in the Fourier plane (remote frequencies in the horizontal orientation) at which there was negligible masking effect. For any given subject, no more than 2 hr of data collection occurred on a single day, across sessions.

Table 1. Cartesian sampling lattice over the Fourier plane. Each cell gives the spatial frequency (c/deg) and orientation (deg) of one lattice point; the central cell is the origin (zero spatial frequency).

18.9    -45°  17.1  -51.3°  15.6    -59°  14.4  -68.2°  13.6  -78.7°  13.3    -90°  13.6 -101.3°  14.4 -111.8°  15.6   -121°  17.1 -128.7°  18.9   -135°
17.1  -38.7°  15.1    -45°  13.4  -53.1°  11.9  -63.4°    11    -76°  10.7    -90°    11   -104°  11.9 -116.6°  13.4 -126.9°  15.1   -135°  17.1 -141.3°
15.6    -31°  13.4  -36.9°  11.3    -45°   9.6  -56.3°   8.4  -71.6°     8    -90°   8.4 -108.4°   9.6 -123.7°  11.3   -135°  13.4 -143.1°  15.6   -149°
14.4  -21.8°  11.9  -26.6°   9.6  -33.7°   7.6    -45°     6  -63.4°   5.3    -90°     6 -116.6°   7.6   -135°   9.6 -146.3°  11.9 -153.4°  14.4 -158.2°
13.6  -11.3°    11    -14°   8.4  -18.5°     6  -26.6°   3.8    -45°   2.6    -90°   3.8   -135°     6 -153.4°   8.4 -161.5°    11   -166°  13.6 -168.7°
13.3      0°  10.7      0°     8      0°   5.3      0°   2.6      0°    (origin)     2.6    180°   5.3    180°     8    180°  10.7    180°  13.3    180°
13.6   11.3°    11     14°   8.4   18.5°     6   26.6°   3.8     45°   2.6     90°   3.8    135°     6  153.4°   8.4  161.5°    11    166°  13.6  168.7°
14.4   21.8°  11.9   26.6°   9.6   33.7°   7.6     45°     6   63.4°   5.3     90°     6  116.6°   7.6    135°   9.6  146.3°  11.9  153.4°  14.4  158.2°
15.6     31°  13.4   36.9°  11.3     45°   9.6   56.3°   8.4   71.6°     8     90°   8.4  108.4°   9.6  123.7°  11.3    135°  13.4  143.1°  15.6    149°
17.1   38.7°  15.1     45°  13.4   53.1°  11.9   63.4°    11     76°  10.7     90°    11    104°  11.9  116.6°  13.4  126.9°  15.1    135°  17.1  141.3°
18.9     45°  17.1   51.3°  15.6     59°  14.4   68.2°  13.6   78.7°  13.3     90°  13.6  101.3°  14.4  111.8°  15.6    121°  17.1  128.7°  18.9    135°

RESULTS

(i) Normal and abnormal 2D absolute contrast sensitivity surfaces

In order to obtain 2D masked threshold elevation surfaces over the Fourier plane, it is first necessary to measure the absolute contrast sensitivity surface. The upper panels of Fig. 3 show such data surfaces (reciprocal contrast threshold) for two subjects, one normal (R.F.) and the other a meridional amblyope (D.G.). In these plots the origin of coordinates is in the center, representing zero spatial frequency, and each point (r, θ) in polar coordinates corresponds to spatial frequency r and orientation θ. The axes are labelled by the associated Cartesian projection coordinates (u, v), where u = r cos(θ) and v = r sin(θ). Thus the highest vertical or horizontal spatial frequency tested was about 14 c/deg, whereas the spatial frequencies in the oblique meridia (where u = v = 14) approach 20 c/deg. The measurement coordinates are linear in all three axes, maintaining a uniform sampling density for purposes of numerical 2D Fourier transformation. The attenuation of contrast sensitivity at higher spatial frequencies appears more pronounced in these linear coordinates than when plotted in the log-log coordinates usually used for conventional 1D contrast sensitivity functions. Such coordinates would be inappropriate for 2D contrast sensitivity measurements, primarily because no origin of coordinates exists in the log-polar Fourier plane; hence the shape of any data surface (e.g. its width/length ratio) would depend on an arbitrary decision about the spatial frequency at the center of the plot.

In the upper panels of Fig. 3, any radial cross-section going through the center may be interpreted as a 1D modulation transfer function (MTF), expressed in linear coordinates, showing contrast sensitivity as a function of spatial frequency, measured at the orientation represented by the chosen radius. The 2D MTF for subject R.F. in Fig. 3 shows a normal bandpass characteristic, with higher sensitivity for spatial frequencies in the 4-6 c/deg range than for higher or lower frequencies. It also captures the normal “oblique effect” of higher sensitivity in vertical and horizontal orientations than oblique ones, for spatial frequencies higher than about 4 c/deg. (The vertical orientation corresponds to the coordinate axis which runs from upper left to lower right.)
For comparison, the 2D MTF for subject D.G. shows a very abnormal characteristic, in that his sensitivity for vertical orientations is markedly lower than for horizontal orientations even though ophthalmic refractions revealed no optical astigmatism. Freeman and colleagues (e.g. Freeman and Thibos, 1975) coined the term “meridional amblyopia” to refer to such a contrast sensitivity differential between the vertical and horizontal orientations that cannot be attributed to any on-going optical astigmatism. The case of D.G. is unusual because heretofore meridional amblyopia has always been associated with continuing (albeit optically corrected) astigmats. However, the high incidence of transient infant astigmatism in the first two years of life (Atkinson et al., 1980) and the well-documented maturational plasticity which allows orientation-biased retinal images during development to result in permanent neural anisotropy (e.g. Blakemore and Cooper, 1970), would suggest that transient infant astigmatism may explain D.G.’s meridional amblyopia despite his current optical normality. D.G. did not serve as a subject in any further experiments once the data in Fig. 3 were collected, but his absolute sensitivity surface over the Fourier plane is illuminating and helps to illustrate normative properties of the 2D MTF.

The peak sensitivity of subject R.F. as shown in the upper-left panel of Fig. 3 corresponds to 0.3% contrast, while that for D.G. corresponds to 0.5%. It is interesting to note that while D.G. is less sensitive in the vertical orientations, he is unusually sensitive (at higher frequencies) in the horizontal orientations. Although tempered by inter-subject variability, this observation is clearly more consistent with a “recruitment” theory than an “atrophy” theory of meridional amblyopia. Indeed, the “total sensitivity” as defined by the volume under the contrast sensitivity surfaces of Fig. 3 is roughly equal for both subjects, despite their major structural differences, as would be predicted by a theory of recruitment (redistribution) of neural resources.

The lower two panels of Fig. 3 show the 2D numerical Fourier transforms of the corresponding upper panels, and thus they might be interpreted as neural point-spread functions of the foveal visual systems of these two subjects, under the assumptions of approximate homogeneity in the 2 deg foveal region tested and response linearity near threshold. It has been shown that the transfer function of the eye’s optics normally has little effect on the kind of measurements on which Fig. 3 is based, as discussed in the 1D case by Freeman and Thibos (1975, p. 706), and so such a 2D space-domain kernel primarily characterizes the composite neural manifold. For normal subject R.F., the neural “oblique effect” is evident also in her space-domain point-spread function in Fig. 3 in the form of anisotropic secondary extrema in the principal meridia.
The halfwidth of her point-spread function in the vertical or horizontal cross-sections is 0.055 deg, or 3.3 min of arc. Similarly, subject D.G.’s meridional amblyopia is evident in the asymmetry of his point-spread function, whose halfwidth in the horizontal direction is 2.2 min of arc but 7 min of arc in the vertical direction. This 3.2 aspect ratio (width/length) of D.G.’s point-spread function is roughly the reciprocal of the aspect ratio of his 2D MTF, as expected. The fact that the peak amplitudes of the two subjects’ point-spread functions are roughly equal reflects the fact that the “total sensitivity” volumes under their 2D MTFs are roughly equal, as noted previously.

(ii) Two-dimensional channel tuning surfaces in the Fourier plane

The use of the term “channel” in describing masking tuning curves dates to a statement by Campbell and Kulikowski (1966) referring to, “The narrow orientationally tuned channels found psychophysically by this masking technique...” (p. 437). But both of the terms “masking” and “channel”

[Figure 3: upper panels, two-dimensional modulation transfer functions for Subject R.F. and Subject D.G.; lower panels, two-dimensional point spread functions for Subject R.F. and Subject D.G.]

Fig. 3. Two-dimensional modulation transfer functions (2D-MTF) and associated point-spread functions (PSF) for a normal subject (R.F.) and a meridional amblyope (D.G.). Viewed in polar coordinates with the origin at the center, the height of each 2D-MTF surface at any point (r, θ) corresponds to visual sensitivity (reciprocal contrast threshold) for a sine-wave grating of spatial frequency r and orientation θ. Radial and angular cross-sections through these surfaces capture the “bandpass” characteristic of overall spatial frequency sensitivity and the anisotropy in orientation (the “oblique effect”). The meridional amblyopia of D.G. produces a 2D-MTF markedly compressed in one direction. These biases, both normal and abnormal, appear also in the PSFs inferred in the lower two panels by numerical 2D Fourier transformation of the upper two panels.

today remain cloaked in ambiguity. Legge and Foley (1980) observed that as of 1980 there existed no model of contrast masking. The lack of consensus about what a channel is was emphasized in a recent review by Regan (1982), surveying several different meanings ranging from simple “selective sensitivity”

to the concept of orthogonal filters with or without threshold detection stages, to populations of neural mechanisms or even single neurons, and so on. My operational use of the term here, in referring to the spread of contrast threshold elevation in the Fourier plane due to masking, is more or less consistent with


most of these interpretations, if suitably enriched. The basic measurement in masking or adaptation experiments is the amount by which a high contrast interfering pattern elevates the detection threshold for a low contrast test pattern, coincident or contiguous in space and time, as a function of some parameter in which the two patterns differ (such as orientation or spatial frequency). Two different methods have generally been adopted. In one (e.g. Blakemore and Campbell, 1969), the masking or adapting pattern always takes on the same value of the parameter of interest while the test pattern varies in this parameter as threshold elevation factors are measured relative to the unmasked threshold for each case. In the other method (e.g. Legge and Foley, 1980), the test pattern is always the same while the mask varies along the continuum of interest. In skeletal form, both procedures assume that the resulting “notch” in the contrast sensitivity function reveals something of the selectivity of the presumed underlying filters. The first of these methods (constant mask) was employed here. If one assumes provisionally that the different presumed mechanisms which detect the different test patterns have their thresholds elevated in proportion to the extent to which they respond to the mask pattern, then the observed notch can be interpreted loosely as a characteristic mechanism’s tuning curve. Significant secondary factors such as probability summation among mechanisms and the dependence of masking tuning parameters on mask contrast complicate this interpretation. Avoiding these secondary considerations, a first-order theory might suppose that the underlying 2D filter tuning surfaces are locally homogeneous; that is, neighboring filters in the Fourier plane have roughly the same 2D spectral shape but different 2D center frequencies.
The mask pattern is a 2D delta function in this domain, located at a particular spatial frequency and orientation (namely 8c/deg, vertical, in these experiments). Many different mechanisms respond to the mask, in proportion to the amplitude of their spectral tuning surface at this particular location in the Fourier plane. Assuming proportionate threshold elevation, the resulting bump in the 2D threshold surface spanned by the tests is equivalent to the 2D convolution of the mask spectrum with the characteristic local mechanism’s tuning spectrum. Since the mask spectrum is a 2D delta function, this 2D convolution across mechanisms would recover simply the characteristic local mechanism’s 2D tuning surface centered on the 2D frequency of the mask. Although this first-order theory is surely inadequate in many respects, at least it provides a general conceptual framework for extracting basic channel tuning characteristics from 2D masking data. Figure 4 presents 2D threshold elevation surfaces for each of the four subjects, sampled along the Cartesian grid spanning the Fourier plane (Table 1) and interpolated by a 2D second-order Taylor series


expansion between the points to form a continuous surface, plotted in 3D perspective. The mask was always a vertical 8 c/deg grating with 32% contrast while the test pattern ranged over the 2D grid in a randomized sequence, ultimately allocating 200 staircase trials to each point in the grid both with and without the mask present. Each of the data surfaces in Fig. 4 might best be interpreted in polar coordinates, with the origin (zero spatial frequency) at the center of each plot. Thus the two peaks in each data surface occur at the conjugate pair of points in the Fourier plane which represent a vertical 8 c/deg grating; if this were a phase sensitive experiment, then two surfaces (real and imaginary parts) would be required for each subject, and the two peaks in each case might have different amplitudes and polarities. Although there was noticeable variability between subjects in their peak threshold elevations, all were in the range of 1700-2400% at the peak and the plotted surfaces have all been scaled to have the same peak amplitude in order to facilitate study of their shapes. The remote sideband regions of the frequency plane for subjects W.C. and J.D. were not studied, because the threshold elevation there was virtually nil, and thus these regions are arbitrarily smooth in comparison with the residual variability in those regions for subjects H.W. and R.F. who completed the full 2D regimen. Some of the important features of the data surfaces of Fig. 4 are better understood by plotting their half-amplitude cross-sections over the Fourier plane; these are shown in Fig. 5 for each of the four subjects. The property of masking “localization” in both spatial frequency and orientation is quite clear from Figs 4 and 5. As noted in the Introduction, this property is not possessed by the 2D frequency spectra of all “physiologically plausible” models of receptive field profiles tuned for spatial frequency and orientation. For example, Fig. 5(b) of Daugman (1980) shows a simple cell receptive field organization based on an elongated excitatory strip surrounded by inhibitory flanks, whose 2D frequency spectrum wraps all around the origin of the Fourier plane; if the half-amplitude spectral contours of such a filter were plotted as in Fig. 5 here, they would be ellipses centered on the origin. Thus the contours found in Fig. 5 permit the rejection of certain classes of elongated space-domain visual filter models. Recently, actual simple cell characterizations in the 2D Fourier plane have been plotted by DeValois et al. (1982), and they show the same general shapes as the localized ellipses found here by psychophysical masking. Of the four classes of 2D filter models explored by Daugman (1980), the one whose 2D frequency spectrum most closely resembles in functional form this psychophysical masking data is the 2D “Gabor” filter shown in Fig. 6 of that work. This issue of 2D functional form, in the 2D frequency and 2D space domains, will be explored further in the concluding section of this paper.
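The first-order argument used in this section, that convolving a 2D delta-function mask spectrum with a locally homogeneous tuning surface simply reproduces that surface centered on the mask frequency, can be sketched numerically. The following Python fragment is an illustration under stated assumptions: the 11 x 11 grid size echoes the sampling grid, but the elliptical Gaussian tuning surface, its widths, the mask position in grid units, and the helper name `convolve2d_same` are all hypothetical, and only one of the two conjugate mask peaks is modelled.

```python
import math

N = 11          # grid size (samples per axis)
C = N // 2      # index of the zero-frequency sample

def convolve2d_same(a, b):
    """Direct 2D convolution of two N x N grids whose origin is at (C, C),
    returning the central N x N part (zero padding outside the grids)."""
    out = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            total = 0.0
            for k in range(N):
                for l in range(N):
                    ii, jj = i - k + C, j - l + C
                    if 0 <= ii < N and 0 <= jj < N:
                        total += a[k][l] * b[ii][jj]
            out[i][j] = total
    return out

# Hypothetical characteristic tuning surface, centered on the grid origin.
surface = [[math.exp(-((j - C) ** 2) / 2.0 - ((i - C) ** 2) / 0.5)
            for j in range(N)] for i in range(N)]

# Mask spectrum: a single 2D delta function, 3 samples from the origin (assumed).
mask = [[0.0] * N for _ in range(N)]
mask[C][C + 3] = 1.0

elevation = convolve2d_same(mask, surface)
```

Under these assumptions `elevation` is the tuning surface translated so that its peak sits at the mask location, which is the sense in which the masking data are read here as a characteristic channel profile.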

[Figure 4: Masking tuning surface over the Fourier plane; mask: 8 c/deg, vertical; one panel per subject.]

Fig. 4. Threshold elevation surfaces produced by an 8 c/deg vertical mask of 32% contrast, interpolated over a Cartesian grid of 121 sample points in the 2D Fourier plane for each of four subjects. The locations of the peaks correspond to (8 c/deg, vertical) and have an amplitude of about 2000%. The four surfaces have been scaled so that their peaks precisely match in height in order to facilitate comparison of their structure.

The fact that signal detection is a statistical process involving intrinsic stochastic variables which influence threshold estimates has been recognized for some time, and great effort has been invested in studying the effect of “probability summation” among mechanisms. Legge and Foley (1980) point out that although probability summation models have been well-developed for simple detection tasks, they are not presently applicable to contrast masking tasks. Their experiments in fact show that the effects of pooling are largely absent when mask contrasts exceed about loA, and that forced choice detection

decisions in such cases are based on the output of a single detector (see their Fig. 6 and p. 1467), as assumed in the general conceptual framework adopted here. A more dramatic factor which affects the relation between masking tuning surfaces and the inferred underlying filter response characteristic is any nonlinear growth of threshold elevation with increasing mask contrast. Although in these experiments the mask contrast was always 32%, the reduced efficacy of off-frequency masking is sometimes interpreted in terms of lower equivalent mask contrast (the “equivalent contrast transformation”). The extensive study of 1D masking tuning functions by Legge and Foley (1980) at different mask contrasts leads to the conclusion that a spatial frequency tuning curve obtained by masking is related to the frequency selectivity of the underlying linear filter by a power function: the linear filter’s spatial frequency selectivity function times the mask contrast, all raised to an exponent close to 0.62, predicts the obtained threshold elevation function. [See their Fig. 8(a) and equation (1) and their Table I for estimates of the exponent.] Inverting their equation (1) implies that for experiments using a constant mask contrast, the obtained frequency selectivity masking data should be raised to the 1.6 power (since 1/0.62 ≈ 1.6) in order to reveal the frequency selectivity of the underlying linear filter. According to this interpretation, then, the 2D data surfaces of Fig. 4 should all be raised to the 1.6 power; this would sharpen each tuning surface and so the “almonds” in Fig. 5 representing half-maximum filter contours would become smaller. It should be noted that this operation of raising the 2D data surfaces to some power would have no effect on the issue of polar separability of the 2D channel spectra into the product of a spatial frequency characteristic times an orientation characteristic.

[Figure 5: Location of half-maximal masking in the Fourier plane; mask: 8 c/deg, vertical; one panel per subject (H.W., J.D., W.C., R.F.); axes in c/deg.]

Fig. 5. Iso-amplitude contours of the tuning surfaces shown in Fig. 4, sliced at the half-amplitude level. If orientation-selective visual filters in the 2D space domain were simply elongated (“stretched”) versions of isotropic filters, then these iso-amplitude contour lobes would be connected and circumvolving rather than distinct.

(iii) Polar nonseparability of the 2D tuning surface

The data surfaces of Figs 4 and 5 based on a vertical 8 c/deg mask have the property that, at any given test spatial frequency, the peak threshold elevation occurs in the vertical orientation; and at any given test orientation, the peak threshold elevation tends to occur at a spatial frequency near 8 c/deg. As noted earlier, this property of polar invariance of conditional extrema is not possessed by many classes of plausible space-domain filter models, when analyzed in the 2D Fourier domain. But the stronger condition of full polar separability of the 2D tuning surface into the product of one function of spatial frequency times one function of orientation, which is possessed by almost no standard 2D space-domain visual filter organizational principle (Daugman, 1983), is likewise not possessed by the data surfaces of Fig. 4. Figure 6 demonstrates this point. Angular cross-sections have been cut through the threshold elevation surface of subject H.W. from Fig. 4. This operation may be visualized by imagining several concentric, vertical cylinders slicing through the surfaces of Fig. 4, all centered on the origin; the curve of intersection with the data surface is marked on each cylinder. When a given cylinder is unwrapped, its curve of intersection corresponds to the orientation tuning curve of the 2D filter for a particular spatial frequency, which corresponds to the radius of the cylinder. If the tuning surface of the 2D filter were the product of a spatial frequency characteristic times an orientation characteristic (i.e. polar separable), then all such curves would be simple multiples of each other, having a constant orientation bandwidth. It is clear from Fig. 6 that this is not the case. When sliced at 8 c/deg, the angular half-bandwidth of the threshold elevation surface is 13.6°, but when sliced at 4 c/deg it expands to 26.8°, and when sliced at 12 c/deg it shrinks to 9.8°. The orientation bandwidth of masking becomes broader when the test frequency is lower than the mask frequency, and it becomes narrower when the test frequency is higher than the mask frequency. This rule applies whether tuning surfaces are mapped out by using one test grating and many different mask gratings, or by using one mask grating and many different test gratings.

Fig. 6. Angular cross-sections sliced through the threshold elevation surface of subject H.W. in Fig. 4, corresponding to the intersection of that surface with a nested set of concentric cylinders whose radii represent the probe spatial frequency. The orientation tuning curves of that inferred 2D filter have different bandwidths for different probe spatial frequencies.

This polar nonseparability has been systematically mapped out in Fig. 7 for the data surfaces of all four subjects of Fig. 4. Each subject’s data surface was “intersected” computationally by seven concentric cylinders whose radii were logarithmically disposed, yielding 28 orientation tuning curves such as those of Fig. 6 for H.W. The orientation half-bandwidths of these 28 tuning curves are the points plotted in Fig. 7. If the threshold elevation

[Figure 7: abscissa, frequency ratio (mask/test), with logarithmically spaced ticks at 0.60, 0.77, 1.00, 1.29, 1.67, 2.15 and 2.78; ordinate, orientation half-bandwidth.]

Fig. 7. Polar nonseparability of the masking tuning surfaces of all four subjects, inferred by the intersections of concentric cylinders with each threshold elevation surface. Orientation half-bandwidths are plotted as a function of the ratio of the nominal channel “center frequency” to the probe frequency. If a 2D channel (as inferred by masking) could be described as the product of a spatial frequency tuning curve times an orientation tuning curve, then all of the data points here would fall on a horizontal line.
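The separability criterion stated in this caption can be illustrated on synthetic surfaces. The Python sketch below slices two model surfaces along circles of fixed radius (the "cylinders") and measures the orientation half-bandwidth of each slice. Both surfaces and all parameter values (`U0`, `A`, `B`, the 15° angular width, the three radii) are hypothetical choices made for illustration; they are not fits to the data of Fig. 4, and the lobe is placed on the u axis purely for convenience.

```python
import math

U0 = 8.0   # assumed center frequency of the lobe (c/deg)
A = 4.0    # assumed width parameter along the radial direction (c/deg)
B = 3.0    # assumed width parameter along the tangential direction (c/deg)

def cartesian_lobe(r, theta):
    """Gaussian lobe centered at (u, v) = (U0, 0); not polar separable."""
    u, v = r * math.cos(theta), r * math.sin(theta)
    return math.exp(-math.pi * (((u - U0) / A) ** 2 + (v / B) ** 2))

def separable_surface(r, theta):
    """Product of a radial profile and an angular profile (polar separable)."""
    radial = math.exp(-math.pi * ((r - U0) / A) ** 2)
    angular = math.exp(-(theta / math.radians(15.0)) ** 2)
    return radial * angular

def half_bandwidth_deg(surface, r, step_deg=0.01):
    """Angle in degrees at which the orientation tuning curve at radius r
    first falls to half of its value at theta = 0."""
    peak = surface(r, 0.0)
    theta_deg = 0.0
    while theta_deg < 90.0:
        if surface(r, math.radians(theta_deg)) <= 0.5 * peak:
            return theta_deg
        theta_deg += step_deg
    return 90.0

radii = [4.0, 8.0, 12.0]   # probe frequencies (c/deg), chosen for illustration
lobe_bw = [half_bandwidth_deg(cartesian_lobe, r) for r in radii]
separable_bw = [half_bandwidth_deg(separable_surface, r) for r in radii]
```

For the separable surface the half-bandwidth is the same at every radius, which corresponds to the horizontal line mentioned in the caption. For the Cartesian Gaussian lobe the half-bandwidth narrows as the probe radius increases, which is qualitatively the trend reported for the masking data.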


[Figure 8: Space domain kernel (even symmetry); one panel per subject (J.D., H.W., W.C., R.F.).]

Fig. 8. Two-dimensional space domain filter profile for each subject, as inferred by numerical 2D Fourier transformation of the corresponding tuning surfaces in Fig. 4 and assuming even symmetry. Inset shows a 1D central cross-section for H.W. at twice the scale.

surfaces over the Fourier plane were polar separable, then all the points in Fig. 7 would fall on a horizontal line. Instead, the monotonic increase in orientation bandwidth with increase in the ratio of mask frequency to test frequency is quite steep: the orientation half-bandwidth rises steadily by a factor of three, from about 10° when the mask is an octave below the test to more than 30° when the mask is 1.5 octaves above the test.

(iv) Two-dimensional space domain filter profiles

Under the assumption that the threshold elevation data surfaces of Fig. 4 in some sense represent 2D spectral profiles of spatial frequency channels, it is meaningful to compute the associated space-domain 2D profiles of such filters. Such an analysis was

carried out computationally by numerical 2D Fourier analysis, and the resulting space-domain 2D filter profiles are shown in Figs 8 and 9 for each of the four subjects. Because the psychophysical experiments themselves were phase insensitive (since the phase relation between mask and test gratings with different orientations has no effect on the luminance waveform except for translation), the raw data of Fig. 4 could arise from filters whose space-domain weighting functions are any linear combination of the even- and odd-symmetric 2D profiles shown for each subject in Figs 8 and 9. One may interpret these 2D space-domain profiles as constituting (in any linear combination) a filter which, in 2D convolution with a sinewave grating of specified orientation and spatial frequency, gives an output sinewave whose peak-to-peak amplitude is equal to the value of the associated surface in Fig. 4 at the associated location in the Fourier plane. In both Figs 8 and 9, the 1D central cross-section in the vertical orientation is shown for Subject H.W., over twice the spatial extent of the 2D profiles (i.e. over 0.7 deg of visual angle rather than 0.35 deg). This 1D cross-section corresponds to the space-domain characteristic sometimes inferred from 1D experiments (e.g. Kulikowski and King-Smith, 1973; Legge, 1978). The extended 1D cross-sections in Figs 8 and 9 are wave-packets which appear to be quiescent after four or five extrema.

[Figure 9: Space domain kernel (odd symmetry); one panel per subject (H.W., W.C., J.D., R.F.).]

Fig. 9. Two-dimensional space domain filter profile for each subject, as inferred by numerical 2D Fourier transformation of the corresponding tuning surfaces in Fig. 4 and assuming odd symmetry. Inset shows a 1D central cross-section for H.W. at twice the scale.

Overall, the transformed filter profiles in Figs 8 and 9 closely resemble the classical concept of the 2D

receptive field profile of linear simple cells in the visual cortex, dating back to the early characterizations of Hubel and Wiesel (1962). The even-symmetric computed profiles in Fig. 8 are reminiscent of tri-partite receptive fields, having an elongated excitatory center flanked on either side by shallower inhibitory sidelobes; the odd-symmetric profiles of Fig. 9 resemble bipartite simple cell receptive fields. Moreover, the 1D spatial frequency bandwidth and orientation bandwidth of these psychophysically inferred filters (which can be obtained from the simple 1D radial and angular cross-sections of the spectral iso-amplitude contours of Fig. 5) are in the range of bandwidths encountered in simple cells. Although a psychophysically inferred filter profile and a neurophysiologically inferred receptive field profile are very different entities, their similarities in both the 2D spatial and 2D spectral domains are nonetheless perhaps noteworthy.
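The step from a phase-insensitive tuning surface to even- and odd-symmetric space-domain profiles can be sketched as a pair of cosine and sine transforms. The Python fragment below is only one plausible way of carrying this out and is not claimed to be the numerical procedure used for Figs 8 and 9. The two-lobe Gaussian surface, the grid spacing `STEP`, and all parameter values are hypothetical; each conjugate pair of frequency samples is visited once by summing over one half-plane, and the odd kernel corresponds to giving the two lobes opposite polarity.

```python
import math

STEP = 2.8   # assumed grid spacing in c/deg (hypothetical)
HALF = 5     # 11 samples per axis, indices -5..5

def tuning(u, v, u0=8.0, v0=0.0, a=4.0, b=3.0):
    """Hypothetical symmetric tuning surface: Gaussian lobes at the
    conjugate points (u0, v0) and (-u0, -v0)."""
    def lobe(cu, cv):
        return math.exp(-math.pi * (((u - cu) / a) ** 2 + ((v - cv) / b) ** 2))
    return lobe(u0, v0) + lobe(-u0, -v0)

def kernels(x, y):
    """Even (cosine) and odd (sine) kernels at position (x, y), in deg."""
    even = tuning(0.0, 0.0)   # zero-frequency sample has no conjugate partner
    odd = 0.0
    for ku in range(-HALF, HALF + 1):
        for kv in range(-HALF, HALF + 1):
            if ku > 0 or (ku == 0 and kv > 0):   # one half-plane only
                u, v = ku * STEP, kv * STEP
                phase = 2.0 * math.pi * (u * x + v * y)
                t = tuning(u, v)
                even += 2.0 * t * math.cos(phase)
                odd += 2.0 * t * math.sin(phase)
    return even, odd

e_pos, o_pos = kernels(0.05, 0.02)
e_neg, o_neg = kernels(-0.05, -0.02)
e_center, o_center = kernels(0.0, 0.0)
```

By construction the even kernel is unchanged, and the odd kernel changes sign, under reflection through the origin, and any linear combination of the two is consistent with the same phase-insensitive tuning surface.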

(v) Two-dimensional filters and the theoretical limit of joint resolution

It is of some interest to characterize the functional form of the psychophysically inferred 2D filters, both in the space domain and in the spatial frequency domain. It is well-known that threshold elevation due to masking is “localized” both in space and spatial frequency, in the sense that significant threshold elevation generally requires that the mask and test targets overlap in each of these variables. Because of the logic by which channel profiles are inferred from such threshold elevation measurements, these profiles will also be localized both in 2D space and 2D spatial frequency. Similarly, joint localization in space and spatial frequency is a common neurophysiological observation about the response properties of single visual neurons. There has recently been some interest in describing linear simple cells as “Gabor” filters (Marcelja, 1980; Mackay, 1981; Kulikowski and Bishop, 1981; Daugman, 1983), because their receptive field profiles are well-fit by the functional form of Gaussian attenuated sinusoids. Gabor (1946) proved that (temporal) signals having this wave-packet functional form possess the maximum possible degree of joint localization in both the time domain and the temporal frequency domain, and in this sense they offer an optimal way of encoding and transmitting information under the constraint of minimizing the product of signal bandwidth and duration. Gabor’s analysis is clearly generalizable from temporal duration and frequency to spatial extent and frequency, and to an analysis of filters as well as of signals. Daugman (1982) generalized the previously 1D application of Gabor’s analysis to a full 2D derivation and polar moment analysis of “occupied area” jointly in 2D space and 2D spatial frequency; the theoretical problem of acausality implicit in Gabor’s original formulation is of course eliminated in this generalization. Earlier work (Daugman, 1980, Fig. 6) presented the optimal 2D filters as a 2D Fourier transform pair. Thus the question naturally arises: how efficient is the encoding of spatial visual information by the psychophysically inferred 2D filters, in this sense of joint resolution in both 2D domains? Evaluating this efficiency requires comparison both with the theoretical limit and with certain interesting classes of 2D filter models. The existence of a fundamental limit of joint resolution that can be achieved by a filter in the two 2D domains may be derived by analyzing the “polar moment of inertia” of a generalized 2D filter, as a measure of the area it occupies in each 2D domain. If f(x,y) is a generalized 2D filter or signal centered on the origin, and f*(x,y) is its complex

conjugate, then its polar moment of inertia may be defined as

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x^2 + y^2)\, f(x,y)\, f^*(x,y)\, dx\, dy$$

which in normalized form is a measure of the 2D area occupied by the function, in the same way that the 1D second moment (variance) is a measure of the width of a 1D distribution. If f(x,y) has 2D Fourier transform F(u,v), the analogous polar moment of inertia integral over the (u,v) Fourier plane measures the amount of area in which the filter or signal is significantly energetic in the Fourier plane. A filter with sharp resolution in both the 2D space domain and the 2D spatial frequency domain would occupy small areas in both domains. As shown in Daugman (1982), the product of these two occupied areas must always be greater than or equal to a particular constant, k:

$$\left[\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x^2 + y^2)\, f f^*\, dx\, dy\right] \left[\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (u^2 + v^2)\, F F^*\, du\, dv\right] \ge k.$$
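The scale invariance behind such a joint bound can be checked numerically for a simple case. The Python sketch below evaluates energy-normalized polar second moments for an isotropic Gaussian f(x,y) = exp(-πa²(x²+y²)), whose 2D Fourier transform is proportional to exp(-π(u²+v²)/a²). The choice of test function, the normalization by total energy, the grid settings, and the helper names are assumptions made for illustration; the resulting number depends on that normalization and is not asserted to equal the constant k of Daugman (1982).

```python
import math

def normalized_polar_moment(power, half_width, n=161):
    """Riemann-sum estimate of sum((p^2 + q^2) * power) / sum(power) over a
    square grid, where power(p, q) is the squared modulus of the function."""
    step = 2.0 * half_width / (n - 1)
    num = den = 0.0
    for i in range(n):
        p = -half_width + i * step
        for j in range(n):
            q = -half_width + j * step
            w = power(p, q)
            num += (p * p + q * q) * w
            den += w
    return num / den

def joint_product(a):
    """Product of the space-domain and frequency-domain normalized polar
    moments for the isotropic Gaussian with scale parameter a."""
    space_power = lambda x, y: math.exp(-2.0 * math.pi * a * a * (x * x + y * y))
    freq_power = lambda u, v: math.exp(-2.0 * math.pi * (u * u + v * v) / (a * a))
    space = normalized_polar_moment(space_power, 4.0 / a)
    freq = normalized_polar_moment(freq_power, 4.0 * a)
    return space * freq

products = [joint_product(a) for a in (0.5, 1.0, 2.0)]
```

With this normalization the product comes out as 1/(4π²) for every value of a: narrowing the function in space (larger a) widens it in frequency by exactly the compensating amount, so the joint product cannot be reduced by rescaling.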

(For simplicity, the 2D filter functions are assumed here to be origin-centered.) This quantity, k, represents the lower bound of all possible joint area products, regardless of the 2D filter’s functional form or parameters, and thus it may be considered the theoretical limit of joint resolution. Since the “occupied area” of a filter or signal can be identified with its 2D entropy, this theoretical lower bound can also be interpreted as a minimum possible joint entropy. The particular family of 2D functions which actually achieve this optimal degree of joint resolution (minimum joint entropy) are the following, which were named “2D Gabor functions” in Daugman (1982):

$$f(x,y) = \exp\{-\pi[(x - x_0)^2 a^2 + (y - y_0)^2 b^2]\} \times \exp\{-2\pi i[u_0(x - x_0) + v_0(y - y_0)]\}$$

$$F(u,v) = \exp\{-\pi[(u - u_0)^2/a^2 + (v - v_0)^2/b^2]\} \times \exp\{-2\pi i[x_0(u - u_0) + y_0(v - v_0)]\}.$$
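The pair of 2D Gabor expressions above can be written out directly as code. The following Python sketch evaluates both functional forms as read here, with a Gaussian envelope multiplied by a complex exponential in each domain. The function names and the parameter values in `params` are hypothetical and chosen only to exercise the formulas; constant amplitude factors relating f and F are omitted, as they are in the expressions themselves.

```python
import cmath
import math

def gabor_space(x, y, a, b, x0, y0, u0, v0):
    """Space-domain 2D Gabor function f(x, y)."""
    envelope = math.exp(-math.pi * ((x - x0) ** 2 * a ** 2 + (y - y0) ** 2 * b ** 2))
    carrier = cmath.exp(-2j * math.pi * (u0 * (x - x0) + v0 * (y - y0)))
    return envelope * carrier

def gabor_freq(u, v, a, b, x0, y0, u0, v0):
    """Frequency-domain 2D Gabor function F(u, v)."""
    envelope = math.exp(-math.pi * ((u - u0) ** 2 / a ** 2 + (v - v0) ** 2 / b ** 2))
    carrier = cmath.exp(-2j * math.pi * (x0 * (u - u0) + y0 * (v - v0)))
    return envelope * carrier

# Hypothetical parameter values, for illustration only.
params = dict(a=4.0, b=2.0, x0=0.0, y0=0.0, u0=8.0, v0=0.0)

peak_space = abs(gabor_space(0.0, 0.0, **params))   # modulus at (x0, y0)
peak_freq = abs(gabor_freq(8.0, 0.0, **params))     # modulus at (u0, v0)

# Along x, the envelope falls to 1/e of its peak at |x - x0| = 1 / (a sqrt(pi)).
x_e = 1.0 / (params["a"] * math.sqrt(math.pi))
at_x_e = abs(gabor_space(x_e, 0.0, **params))
```

The two functions share the same form, with the roles of a and 1/a (and of b and 1/b) exchanged, so a filter that is made compact along x in space becomes correspondingly broad along u in frequency.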

It should be noted that 2D Gabor functions are not polar separable in either domain, and that their functional form is identical in both domains. The 2D filter family represented by the above equations embraces a broad variety of filters, each specialized for the extraction of different kinds of information from the spatial image. Parameters a and b determine the spatial dimensions of the filter; $x_0$ and $y_0$ its center location in 2D visual space; and $u_0$ and $v_0$ the location in the 2D Fourier plane of greatest sensitivity. Thus the filter’s preferred spatial frequency is $\sqrt{u_0^2 + v_0^2}$, and its preferred orientation is $\arctan(v_0/u_0)$; its bandwidths for spatial frequency


and orientation are determined jointly by the four parameters $a$, $b$, $u_0$, $v_0$. These parameters capture a fundamental four-dimensional uncertainty principle for resolution in the two domains, since any sharpening of resolution in one domain is necessarily accompanied by a corresponding loss of resolution in the other domain. The condition of optimal joint resolution (minimum joint entropy) is achieved by all 2D filters embraced by the above equations, regardless of the values of any of the parameters. It is clear by inspection of this functional form (see Fig. 6 of Daugman, 1980) that it could give a good representation of the reconstructed empirical 2D filter surfaces shown in the present work in Fig. 8, if the parameters in the earlier example were set to yield a slightly broader bandwidth and a nonunity aspect ratio (a > b) in negotiating the trade-off between orientation resolution and spatial resolution in the direction of elongation. The intrinsic power of 2D Gabor analysis of a representation is that it simultaneously captures both the 2D frequency domain properties of a filter and its 2D space domain features, as well as making explicit the inescapable trade-offs in resolution for orientation, spatial frequency, and the two coordinates of image space. As an approximate evaluation of the polar moment of inertia integrals, we may estimate the joint occupied areas of several interesting 2D filter families and of the empirical filter shapes simply by calculating the areas contained within some well-defined criterion such as 1/e amplitude of the 2D filter envelope or (when appropriate) first minimum. The joint entropies of three classes of 2D filter models are evaluated in Appendix II, for comparison with the empirical 2D tuning surfaces inferred from the 2D masking experiments.
The three theoretical classes of filters are: (i) the 2D Gabor filters; (ii) the Gaussian-smoothed Laplacian filter (∇²G) advocated by Kelly (1975b) and Marr (1982); and (iii) ideal 2D band-pass filters or pixel filters. The results of the calculations


of joint entropy in Appendix II are summarized in Table 3. For comparison, we may calculate the joint area products for each of the four subjects in the full 2D masking experiments, in which the percent threshold elevation was obtained for 121 points in the Fourier plane. Table 2 presents the results of numerical integrations of the areas occupied in the Fourier plane within both lobes of the 1/e iso-amplitude filter contours (as in Fig. 5) and also of the area occupied in the 2D space domain within the minima of the inferred even filter kernel (as in Fig. 8), for each of the four subjects. Table 2 also gives the aspect ratio (length/width) for each of the empirical 2D filters, as measured from Fig. 5, which is informative because it reflects the relative allocation of filter area for orientation resolution and for spatial frequency resolution. For filters occupying some fixed amount of area in the Fourier plane, these two kinds of resolution are in fundamental competition with each other: any gain in orientation resolution would inevitably be accompanied by a loss in spatial frequency resolution, and vice versa. We are now in a position to collect together these analytic and empirical results on joint resolution in the two 2D domains, in order to answer the question raised at the beginning of this section about the joint efficiency of the psychophysically inferred filters. Table 3 presents in summary form the joint occupied area products for the different theoretical filter classes examined and for the masking tuning surfaces obtained with each subject. The average joint occupied area product for the four subjects is 5.17, or about 2.5 times larger than the theoretical minimum of 2.0 achieved by 2D Gabor filters and considerably smaller than the numbers for ideal filters.
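The arithmetic linking Tables 2 and 3 can be verified in a few lines. The Python sketch below multiplies each subject's occupied area in the frequency domain by the occupied area in the space domain, using the values listed in Table 2, then averages the four products and compares the mean with the value of 2.0 listed for 2D Gabor filters. The variable names are illustrative only.

```python
# (frequency-domain area in (c/deg)^2, space-domain area in deg^2), from Table 2
areas = {
    "W.C.": (145.1, 0.0359),
    "J.D.": (114.8, 0.0468),
    "R.F.": (98.1, 0.0525),
    "H.W.": (131.9, 0.0375),
}

# Joint occupied area product for each subject.
products = {subject: f_area * s_area for subject, (f_area, s_area) in areas.items()}

# Mean over the four subjects, and its ratio to the 2D Gabor value of 2.0.
mean_product = sum(products.values()) / len(products)
ratio_to_gabor = mean_product / 2.0
```

The per-subject products agree with the entries of Table 3 to within rounding of the tabulated areas, the mean is close to 5.17, and its ratio to 2.0 is a little over 2.5, consistent with the figures quoted in the text.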
Table 2.

             2D frequency domain                        2D space domain
Subject      Occupied area (c/deg)^2    Aspect ratio    Occupied area (deg)^2
W.C.         145.1                      2.10            0.0359
J.D.         114.8                      1.91            0.0468
R.F.          98.1                      2.13            0.0525
H.W.         131.9                      2.33            0.0375

Table 3.

Filter family                                Joint occupied area product
2D Gabor filters                              2.0
∇²G filters                                   6.0
Ideal filters
  Circular bandpass disk                     13.0
  Square pixel                               16.0
Psychophysical masking tuning surfaces
  Subject W.C.                                5.21
  Subject J.D.                                5.37
  Subject R.F.                                5.15
  Subject H.W.                                4.94

An important difference between the filters inferred by masking and the 2D Gabor filters is that the latter have symmetry in linear coordinates, whereas the former tend more to be symmetrical in logarithmic coordinates. Thus, in Fig. 5, the inferred filters tuned to 8 c/deg tend to have high- and low-frequency cut-offs at 16 and 3 c/deg, respectively. This logarithmic symmetry may have topographic advantages in “paving” the 2D Fourier plane in log-polar coordinates, with resolution in this 2D domain determined by a constant ratio and constant overlap principle, although the lack of symmetry in linear frequency coordinates causes a loss in joint resolution. (Two-dimensional Hermite functions, of which Gabor functions are members with order 0, would capture the desired asymmetry with minimal loss of joint resolution.) In any case, it is perhaps noteworthy that the inferred 2D filters come as comparatively close as they do to the theoretical limit of joint resolution. This observation lends indirect support to the view that the visual system is concerned with extracting information jointly in the 2D space domain and in the 2D frequency domain, and because of the incompatibility of these two demands, has evolved towards the optimal solution via 2D channels that roughly approximate 2D Gabor filters. Functionally analogous to the speech spectrogram, which decomposes the 1D speech waveform into spectral formants within windows of time, such a visual 2D Gabor matrix would perform coarse 2D spectral analysis within 2D windows of visual space. An assumption in this interpretation is that there is advantage in economy and efficiency of representation. This interpretation of 2D spatial visual encoding might be summarized by paraphrasing as follows the famous Second Dogma of Barlow (1972):

The visual system is organized to achieve as complete a representation of the visual stimulus as possible, in both 2D spatial and 2D spectral terms, with the minimum number of 2D filters.

Acknowledgements-Supported by AFOSR Contract No. F49620-81-K-0016. Parts of this paper were presented at the Fourth European Conference on Visual Perception, Paris, September 1981; and at the Annual Meeting of the Optical Society of America, Tucson, October 1982.

REFERENCES

Atkinson J., Braddick O. and French J. (1980) Infant astigmatism: Its disappearance with age. Vision Res. 20, 891-893.

Barlow H. B. (1972) Single units and sensation: A neuron doctrine for perceptual psychology? Perception 1, 371-394.

Blakemore C. and Campbell F. W. (1969) On the existence of neurones in the human visual system selectively sensitive to the orientation and size of retinal images. J. Physiol. 203, 237-260.

Blakemore C. and Cooper G. (1970) Development of the brain depends on the visual environment. Nature 228, 477-478.

Bracewell R. (1965) The Fourier Transform and its Applications. McGraw-Hill, New York.

Caelli T., Brettel H., Rentschler I. and Hilz R. (1983) Discrimination thresholds in the two-dimensional spatial frequency domain. Vision Res. 23, 129-133.

Campbell F. W., Cooper G. F. and Enroth-Cugell C. (1969) The spatial selectivity of the visual cells of the cat. J. Physiol. 203, 223-235.

Campbell F. W. and Kulikowski J. J. (1966) Orientation selectivity of the human visual system. J. Physiol. 187, 437-445.

Campbell F. W. and Robson J. G. (1968) Application of Fourier analysis to the visibility of gratings. J. Physiol. 197, 551-566.

Carlson C. R., Cohen R. W. and Gorog I. (1977) Visual processing of simple two-dimensional sinewave luminance gratings. Vision Res. 17, 351-358.

Daugman J. G. (1980) Two-dimensional spectral analysis of cortical receptive field profiles. Vision Res. 20, 847-856.

Daugman J. G. (1982) Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J. opt. Soc. Am. To be published.

Daugman J. G. (1983) Six formal properties of two-dimensional anisotropic visual filters: Structural principles and frequency/orientation selectivity. I.E.E.E. Trans. Systems, Man, and Cybernetics 12, Sept. (1983).

DeValois R. L., Albrecht G. and Thorell L. G. (1982) Spatial frequency selectivity of cells in macaque visual cortex. Vision Res. 22, 545-559.

DeValois K. K., DeValois R. L. and Yund E. W. (1979) Responses of striate cortex cells to grating and checkerboard patterns. J. Physiol. 291, 483-505.

Freeman R. D. and Thibos L. N. (1975) Contrast sensitivity in humans with abnormal visual experience. J. Physiol. 247, 687-710.

Gabor D. (1946) Theory of communication. J. I.E.E., Lond. 93, 429-457.

Ginsburg A. P. (1971) Psychological correlates of a model of the human system. Proc. 1971 Nat. Aerospace Electronics Conf. (NAECON), I.E.E.E. Trans. on Aerospace and Electronic Systems, pp. 283-290.

Ginsburg A. P. (1978) Visual information processing based on spatial filters constrained by biological data. Ph.D. dissertation, Cambridge Univ.

Glezer V. D., Tsherbach T. A., Gauselman V. E. and Bondarko V. M. (1982) Spatio-temporal organization of receptive fields of the cat striate cortex. Biol. Cybernet. 43, 35-49.

Graham N. and Nachmias J. (1971) Detection of grating patterns containing two spatial frequencies: A comparison of single-channel and multiple-channel models. Vision Res. 11, 251-259.

Graham N., Robson J. G. and Nachmias J. (1978) Grating summation in fovea and periphery. Vision Res. 18, 815-825.

Hoffman W. C. (1966) The Lie algebra of visual perception. J. math. Psychol. 3, 65-98.

Hubel D. and Wiesel T. N. (1962) Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. 160, 106-154.

Ikeda H. and Wright M. J. (1975) Spatial and temporal properties of 'sustained' and 'transient' cortical neurones in Area 17 of the cat. Expl Brain Res. 22, 385-398.

Kabrisky M. (1966) A Proposed Model for Visual Information Processing in the Human Brain. Univ. of Illinois Press, Urbana.

Kelly D. H. (1960) J₀ stimulus patterns for vision research. J. opt. Soc. Am. 50, 1115-1116.

Kelly D. H. (1975a) Pattern detection and the two-dimensional Fourier transform: circular targets. Vision Res. 15, 911-915.

Kelly D. H. (1975b) Spatial frequency selectivity in the retina. Vision Res. 15, 665-672.

Kelly D. H. (1976) Pattern detection and the two-dimensional Fourier transform: Flickering checkerboards and chromatic mechanisms. Vision Res. 16, 277-287.

Kulikowski J. J. and Bishop P. O. (1981) Fourier analysis and spatial representation in the visual cortex. Experientia 37, 160-163.

Kulikowski J. J. and King-Smith P. (1973) Spatial arrangement of line, edge, and grating detectors revealed by subthreshold summation. Vision Res. 13, 1455-1478.

Legge G. E. (1978) Space domain properties of a spatial frequency channel in human vision. Vision Res. 18, 959-969.

Legge G. E. and Foley J. M. (1980) Contrast masking in human vision. J. opt. Soc. Am. 70, 1458-1471.

Lunneburg R. (1947) Mathematical Theory of Binocular Vision. Princeton Univ. Press.

MacKay D. (1981) Strife over visual cortical function. Nature 289, 117-118.

Maffei L. and Fiorentini A. (1972) Processes of synthesis in visual perception. Nature 240, 479-481.

Maffei L. and Fiorentini A. (1973) The visual cortex as a spatial frequency analyser. Vision Res. 13, 1255-1267.

Marcelja S. (1980) Mathematical description of the responses of simple cortical cells. J. opt. Soc. Am. 70, 1297-1300.

Marr D. (1982) Vision. Freeman, San Francisco.

Mostafavi H. and Sakrison D. J. (1976) Structure and properties of a single channel in the human visual system. Vision Res. 16, 957-968.

Pollen D. A., Lee J. R. and Taylor J. H. (1971) How does the striate cortex begin the reconstruction of the visual world? Science 173, 74-77.

Pollen D. A. and Ronner S. F. (1982) Spatial computation performed by simple and complex cells in the visual cortex of the cat. Vision Res. 22, 101-118.

Pollen D. A. and Taylor J. H. (1974) The striate cortex and the spatial analysis of visual space. In The Neurosciences, Third Study Program (Edited by Schmidt F. O. and Worden F. C.), pp. 239-247. MIT Press, Cambridge.

Regan D. (1982) Visual information channeling in normal and disordered vision. Psychol. Rev. 89, 407-444.

Sachs M. B., Nachmias J. and Robson J. G. (1971) Spatial frequency channels in human vision. J. opt. Soc. Am. 61, 1176-1186.

Sakitt B. and Barlow H. B. (1982) A model for the economical encoding of the visual image in cerebral cortex. Biol. Cybernet. 43, 97-108.

Schwartz E. (1977) Afferent geometry in the primate visual cortex and the generation of neuronal trigger features. Biol. Cybernet. 28, 1-14.

Sekuler R. (1974) Spatial vision. Ann. Rev. Psychol. 25, 195-232.

Switkes E., Mayer M. and Sloan J. (1978) Spatial frequency analysis of the visual environment: Anisotropy and the carpentered environment hypothesis. Vision Res. 18, 1393-1399.

Watson A. B. (1982) Summation of grating patches indicates many types of detector at one retinal location. Vision Res. 22, 17-25.

Weisstein N., Harris C., Berbaum K., Tangney J. and Williams A. (1977) Contrast reduction by small localized stimuli: extensive spatial spread of above-threshold orientation-selective masking. Vision Res. 17, 341-350.

Wetherill G. and Levitt H. (1964) Sequential estimation of points on a psychometric function. Br. J. math. stat. Psychol. 18, 1-10.

Wilson H. R. (1979) A four mechanism model for threshold spatial vision. Vision Res. 19, 19-32.

Wright M. J. (1982) Contrast sensitivity and adaptation as a function of grating length. Vision Res. 22, 139-149.

APPENDIX I

The two-dimensional Fourier plane

If f(x,y) is a bivariate function, such as the luminance distribution of an image or the weighting function of a 2D filter over (x,y) visual space, then its 2D Fourier transform is defined as

$$F(u,v) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\, e^{-2\pi i(ux+vy)}\, dx\, dy$$

where (u,v) are the Cartesian coordinates of the Fourier plane. The 2D Fourier representation is complete, permitting reconstruction of the space domain function f(x,y) by inverse 2D Fourier transformation. The planar coordinates (u,v) can be interpreted in polar wave-vector form such that r is spatial frequency and θ is orientation, in the polar coordinates

$$r = \sqrt{u^2+v^2}$$

$$\theta = \arctan(v/u)$$

In general, although an image f(x,y) is always a real function, its 2D transform F(u,v) is a complex function whose real and imaginary parts together specify the amplitude and phase of each of the oriented sinusoidal components present in the image. A single sinewave grating represented in the space domain as

$$f(x,y) = C + \sqrt{A^2+B^2}\,\sin(ux + vy + \phi)$$

$$= C + (A + iB)\,e^{-2\pi i(ux+vy)} + (A - iB)\,e^{-2\pi i(-ux-vy)},$$

is represented in the 2D Fourier domain by three weighted delta functions

$$F(u,v) = (A + Bi)\,\delta(u,v) + (A - Bi)\,\delta(-u,-v) + C\,\delta(0,0)$$

which are related to the space domain properties as follows:

spatial frequency = $\sqrt{u^2+v^2}$

orientation = $\arctan(v/u)$

contrast = $\sqrt{A^2+B^2}/C$

mean luminance = $C$

phase = $\arctan(A/B)$.
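The relations listed above translate directly into a few lines of code. The Python sketch below is only an illustration: the function names are hypothetical, and `math.atan2` is used in place of a plain arctangent (an assumption made here so that the orientation and phase remain defined when u or B is zero). The last helper checks the standard identity A cos t + B sin t = √(A²+B²) sin(t + φ) with φ = arctan(A/B), which is what ties the complex weights (A, B) to the amplitude and phase of the grating.

```python
import math
from typing import Dict, Tuple


def polar_from_cartesian(u: float, v: float) -> Tuple[float, float]:
    """Map a Fourier-plane point (u, v) to (spatial frequency r, orientation theta in radians)."""
    r = math.hypot(u, v)          # r = sqrt(u^2 + v^2)
    theta = math.atan2(v, u)      # assumption: atan2 instead of arctan(v/u), defined for u == 0
    return r, theta


def grating_properties(u: float, v: float, A: float, B: float, C: float) -> Dict[str, float]:
    """Space-domain properties of a grating from its Fourier-plane location and weights."""
    r, theta = polar_from_cartesian(u, v)
    return {
        "spatial_frequency": r,
        "orientation": theta,
        "contrast": math.hypot(A, B) / C,   # sqrt(A^2 + B^2) / C
        "mean_luminance": C,
        "phase": math.atan2(A, B),          # arctan(A / B)
    }


def quadrature_identity_gap(A: float, B: float, t: float) -> float:
    """Absolute difference between A*cos(t) + B*sin(t) and sqrt(A^2+B^2)*sin(t + phase)."""
    phase = math.atan2(A, B)
    left = A * math.cos(t) + B * math.sin(t)
    right = math.hypot(A, B) * math.sin(t + phase)
    return abs(left - right)
```

For example, `grating_properties(3.0, 4.0, 1.0, 1.0, 2.0)` describes a grating at spatial frequency 5 with phase π/4, in whatever frequency units (u,v) are expressed.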

APPENDIX II

Calculation of joint entropy for three theoretical filter classes

Case (i). 2D Gabor filters. The elliptical Gaussian envelopes of the 2D Gabor filters and their 2D Fourier transforms may be characterized by the area contained within the 1/e amplitude contour in each domain. In the space domain this contour is described (in the foregoing equations) by

$$\pi(x^2a^2 + y^2b^2) = 1$$

if the filter is centered on the origin of spatial coordinates, and in the frequency domain by

$$\pi\left[\frac{(u-u_0)^2}{a^2} + \frac{(v-v_0)^2}{b^2}\right] = 1$$

which is centered both at $(u_0,v_0)$ and at $(-u_0,-v_0)$ because the filter function is real and therefore has a two-sided transform. The space domain 1/e ellipse has major and minor half-axes $1/(a\sqrt{\pi})$ and $1/(b\sqrt{\pi})$, respectively. Each of the two frequency domain 1/e ellipses has major and minor half-axes $b/\sqrt{\pi}$ and $a/\sqrt{\pi}$, respectively. Since the area of an ellipse is π times the product of its major and minor half-axes, we see that the occupied space-domain area of a 2D Gabor filter is $1/ab$ and its total occupied frequency domain area is $2ab$. Thus the product of these two occupied areas for a 2D Gabor filter is always simply 2.

Case (ii). ∇²G filters. The extensive, multi-level model of visual processes developed by Marr (1982) envisions an array of Gaussian-smoothed Laplacian filters $f_\sigma(x,y) = \nabla^2 G_\sigma(x,y)$ spanning a range of sizes determined by the space constant of the Gaussian $G_\sigma(x,y)$. As in the foregoing example, the analysis of joint entropy is independent of scale because of the 2D Similarity Theorem, and hence the characteristic will be the same for all ∇²G filters of all sizes. The analysis is further simplified by the isotropy of the filters, allowing description simply in terms of the radial variables $\rho = \sqrt{x^2+y^2}$ in space and $r = \sqrt{u^2+v^2}$ in spatial frequency. Thus we have in the space domain

$$f(\rho) = \nabla^2 h(\rho) = \left(\frac{\partial^2}{\partial\rho^2} + \frac{1}{\rho}\frac{\partial}{\partial\rho}\right) h(\rho)$$

(using in the last step the definition of the Laplacian operator in polar coordinates). Since the space domain filter function, $h(\rho) = e^{-\pi\rho^2}$, transforms in the frequency domain into $H(r) = e^{-\pi r^2}$, we know from the 2D differentiation theorem that the above filter f(ρ) will transform into $F(r) = 4\pi^2 r^2 e^{-\pi r^2}$. This frequency domain function has a peak value of 4.62 (when r = 0.564), and it is above its 1/e amplitude contour [F(r) > 1.7] within an annulus whose outer radius is given by r = 1.0 and whose inner radius is given by r = 0.225. The area contained within this filter annulus is $\pi(1^2 - 0.225^2) = 3.0$. In the space domain an appropriate measure of the occupied area of f(ρ) is given by the locus of the filter's minimum, which occurs at $\rho = \sqrt{2/\pi}$. The area contained within this circle is 2.0. Therefore the joint occupied area product of ∇²G filters in the two 2D domains is 6.0.

Case (iii). Ideal 2D bandpass filters. One type of ideal 2D bandpass filter would respond uniformly and exclusively within some circular disk centered on a particular spatial frequency ($r_0$) and orientation ($\theta_0$), with a conjugate disk at ($r_0$, $\theta_0 + \pi$), while rejecting signals in all other regions of the Fourier plane. Suppose that the disks have radius a; then the total area occupied in the Fourier plane is $2\pi a^2$. In the 2D space domain, the profile of this ideal bandpass filter exists within an isotropic envelope of the form $f(\rho) = J_1(2\pi a\rho)/\rho$, where $J_1$ is a Bessel function of the first kind and first order (see Bracewell, 1965, pp. 53 and 248). The first minimum in f(ρ) occurs at $\rho = 5.1/(2\pi a)$, and so by the same analysis as in the preceding case we may evaluate the occupied space domain area as $\pi\rho^2 = 26/(4\pi a^2)$. Thus the product of occupied areas in both 2D domains of this ideal 2D bandpass filter is 13. If the filter occupied a square region in the Fourier plane instead of a circular disk, the joint occupied area product (as calculated from sinc functions) is 16. The same results would arise, obviously, if the space and frequency domains of these filters were reversed.

It is important to note, in all of these examples, the principle of the irrelevance of scale parameters in the calculation of joint resolution of the filters in the two 2D domains. These brief geometrical approximations using real-valued filter functions in Appendix II give results only representative of the exact evaluation of the 2D integrals for the proper complex functions. These estimated joint entropy products have been increased by a common factor of 2 through the use of real-valued functions, since they have two-sided transforms.
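The three geometrical estimates of this appendix can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the procedure used in the paper. The helper names (`bisect_root`, `bessel_j` and the three `*_joint_area` functions) are hypothetical; the ∇²G transform is taken as F(r) = 4π²r²exp(-πr²); the space-domain radius √(2/π) is taken from the text; and the first minimum of J₁(x)/x is located as the first zero of J₂(x), using the standard identity d/dx[J₁(x)/x] = -J₂(x)/x. Bessel functions are evaluated from Bessel's integral with the trapezoid rule so that only the standard library is needed.

```python
import math
from typing import Callable


def gabor_joint_area(a: float, b: float) -> float:
    """Case (i): product of 1/e-contour areas for a real 2D Gabor filter."""
    root_pi = math.sqrt(math.pi)
    space_area = math.pi * (1.0 / (a * root_pi)) * (1.0 / (b * root_pi))  # = 1/(ab)
    freq_area = 2.0 * math.pi * (a / root_pi) * (b / root_pi)             # two ellipses, = 2ab
    return space_area * freq_area


def bisect_root(func: Callable[[float], float], lo: float, hi: float, iterations: int = 80) -> float:
    """Plain bisection; assumes func changes sign exactly once on [lo, hi]."""
    f_lo = func(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def laplacian_gaussian_joint_area() -> float:
    """Case (ii): annulus area above the 1/e level of F(r), times the space-domain circle area."""
    def F(r: float) -> float:
        return 4.0 * math.pi ** 2 * r ** 2 * math.exp(-math.pi * r ** 2)

    r_peak = 1.0 / math.sqrt(math.pi)            # F is maximal where pi * r^2 = 1
    level = F(r_peak) / math.e                   # 1/e of the peak amplitude
    r_inner = bisect_root(lambda r: F(r) - level, 1e-9, r_peak)
    r_outer = bisect_root(lambda r: F(r) - level, r_peak, 5.0)
    freq_area = math.pi * (r_outer ** 2 - r_inner ** 2)
    rho_min = math.sqrt(2.0 / math.pi)           # radius quoted in the text (assumed, not derived here)
    space_area = math.pi * rho_min ** 2
    return space_area * freq_area


def bessel_j(n: int, x: float, steps: int = 400) -> float:
    """J_n(x) from Bessel's integral (1/pi) * int_0^pi cos(n*t - x*sin(t)) dt (trapezoid rule)."""
    def integrand(t: float) -> float:
        return math.cos(n * t - x * math.sin(t))

    h = math.pi / steps
    total = 0.5 * (integrand(0.0) + integrand(math.pi))
    total += sum(integrand(k * h) for k in range(1, steps))
    return total * h / math.pi


def ideal_disk_joint_area(a: float = 1.0) -> float:
    """Case (iii): two disks of radius a in frequency, circle out to the first minimum in space."""
    x0 = bisect_root(lambda x: bessel_j(2, x), 4.0, 6.0)  # first zero of J2 = first minimum of J1(x)/x
    rho = x0 / (2.0 * math.pi * a)
    space_area = math.pi * rho ** 2
    freq_area = 2.0 * math.pi * a ** 2
    return space_area * freq_area
```

The Gabor product is 2 for any a and b, while the other two come out close to the rounded values 6 and 13 quoted above. Changing the disk radius a leaves the third product unchanged, which illustrates the irrelevance of scale parameters.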