Spatial organization of nonlinear interactions in form perception


Vision Res. Vol. 31, No. 9, pp. 1457-1488, 1991. Printed in Great Britain. All rights reserved. 0042-6989/91 $3.00 + 0.00. Copyright © 1991 Pergamon Press plc

SPATIAL ORGANIZATION OF NONLINEAR INTERACTIONS IN FORM PERCEPTION*

JONATHAN D. VICTOR and MARY M. CONTE

Department of Neurology and Neuroscience, Cornell University Medical College, 1300 York Avenue and Laboratory of Biophysics, The Rockefeller University, 1230 York Avenue, New York, NY 10021, U.S.A.

(Received 7 February 1990; in revised form 11 January 1991)

Abstract-We examined the perception of structure in a family of visual textures whose second-order correlation structure is flat. These textures were generated by two-dimensional recursion rules, in a manner which extends the construction of Julesz, Gilbert and Victor (1978; Biological Cybernetics, 31, 137-140). Textures generated by some recursion rules elicited a visually salient percept of structure, while textures generated by other recursion rules did not. Textures whose statistical structure was visually salient produced evoked responses which differed from the response evoked by completely random textures. The size of this VEP difference correlated well with psychophysical measures. Since the textures were constructed to have identical global spatial frequency spectra, models for the extraction of visual structure must be essentially nonlinear. Models based on symmetry, information content, or simple spatial extent (but not pattern) of correlation fail to explain the observed results. Models based on the cooperative interaction of pairs of nonlinear subunits provide a reasonable qualitative account of the findings. The critical model features are (i) the presence of multiple nonlinear subunits, and (ii) a second nonlinearity, such as a threshold, at the stage of combination of subunit signals.

Key words: Human; Visual textures; Modeling; Nonlinear interactions; Visual evoked potentials

*A portion of this work was presented at the 1989 Meeting of the Association for Research in Vision and Ophthalmology, Sarasota, Florida.

INTRODUCTION

The goal of this study is a better understanding of neural computations underlying form analysis. Although many neurons of visual cortex may be classified and described by essentially linear techniques (Hubel & Wiesel, 1968; De Valois, Albrecht & Thorell, 1978), there is much evidence that form analysis is a highly nonlinear process (Movshon, Thompson & Tolhurst, 1978b; Morrone, Burr & Maffei, 1982; Von der Heydt, Peterhans & Baumgartner, 1984; Desimone, Schein, Moran & Ungerleider, 1985). In a formal sense, nonlinearities signal feature extraction; without nonlinearities, feature extraction and spatial filtering are indistinguishable (Victor, 1988). From a systems-analytic point of view, nonlinearities may serve as clues to the internal structure of a transduction.

Identification of nonlinearities, and separation of linear from nonlinear phenomena, requires a sufficiently rich stimulus set. For this study, we have chosen to use several classes of “isodipole textures” (Julesz et al., 1978). These are broadband stimuli which contain a variety of local features. One analytical advantage of these stimuli is that isodipole texture pairs (considered as ensembles) have identical spatial frequency spectra. Because of this property, mechanisms which distinguish the stimuli from each other must involve highly nonlinear processes (Victor, 1985; Victor & Zemon, 1985). While (or perhaps because) these textures are unlikely to be encountered in the ordinary visual environment, the mechanisms that they tap must be driven by more familiar stimuli (such as bars and gratings) as well. However, these more familiar stimuli often do not readily allow for a qualitative separation of linear and nonlinear neural mechanisms.

We will explore psychophysical and VEP responses to several classes of isodipole textures, with the goal of analyzing the neural computations underlying early stages of form perception. Previous studies (Victor, 1985; Victor & Zemon, 1985; Victor & Conte, 1989) have allowed us to set lower bounds on the complexity of an adequate model. As we will see, the current studies suggest a model architecture, and simple models within this architecture account for the main features of the data. The critical feature of the architecture is the presence

of two nonlinear stages: a local nonlinear subunit, and a long-range cooperative interaction. This stands in contrast to most current models for receptive fields of simple and complex cells (Movshon et al., 1978a, b; Spitzer & Hochstein, 1985a, b; Camarda, Peterhans & Bishop, 1985; Emerson, Citron, Vaughn & Klein, 1987; Jones & Palmer, 1987; Szulborski & Palmer, 1990), which feature at most a single nonlinear stage.

METHODS

Visual stimuli

All of the visual stimuli used in this study are constructed out of pairs of isodipole textures. Two textures are said to be “isodipole” (Julesz et al., 1978) if their first-order and second-order statistics agree. The first-order statistic is simply mean luminance; the second-order statistic is the autocorrelation function. Despite the apparently restrictive nature of this defining condition, a number of algorithms exist for generation of isodipole textures. The first stage of the construction uses Gilbert’s (1980) algorithm to generate many classes of highly-ordered isodipole textures. The second stage of the construction is an extension of that of Diaconis and Freedman (1981), which allows for graded levels of structure within each texture class.

The reader is reminded that the motivation for the use of statistically complex visual stimuli is to separate nonlinear processes from spatial filtering and to make inferences as to the nature of the nonlinearity. As pointed out in the Discussion, our ability to resolve the presence of two nonlinear stages would be lost if we restricted our stimuli to bars, gratings, or other traditional stimuli.

Classes of isodipole textures. Gilbert’s (1980) algorithm is essentially a generalization of the algorithm of Julesz et al. (1978) used to generate the basic “even” texture used in our previous work. The algorithm assigns a state of +1 or -1 to each check of a lattice in a recursive fashion. The state assigned to the check in row $i$ and column $j$ will be denoted by $a_{i,j}$. For the original algorithm of Julesz et al., the recursion rule is:

$$a_{i,j} = a_{i-1,j} \cdot a_{i,j-1} \cdot a_{i-1,j-1}. \tag{1}$$

This rule requires that there are an even number of checks whose state $a_{i,j} = 1$ within any 2 x 2 region of the texture. The rule (1) may be written more conveniently as:

$$\prod_{(k,l) \in G} a_{i+k,j+l} = 1, \tag{2}$$

where the set $G$ contains the coordinates of four points in a standard 2 x 2 windowpane: $G = \{(0,0), (0,1), (1,0), (1,1)\}$. Following Gilbert (1980), we generalize this rule by replacing the 2 x 2 region $G$ by regions of other shapes. We will call a region $G$ which generates a texture according to equation (2) a glider. A texture created with the generalized recursion rule (2) will have an even number of checks whose state $a_{i,j} = 1$ within any region whose shape is that of the glider $G$. The gliders $G$ we have used are illustrated in Table 1, and the textures they generate are shown in Fig. 1. A texture created with a glider $G$ will be denoted $T_G$.
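As an illustration of recursion rule (1), the following Python sketch generates the standard “even” texture and checks the 2 x 2 parity property. It is a minimal example under stated assumptions, not the authors' implementation: the function names, the lattice size, and the choice to seed the first row and first column with independent random states are all illustrative.

```python
import random

# The 2 x 2 "windowpane" glider of the standard even texture.
STANDARD = [(0, 0), (0, 1), (1, 0), (1, 1)]


def even_texture(rows, cols, seed=None):
    """Fill a rows x cols lattice of +1/-1 states using recursion rule (1).

    Assumption (illustrative): the first row and first column are seeded
    with independent random states; every other check follows from the rule.
    """
    rng = random.Random(seed)
    a = [[0] * cols for _ in range(rows)]
    for j in range(cols):
        a[0][j] = rng.choice((-1, 1))
    for i in range(1, rows):
        a[i][0] = rng.choice((-1, 1))
    for i in range(1, rows):
        for j in range(1, cols):
            a[i][j] = a[i - 1][j] * a[i][j - 1] * a[i - 1][j - 1]
    return a


def glider_product(a, glider, i, j):
    """Product of the states at the glider positions anchored at check (i, j)."""
    p = 1
    for k, l in glider:
        p *= a[i + k][j + l]
    return p


texture = even_texture(16, 16, seed=0)

# With four +1/-1 states, a product of +1 means an even number of +1 checks.
all_windows_even = all(
    glider_product(texture, STANDARD, i, j) == 1
    for i in range(15)
    for j in range(15)
)
```

The helper `glider_product` accepts any list of (row, column) offsets, so the same parity check can be applied to other glider shapes once a texture for that glider is available.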

There are two technical points worth mentioning. (i) The recursion rule (2) only suffices to generate textures once the states of some of the checks have been initialized in a random, uncorrelated fashion. It might at first seem that the statistics of the texture that is generated might depend on the choice of the initial set. It is also a priori possible that the statistics of the texture might depend on the manner in which the recursion rule is applied (e.g. row first, or column first). However, Gilbert (1980) has shown that neither of these potentially complicating issues can arise; the statistics of a texture generated by equation (2) depend only on the glider $G$, and not on the initial set or the direction of propagation. In practice, we have found it convenient to initialize one or more rows and one or more columns, and to generate the texture row by row. The number of rows and columns which need to be initialized depends on the texture class to be generated. (ii) For textures generated by n-element gliders, one anticipates an nth-order correlation between checks, corresponding to the rule (2). However, for some regions $G$, correlations of lower order may also be nonzero (for example, a glider consisting of several consecutive checks in one row). Gilbert (1980) provided a means to determine which gliders $G$ lead to no correlations at order less than n; we have only used gliders which conform to his criteria. Thus, the textures described in Table 1 and illustrated in Fig. 1 all have first- and second-order statistics which are identical to those of a random coloring, and the textures generated with four-element gliders also have third-order statistics which are identical to those of a random coloring.

Table 1. Construction of selected isodipole textures. The coordinates of the glider are used in the recursion rule (2). For the glider sets and diagrams, the first coordinate (x) indicates the row (from bottom to top), and the second coordinate (y) indicates the column (from left to right)

Texture type   Glider coordinates (row, column)    Glider shape (top row first; X = check in glider)
Standard       {(0,0), (0,1), (1,0), (1,1)}        XX / XX
Cross          {(0,1), (1,0), (1,2), (2,1)}        .X. / X.X / .X.
Zigzag         {(0,0), (0,1), (1,1), (1,2)}        .XX / XX.
Oblong         {(0,0), (0,2), (1,0), (1,2)}        X.X / X.X
Triangle       {(0,1), (1,0), (1,1)}               XX / .X
Tee            {(0,0), (0,1), (0,2), (1,1)}        .X. / XXX
Wye            {(0,2), (1,0), (1,1), (2,1)}        .X. / XX. / ..X
Foot           {(0,0), (0,2), (1,1), (1,2)}        .XX / X.X
El             {(0,2), (1,0), (1,1), (1,2)}        XXX / ..X

Decorrelation. In the second stage of the construction, we introduce spatial decorrelation into the textures $T_G$. As in the previous study

(Victor & Conte, 1989), we consider two kinds of disorder: propagated disorder, parametrized by $\epsilon_{\mathrm{prop}}$; and sporadic disorder, parametrized by $\epsilon_{\mathrm{spor}}$. To introduce disorder which is propagated throughout the texture, the deterministic recursion rule (2) is replaced by a stochastic one:

$$\mathrm{prob}\Big\{ \prod_{(k,l) \in G} a_{i+k,j+l} = 1 \Big\} = 1 - \epsilon_{\mathrm{prop}}. \tag{3}$$

Errors may also be introduced in a purely local (“sporadic”) fashion. Let us denote the appearance of a check in row $i$ and column $j$ by $A_{i,j}$, where $A_{i,j} = +1$ denotes a bright check, and $A_{i,j} = -1$ denotes a dark check. Thus, the contrast of the check in row $i$ and column $j$ is equal to $A_{i,j}$ multiplied by an overall contrast value. The simplest way to translate states $a_{i,j}$ into appearances $A_{i,j}$ is that the appearance of a check is faithfully determined by its state, i.e. $A_{i,j} = a_{i,j}$. Sporadic errors may be introduced by nonfaithful rendition of the states $a_{i,j}$ into appearances $A_{i,j}$, with the rate of this nonfaithful rendition determined by $\epsilon_{\mathrm{spor}}$:

$$\mathrm{prob}\{ A_{i,j} = a_{i,j} \} = 1 - \epsilon_{\mathrm{spor}}. \tag{4}$$

Sporadic decorrelation is one of the processes discussed by Diaconis and Freedman (1981). In describing the correlation structure of these textures, we have found it useful to introduce the correlation parameters $c_{\mathrm{prop}}$ and $c_{\mathrm{spor}}$ (Victor & Conte, 1989). These quantities are related to the error rates $\epsilon_{\mathrm{prop}}$ and $\epsilon_{\mathrm{spor}}$ by:

$$c_{\mathrm{prop}} = 1 - 2\epsilon_{\mathrm{prop}} \tag{5}$$

and

$$c_{\mathrm{spor}} = (1 - 2\epsilon_{\mathrm{spor}})^4. \tag{6}$$

A texture constructed from error parameters $\epsilon_{\mathrm{prop}}$ and $\epsilon_{\mathrm{spor}}$ and glider $G$ will be denoted by $T_G(c_{\mathrm{prop}}, c_{\mathrm{spor}})$, where $c_{\mathrm{prop}}$ and $c_{\mathrm{spor}}$ are given by equations (5) and (6). In this notation, the textures illustrated in Fig. 1 are textures $T_G(1, 1)$, for the gliders $G$ listed in Table 1. The texture $T_{\mathrm{standard}}(1, 1)$ is the even texture of Julesz et al. (1978). Several sporadically-decorrelated textures $T_{\mathrm{zigzag}}(1, a)$ are illustrated in Fig. 2. Note that the textures $T_G(c_{\mathrm{prop}}, 0)$ and $T_G(0, c_{\mathrm{spor}})$ are statistically identical to the completely random texture $T(0,0)$ for any glider $G$, and for any value of $c_{\mathrm{prop}}$ and $c_{\mathrm{spor}}$.

Fig. 1. Examples of visual stimuli used in this study. These textures are fully-correlated textures $T_G(1, 1)$ based on the gliders $G$ of Table 1.

Fig. 2. Examples of sporadically-decorrelated textures $T_{\mathrm{zigzag}}(1, a)$ based on the zigzag glider of Table 1. Panels show a = 1.00, 0.75, 0.44, and 0.00. The texture $T_{\mathrm{zigzag}}(1, 0)$ is fully decorrelated (i.e. completely random).
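The two kinds of disorder can be sketched in Python for the standard 2 x 2 glider. This is an illustrative sketch, not the authors' procedure: it assumes that propagated errors flip the state produced by the recursion rule with probability $\epsilon_{\mathrm{prop}}$, that sporadic errors flip the displayed appearance of each check independently with probability $\epsilon_{\mathrm{spor}}$, and that both correlation parameters lie between 0 and 1. The function names and seeding scheme are hypothetical.

```python
import random


def eps_prop_from_c(c_prop):
    """Invert c_prop = 1 - 2*eps_prop."""
    return (1.0 - c_prop) / 2.0


def eps_spor_from_c(c_spor):
    """Invert c_spor = (1 - 2*eps_spor)**4, assuming 0 <= c_spor <= 1."""
    return (1.0 - c_spor ** 0.25) / 2.0


def decorrelated_even_texture(rows, cols, c_prop, c_spor, seed=None):
    """Return appearances A (+1/-1) for the standard glider with both error types."""
    rng = random.Random(seed)
    e_prop = eps_prop_from_c(c_prop)
    e_spor = eps_spor_from_c(c_spor)

    # States: random everywhere first; the interior is then overwritten,
    # so only the first row and first column keep their random seeds.
    a = [[rng.choice((-1, 1)) for _ in range(cols)] for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            a[i][j] = a[i - 1][j] * a[i][j - 1] * a[i - 1][j - 1]
            if rng.random() < e_prop:
                # Propagated error: the wrong state feeds later recursion steps.
                a[i][j] = -a[i][j]

    # Sporadic error: the displayed appearance differs from the state,
    # but the underlying state (and hence the recursion) is unaffected.
    return [[-s if rng.random() < e_spor else s for s in row] for row in a]


fully_ordered = decorrelated_even_texture(12, 12, c_prop=1.0, c_spor=1.0, seed=1)
partly_ordered = decorrelated_even_texture(12, 12, c_prop=1.0, c_spor=0.44, seed=1)
```

Applying the sporadic flips only after the whole lattice of states has been generated is what keeps those errors local; a propagated error, by contrast, changes every state computed from the flipped check.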

Correlation statistics. We next state the correlation statistics of the textures $T_G(c_{\mathrm{prop}}, c_{\mathrm{spor}})$. To do this, we need to consider superpositions of the glider $G$ and translations of itself. Following Gilbert (1980), we introduce a generating function $S(x,y)$ associated with a set of checks $S = \{(I_k, J_k)\}$:

$$S(x,y) = \sum_{(I_k, J_k) \in S} x^{I_k} y^{J_k}. \tag{7}$$

The generating function has several useful properties. Translation of a set $S$ by one unit along the x-axis corresponds to multiplication of $S(x,y)$ by $x$, and translation along the y-axis corresponds to multiplication of $S(x,y)$ by $y$. Superposition of sets $S_1$ and $S_2$ corresponds to addition of their generating functions $S_1(x,y)$ and $S_2(x,y)$. It will be convenient to consider generating functions as algebraic expressions in $x$ and $y$ which can be multiplied and added with all the usual rules for multiplication of polynomials, with the coefficients interpreted modulo 2 (i.e. in the finite field of two elements {0, 1}). Multiplication of two generating functions $S_1(x,y)$ and $S_2(x,y)$ now has the following interpretation: the product $S(x,y) = S_1(x,y) S_2(x,y)$ is the generating function of a set $S$ whose members $(I, J)$ can be written in an odd number of ways as a vector sum of a member $(I_1, J_1)$ of $S_1$ and a member $(I_2, J_2)$ of $S_2$. Multiplication of generating functions thus corresponds to convolution (mod 2) of sets. We say that the set $S$ is “divisible” by the glider $G$ if there exists a finite quotient set $Q$ whose generating function $Q(x,y)$ satisfies:

$$S(x,y) = G(x,y) \cdot Q(x,y) \pmod 2. \tag{8}$$

If $S$ is divisible by $G$ and $Q$ is the smallest set satisfying equation (8), then we will write $Q = S/G$. The number of elements in $Q$ will be denoted by $n(Q) = n(S/G)$.

Here is an example. The generating function $S(x,y) = (1 + x^I)(1 + y^J) = 1 + x^I + y^J + x^I y^J$ corresponds to the four corners of a rectangle of dimension $(I + 1) \times (J + 1)$. The generating function $G(x,y) = (1 + x)(1 + y) = 1 + x + y + xy$ corresponds to the glider for the standard even texture. In this case, $S$ is divisible by $G$ and the quotient polynomial is given by:

$$Q(x,y) = (1 + x + \cdots + x^{I-1})(1 + y + \cdots + y^{J-1}). \tag{9}$$

$Q(x,y)$ is thus seen to contain $IJ$ terms $x^i y^j$, one corresponding to each of the checks in an $I \times J$ rectangle (edges and interior). So in this case, $n(S/G) = IJ$.

The notion of divisibility permits a concise statement of the correlation properties of the textures $T_G(c_{\mathrm{prop}}, c_{\mathrm{spor}})$. For checks positioned at a set of coordinates $S$ which is not divisible by $G$, all correlations are equal to zero. For checks positioned at a set of coordinates $S$ which is divisible by $G$, correlations are given by:

$$\Big\langle \prod_{(i,j) \in S} A_{i,j} \Big\rangle = c_{\mathrm{prop}}^{\,n(S/G)} \, (1 - 2\epsilon_{\mathrm{spor}})^{n(S)}. \tag{10}$$
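The mod-2 arithmetic on generating functions can be tried out directly by representing each polynomial as the set of its exponent pairs. The Python sketch below is illustrative only: the function names are hypothetical, and the division routine is ordinary polynomial long division under a lexicographic term order, which assumes that quotient terms have nonnegative exponents (i.e. that sets are translated so that the quotient lies in the positive quadrant).

```python
def multiply_mod2(A, B):
    """Convolution (mod 2) of two sets of (row, column) exponent pairs."""
    out = set()
    for i1, j1 in A:
        for i2, j2 in B:
            # Symmetric difference keeps a term only if it arises an odd number of times.
            out ^= {(i1 + i2, j1 + j2)}
    return out


def divide_mod2(S, G):
    """Return the quotient set Q with S = G * Q (mod 2), or None if S is not divisible.

    Long division: repeatedly cancel the lexicographically largest
    remaining term of the remainder with a shifted copy of G.
    """
    remainder = set(S)
    quotient = set()
    g_lead = max(G)
    while remainder:
        s_lead = max(remainder)
        q = (s_lead[0] - g_lead[0], s_lead[1] - g_lead[1])
        if q[0] < 0 or q[1] < 0:
            return None  # leading term cannot be cancelled: not divisible
        quotient ^= {q}
        remainder ^= {(q[0] + k, q[1] + l) for k, l in G}
    return quotient


def rectangle_corners(I, J):
    """Exponent pairs of (1 + x^I)(1 + y^J)."""
    return {(0, 0), (I, 0), (0, J), (I, J)}


STANDARD = {(0, 0), (0, 1), (1, 0), (1, 1)}

Q = divide_mod2(rectangle_corners(3, 2), STANDARD)
n_quotient = len(Q)  # n(S/G) for the 3-by-2 example
not_divisible = divide_mod2({(0, 0), (1, 1)}, STANDARD)
```

For the rectangle example with I = 3 and J = 2, the quotient returned has I x J = 6 elements, in agreement with equation (9); a pair of diagonal checks, by contrast, is not divisible by the standard glider.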

These results may be verified by the techniques of Appendix 1 of Victor and Conte (1989), and Theorem 2 of Gilbert (1980) for fully-ordered textures $T_G(1, 1)$. The above analysis guarantees that for textures $T_G(c_{\mathrm{prop}}, c_{\mathrm{spor}})$, all correlations among $n(G) - 1$ or fewer checks are zero. (In the language of Victor (1988), these textures are iso-$[R^*, n(G) - 1]$ for all receptive field profiles $R^*$.) It also provides a means for construction of distinct textures $T_G(1, c_{\mathrm{spor}})$ and $T_G(c_{\mathrm{prop}}, 1)$ for which correlations across checks within the glider set $G$ are identical. For one of these textures [$T_G(c_{\mathrm{prop}}, 1)$], long-range correlations diminish to zero; for the other [$T_G(1, c_{\mathrm{spor}})$], long-range correlations do not diminish to zero.

This analysis also shows that there is a qualitative difference between textures constructed on gliders of size 4 (e.g. the standard texture) and gliders of size 3 (e.g. the triangle textures). For textures whose generating glider has an even number of elements, contrast-inversion merely provides a new example of the same texture class. But for textures whose generating glider has an odd number of elements, contrast-inversion leads to a statistically distinct texture. To see this, observe that contrast-inversion of a texture is equivalent to choosing $\epsilon_{\mathrm{spor}} = 1$: the states of all checks are inverted. For textures with $n(G)$ even, nonzero correlations exist on a set $S$ only if $n(S)$ is even as well (Gilbert, 1980). Hence, $n(S)$ must be even in equation (10), and $\epsilon_{\mathrm{spor}} = 1$ leads to the same correlation structure as does $\epsilon_{\mathrm{spor}} = 0$. However, for textures with $n(G)$ odd, some sets $S$ with an odd number of elements (for example, $S = G$) do lead to nonzero correlations. For these sets, contrast-inversion inverts the sign of the correlation.

There is a technical comment concerning the definition of the sporadic correlation $c_{\mathrm{spor}}$ (equation 6). We will be comparing psychophysical and electrophysiological responses to textures with $n(G) = 4$ and $n(G) = 3$, and we will use $a = c_{\mathrm{spor}}$ as the independent variable for

many of these comparisons. For textures with $n(G) = 4$ or $n(G) = 3$, $c_{\mathrm{spor}}$ indicates the strength of correlations among elements of sets $S$ of size 4. However, for the textures with $n(G) = 3$, there are additional correlations of strength $(1 - 2\epsilon_{\mathrm{spor}})^3 = a^{3/4}$ among elements of sets $S$ of size 3 [i.e. $S = G$ in equation (10)]. Such third-order correlations are not present in textures with $n(G) = 4$.

General experimental procedures. The stimuli described above were produced on a Tektronix 608 monitor. Control signals for the raster display [horizontal (X), vertical (Y), and intensity (Z)] were generated by specialized electronics (Milkman, Schick, Rossetto, Ratliff, Shapley & Victor, 1980) interfaced to a DEC 11/73 computer. The display subtended 8.8 deg and had a luminance of 150 cd/m². Stimuli had a contrast [$(I_{\max} - I_{\min})/(I_{\max} + I_{\min})$] of 0.4, and were viewed binocularly at a distance of 57 cm. The subject pool for VEP and psychophysical studies consisted of six normal volunteers, two male and four female; four of these subjects were naive to the purpose of the experiments. Subject CI is a protanope as determined by Ishihara and Farnsworth-Munsell 100-Hue (Farnsworth, 1957) tests. Subjects ranged in age from 22 to 40 yr, and were corrected to normal visual acuity if necessary.

Psychophysical methods. After a subject-paced warning sound, a stimulus would appear from an equiluminous uniform background for a period of 50 msec. This stimulus had a 50% probability of being a random stimulus $T(0,0)$, and a 50% probability of being a structured stimulus $T_G(1, c_{\mathrm{spor}})$. Then, an uncorrelated random mask $T(0,0)$ of equal contrast and luminance would appear for 500 msec. The observer’s forced-choice task was to distinguish the structured stimulus from a random one. For each stimulus type listed in Table 1, the independent variable was the level of sporadic decorrelation $c_{\mathrm{spor}}$. Trials were presented in blocks of 20 of a particular stimulus type and value of $c_{\mathrm{spor}}$. Before data collection, observers were given practice (with feedback) with free-view and timed presentations of examples of the target textures and random textures. Practice was allowed until performance stabilized; no feedback was given during data collection. A set of 20 blocks (two repeats of each of 10 values of $c_{\mathrm{spor}}$, for a particular glider $G$) were presented in random order once performance had stabilized. In one testing session, four or five sets of 20-stimulus blocks were presented. For each

subject, each glider was tested on more than one day, so that at least four blocks (80 trials) were collected for most stimulus types and values of $c_{\mathrm{spor}}$. However, in order to minimize collection of redundant data, only two or three blocks were collected for stimuli for which performance was perfect at a lower level of correlation.

Measurement of visual evoked potentials. Visual evoked potentials (VEPs) were elicited by periodic abrupt interchange between examples of two texture classes. The stimulus cycle lasted 474 msec. In the first half of each stimulus cycle (237 msec), an example of a particular structured texture $T_G(c_{\mathrm{prop}}, c_{\mathrm{spor}})$ was presented. In the second half of each stimulus cycle, an example of a random texture $T(0,0)$ was presented. In each successive stimulus cycle, new examples of the structured texture $T_G(c_{\mathrm{prop}}, c_{\mathrm{spor}})$ and the random texture $T(0,0)$ were presented. The VEP was averaged over this stimulus cycle, so that it represented an averaged response to many examples of each texture class. At each stimulus transition, each check of the texture had a 50% probability of reversing contrast. There was no correlation between the positions of the checks that reversed in successive stimulus transitions. Each raw waveform represents an average of responses recorded in two or three one-minute runs. These runs were presented in randomized blocks.

Scalp signals were recorded differentially at Cz-Oz, Pz-Oz, C3-Oz, C4-Oz, amplified 10,000-fold, and bandpass-filtered (0.03-100 Hz) prior to digitization. Artifact rejection was performed by the computer prior to averaging. VEPs were saved by the 11/73 computer for offline analysis, which included Fourier analysis, modeling, and factor analysis.

RESULTS

Psychophysics

Inspection of the isodipole textures (Fig. 1) revealed that for some textures, the underlying structure was readily apparent (e.g. the standard texture). For other textures, the underlying structure was not readily discernible (e.g. the wye texture). We first quantify the psychophysical salience in the structured textures by a forced-choice psychophysical paradigm, and then introduce a parametrization of the psychophysical curves to allow comparison of performance within and across subjects.

Psychometric functions. Stimuli were presented for 50 msec, followed by a random mask. The subject was required to discriminate between examples of a sporadically-decorrelated structured texture and examples of random textures. For each texture, the task was made progressively more difficult by reducing the degree of local correlation $c_{\mathrm{spor}} = a$ from 1 to 0. Results for two subjects are shown in Fig. 3. For some textures, discriminability was essentially perfect for a = 1 (no decorrelation), and remained good as a decreased. These textures included four of the four-element glider textures, as well as the light and dark polarities of the triangle texture. Below, we will refer to the four-check glider textures for which structure is readily apparent as “Group I,” and the three-check glider textures for which structure is readily apparent as “Group II.” That is, Group I consists of the standard, cross, zigzag, and oblong textures, and Group II consists of the bright-triangle and dark-triangle textures. For these textures, rapid discriminability remained above chance even when local correlation was small (a = 0.1). For the other textures, discriminability was imperfect even for a = 1. Discriminability deteriorated further as decorrelation was introduced (a < 1), usually becoming close to chance for a < 0.3. These textures (tee, wye, foot, and el textures) will be referred to as “Group III” textures.

Extraction of psychophysical parameters. In order to compare performance within and

across subjects, we used a simple psychophysical model to parametrize these curves. This psychophysical model postulates that there are $N$ independent detectors available to detect the presence of structure. At the beginning of the stimulus presentation, all detectors are in an “initial”, or undecided, state. During stimulus presentation, detectors independently convert into a “set” state. At the end of the stimulus presentation, if the number of detectors in a “set” state exceeds a criterion $N_c$, then the subject performs correctly. If not, then the subject guesses, with a 50% probability of being correct. We postulate that the rate at which detectors convert is the product of two factors: the local correlation $a$ and a rate constant $r$.
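The detector model described above can be written down in a few lines. The following Python sketch is one possible reading, offered under explicit assumptions rather than as the authors' fitting code: each detector is taken to convert as a Poisson process with rate $a \cdot r$ over the 50 msec presentation, “exceeds a criterion” is read literally as a count strictly greater than $N_c$, and the function and parameter names are hypothetical.

```python
import math


def prob_detect(a, r, n_detectors, n_crit, duration=0.05):
    """Probability that more than n_crit of n_detectors are 'set' at stimulus offset.

    Assumption: each detector converts independently at rate a*r (per sec),
    so it is 'set' after `duration` seconds with probability 1 - exp(-a*r*duration).
    """
    p = 1.0 - math.exp(-a * r * duration)
    upper = min(n_crit, n_detectors)
    # Binomial probability that the count is at or below the criterion.
    at_or_below = sum(
        math.comb(n_detectors, k) * p ** k * (1.0 - p) ** (n_detectors - k)
        for k in range(upper + 1)
    )
    return 1.0 - at_or_below


def prob_correct(a, r, n_detectors, n_crit, duration=0.05):
    """Detection gives a correct response; otherwise the subject guesses at 50%."""
    d = prob_detect(a, r, n_detectors, n_crit, duration)
    return d + 0.5 * (1.0 - d)


# Illustrative parameter values only: 100 detectors at 2 per sec each,
# i.e. a population rate of 200 per sec, with criterion 2.
curve = [(a / 10, prob_correct(a / 10, 2.0, 100, 2)) for a in range(11)]
```

With these definitions the predicted performance starts at chance (0.5) for a = 0 and rises monotonically with the local correlation, which is the qualitative shape of a psychometric function.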

The three parameters of the model (the rate constant $r$, the critical number of detectors $N_c$, and the number of detectors $N$) are allowed to depend on the glider type, $G$. This psychophysical model is analyzed in detail in Victor and Conte (1990). An important result is that the model behavior depends primarily on the critical number of detectors, $N_c$, and their population rate of conversion, $R = Nr$. The absolute number of detectors $N$ has little influence on model behavior. Parameters $N_c$ and $R$ of the model were determined for the psychophysical performance of each of six subjects for each of the 10 texture types. Examples of fits of the model for $N_c = 2$ to experimental data are shown as the smooth

curves in Fig. 3. As seen in the figure, quite reasonable fits were obtained for Group I and Group II textures. The fits for data from Group III textures were not as good, but this is not surprising given the greater internal variability of these data (Fig. 3C). Over all groups of textures and subjects, model and measured performance typically differed by an r.m.s. of 0.03. This is a good fit, considering that the uncertainty in the estimate of psychophysical performance as assessed in 80 trials is on the order of 0.1.

Overall, the choice $N_c = 2$ provided the best fits, as judged by the r.m.s. deviation between predicted and measured performance. For some subjects under some conditions, higher values or lower values of $N_c$ provided a smaller value of the r.m.s. deviation. However, the r.m.s. improvement provided by $N_c = 1$ or $N_c \geq 4$ was typically only 0.01. We do not mean to imply by this discussion that the model itself is correct or that $N_c = 2$ has a particular physiologic significance. Rather, we use the model to provide a single parameter $R = R(G)$ (determined by a

least-squares method with $N_c = 2$) to compare performance within and across subjects.

Values of the rate constant $R(G)$ computed in this fashion are shown in Table 2. Although there are variations of more than a factor of two in the size of the rate constant $R$ across subjects, there are a number of important consistencies. Rate constants for Group I and Group II textures were greater than those for Group III textures, usually by a factor of four or more. Generally, rate constants for Group II textures exceeded those for Group I textures, but there was overlap for three subjects and the average rate constant for Group II textures was only approx. 35% higher than for Group I textures.

Within groups, rank orders of rate constants were similar across subjects. In Group I, highest rate constants were found for the “standard” or “zigzag” textures in five of six subjects, and lowest rate constants for the “cross” or “oblong” textures in all six subjects. In Group III, although performance was poor, all subjects showed the same rank-order for rate constants: tee > foot > el > wye. In two of the subjects, the best-fitting value of the rate constant $R$ was zero for the “wye” texture. This corresponds to psychophysical performance at chance levels even for the fully-correlated texture (a = 1). However, there was no consistent difference in rate constants in Group II textures: rate constants were faster for “bright triangles” in two subjects, faster for “dark triangles” in three subjects, and nearly identical in one subject. Across subjects, the geometric mean of the rate constants for Group II textures differed by only 7%.

Fig. 3. Frequency of correct identification of structured isodipole textures in a forced-choice psychophysical task, as a function of local correlation a. The smooth curves represent the model fit (Victor & Conte, 1990) with $N_c = 2$. Part A: Group I textures. Part B: Group II textures. Part C: Group III textures. Subjects: MC, RB.

Evoked potentials

We now turn to analysis of VEPs elicited by interchange of the various structured textures with random textures. First we illustrate general features of this VEP and its dependence on stimulus type and local correlation a. Then, we focus separately on its temporal and spatial aspects.

General description. Interchange between structured isodipole textures and random textures elicited similar waveforms at all derivations examined (Cz-Oz, Pz-Oz, C3-Oz, C4-Oz). Generally, responses at Pz-Oz were approximately half the amplitude of the response at Cz-Oz, and there were no consistent differences between the midline derivation Cz-Oz and lateral derivations C3-Oz and C4-Oz. Responses for one subject at the midline derivation Cz-Oz are shown in Fig. 4. Responses contained a prominent occiput-positive (downward) deflection approx. 100 msec after each transition. For some texture stimuli, such as the standard texture, there is a prolonged occiput-negative (upward) deflection following the transition to the structured texture, which is absent following the transition to the random texture. This feature is seen in responses to textures of Groups I and II but not for responses to the textures of Group III.

Table 2. Rate constants $R(G)$ (sec⁻¹) which provide best fits to observed psychophysical data with $N_c = 2$. The parameter $R(G)$ represents the population rate at which hypothetical “detectors” respond to the isodipole texture $G$, and is equal to the rate $r$ at which an individual detector responds, times the number of detectors $N$. For further details, see text. In calculating geometric means, values of zero were excluded

                                     Subject
Texture type        JV    MC    SU    GG    CI    RB    Geometric mean
Group I
  Standard         409   142   212   204   161   228    212
  Cross            213   148   118   123   110   232    151
  Zigzag           234   153   205   244   139   216    201
  Oblong           240   108   169   109    96   215    147
  Group mean       264   137   172   161   124   223    175
Group II
  Bright triangle  251   223   246   243   193   326    244
  Dark triangle    298   195   353   158   192   227    228
  Group mean       273   206   295   196   193   272    236
Group III
  Tee               55    71    95    84    86   109     81
  Wye               --    --    --    --    --    --     16
  Foot              52    53    62    52    44    --     56
  El                38    27    61    26    33    25     33
  Group mean        32    34    71    38    50    48     39

Fig. 4. VEPs elicited by interchange between sporadically-decorrelated isodipole textures and the random texture, as a function of local correlation a. Potentials which are positive at Oz relative to Cz are plotted below the isopotential line. Responses to interchanges of examples of the random texture are illustrated as the “a = 0” response to the standard texture in Group I. In each record, the two traces represent the average of separate 1-min runs. Subject: JV.

We analyze VEP responses to isodipole interchange by decomposing them into two components: a “symmetric” component and an “antisymmetric” component (Victor, 1985; Victor & Zemon, 1985). The symmetric component is the sum of the even harmonics of the VEP waveform. Since the symmetric component is effectively the average of the response to a structured texture and the random texture, it contains responses of neural elements sensitive to local luminance and contrast changes. It is thus analogous to the traditional P-100 (Regan, 1989). As seen in Fig. 5A, the symmetric component is generally similar in voltage and waveform for all of the textures studied. The symmetric component may contain pattern-dependent responses as well; these will be separated below.

The antisymmetric component is the sum of the odd harmonics of the VEP waveform. It is effectively the difference between the response to the onset of the structured texture and the response to onset of the random texture. Since structured and random textures are equated for luminance, contrast, and global spatial frequency content, the antisymmetric component contains responses of neural elements sensitive to more complex aspects of local pattern (Victor, 1985). As seen in Fig. 5B, antisymmetric components are large for the textures of Groups I and II, but small to undetectable for the textures of Group III. Within each group, there is variation in the size of the antisymmetric component driven by different structured textures. To a first approximation, these correspond to relative psychophysical distinguishability. For the subject illustrated in Fig. 5B, the largest Group I response was to the standard texture; this texture also yielded the largest rate constant (Table 2).
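The split into even and odd harmonics has a simple time-domain form: for a waveform sampled over one stimulus cycle, the sum of the even harmonics is the average of the waveform and a copy shifted by half a cycle, and the sum of the odd harmonics is half their difference. The Python sketch below illustrates this identity; it assumes an even number of uniformly spaced samples covering exactly one cycle, and the function name is hypothetical.

```python
import math


def decompose(v):
    """Split one cycle of a periodic waveform into symmetric and antisymmetric parts.

    sym[k]  = (v[k] + v[k + half]) / 2  -> even harmonics (including the mean)
    anti[k] = (v[k] - v[k + half]) / 2  -> odd harmonics
    Indices wrap around because the waveform is periodic.
    """
    n = len(v)
    if n % 2 != 0:
        raise ValueError("need an even number of samples per stimulus cycle")
    half = n // 2
    sym = [(v[k] + v[(k + half) % n]) / 2.0 for k in range(n)]
    anti = [(v[k] - v[(k + half) % n]) / 2.0 for k in range(n)]
    return sym, anti


# Synthetic check: a fundamental (odd harmonic) plus a second harmonic (even).
n_samples = 64
wave = [
    math.cos(2 * math.pi * k / n_samples) + 0.5 * math.cos(4 * math.pi * k / n_samples)
    for k in range(n_samples)
]
symmetric, antisymmetric = decompose(wave)
```

In this synthetic case the symmetric part recovers the second-harmonic term and the antisymmetric part recovers the fundamental, and the two parts always sum back to the original waveform.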
Similarly, just-detectable antisymmetric components were present for all Group III textures except the “wye” texture; the “wye” texture also was the most difficult one to detect (smallest value of R) psychophysically. In addition to overall amplitude differences between antisymmetric components elicited by the textures, there is a suggestion that the time-course of the response differs as well. For example, the antisymmetric component


driven by the standard texture is not simply a sinusoid; it has a “nose” on its ascending limb. This nose is not as prominent in the antisymmetric components driven by the other Group I textures, or by the triangle textures.

To look for additional evidence that the pattern-dependent VEP response varies in waveform as well as amplitude, we subjected the symmetric component to a further analysis. If the symmetric component depended merely on local luminance and contrast changes, then an identical symmetric component would be elicited by interchange of any of the isodipole textures, including interchange among examples of the random texture. Thus, any pattern-dependence within the symmetric component elicited by structured texture/random texture interchange may be isolated by subtracting the symmetric component elicited by random texture/random texture interchange. The results of this calculation, which we will call the “pattern-dependent symmetric subcomponent,” are shown in Fig. 5C. For Groups I and III textures, there is little detectable pattern-dependence revealed by this technique. However, data from both Group II textures show a clear dependence on pattern, as manifest by a symmetric subcomponent of approx. 3 µV. The presence of a pattern-dependent symmetric subcomponent for Group II textures, but not for Group I textures, is evidence that the pattern-dependence of VEP responses is not simply one of amplitude changes. Note that there would be nothing to gain by applying this subtraction technique to the antisymmetric component, since the antisymmetric response to interchange of random textures is essentially zero (Fig. 5B).

VEP responses: overall pattern-dependent amplitudes. The above analysis reveals that the pattern-dependent VEP responses to a range of textures differ not just in overall amplitude, but also in dynamics. However, before we pursue a more detailed analysis of the dynamics, we first summarize the overall size of the pattern-dependent VEP response by the amplitude of the first harmonic, the main contributor to the antisymmetric component. Previously (Victor & Conte, 1989), we showed that the first harmonic amplitude was approximately linearly related to the local correlation a and that phase was relatively independent of a. This relationship held for responses to the more general isodipole textures examined here, although the proportionality between local

Fig. 5. Symmetric and antisymmetric components of the VEP elicited by interchange between structured isodipole textures and the random texture. Part A: symmetric component. Part B: antisymmetric component. Part C: pattern-dependent symmetric subcomponent. Waveforms are derived from the data of Fig. 4. Responses to interchanges of examples of the random texture are illustrated as the “a = 0” response to the standard texture in Group I. Subject: JV.


correlation and response amplitude depended strongly on the texture type. In Table 3, we have summarized the dependence of the antisymmetric VEP component on local correlation a and texture type G by the slope of the regression of first-harmonic amplitude on a, which we will denote K(G). In general, the assumption of a strict proportionality between a and response amplitude, coupled with the hypothesis of constant response phase, provided fits to the observed first-harmonic amplitudes with a mean-squared error of 0.3 µV (range 0.1-0.9 µV across all 10 textures and six subjects). For comparison, the run-to-run reproducibility of the antisymmetric response itself was typically 0.5 µV. As seen in Table 3, the values of K(G) found for Groups I and II textures averaged about three times those found for Group III textures. This is in accord with the psychophysical findings as summarized in Table 2, although the difference is somewhat more


marked in the psychophysical responses. There is no consistent difference between Groups I and II textures, which corresponds to the substantial overlap of performance psychophysically. Rank-order of values of K(G) within groups corresponds to the psychophysical data of Table 2 in a general fashion. Within Group I, largest values of K(G) are found typically for the standard texture; smallest values are found typically for the cross texture. Within Group III, largest values of K(G) are found typically for the tee and foot textures, and smallest values are found typically for the wye texture. However, there are also some consistent differences between the VEP and psychophysical results. For example, the “oblong” texture elicited the second-largest antisymmetric VEP response among Group I textures, but it was the hardest to detect psychophysically [smallest R(G)]. Other than the superiority of

Table 3. The dependence of the antisymmetric VEP component on local correlation a and texture type G. Response amplitudes are summarized by the best-fitting proportionality K(G) between the size of the first harmonic and the local correlation a, with units of µV/unit correlation. Response phases are in units of π rad. The geometric mean is used for K(G) and the arithmetic mean is used for phase.

                                          Subject
Texture type               JV     MC     SU     GG     CI     RB     Mean
Group I
  Standard         K(G)    6.15   4.90   4.39   3.05   5.23   3.91   4.50
                   phase   1.16   1.13   1.10   1.04   1.06   1.10   1.10
  Cross            K(G)    3.37   2.26   2.17   1.52   2.76   1.88   2.25
                   phase   1.08   0.85   1.02   0.97   0.98   1.04   0.99
  Zigzag           K(G)    3.96   2.12   2.28   2.05   2.71   2.37   2.61
                   phase   1.18   0.92   1.08   0.98   1.07   1.10   1.05
  Oblong           K(G)    4.92   2.64   3.07   1.94   2.80   2.25   2.80
                   phase   1.11   0.95   1.08   0.98   1.06   1.08   1.04
  Group mean       K(G)    4.48   2.99   2.86   2.07   3.23   2.50   2.93
                   phase   1.13   0.96   1.07   0.99   1.04   1.08   1.05
Group II
  Bright triangle  K(G)    4.35   2.86   3.34   1.81   2.64   1.72   2.64
                   phase   1.04   0.90   0.99   0.85   0.93   1.00   0.95
  Dark triangle    K(G)    3.99   3.06   3.72   2.47   3.80   2.53   3.20
                   phase   1.14   0.79   0.95   0.82   0.93   1.01   0.94
  Group mean       K(G)    4.17   2.96   3.52   2.11   3.17   2.09   2.91
                   phase   1.09   0.80   0.97   0.84   0.93   1.01   0.94
Group III
  Tee              K(G)    1.21   0.96   1.54   1.51   0.84   1.10   1.16
                   phase   0.87   0.57   0.84   0.74   0.84   0.89   0.79
  Wye              K(G)    0.51   0.58   1.38   0.71   0.48   0.53   0.64
                   phase   0.83   0.92   0.80   0.88   0.47   0.88   0.80
  Foot             K(G)    1.19   1.21   1.03   0.74   0.89   0.41   0.86
                   phase   1.11   0.58   0.82   0.74   1.00   0.93   0.86
  El               K(G)    1.08   0.71   1.04   0.85   0.60   0.25   0.68
                   phase   0.80   0.40   0.79   0.74   0.94   1.17   0.81
  Group mean       K(G)    0.94   0.83   1.22   0.91   0.68   0.49   0.81
                   phase   0.90   0.62   0.82   0.78   0.82   0.97   0.82


the standard texture to the other three by both VEP and psychophysical criteria, there was no consistent correlation between the rank-order of VEP and psychophysical responses to Group I textures.

The VEP data provide information on antisymmetric response dynamics (phase) in addition to overall size [K(G)]. Examination of phase (Table 3) reveals a consistent difference between responses to the three groups of textures. Average response phase was 1.05 π radians for Group I textures, 0.94 π radians for Group II textures, and 0.82 π radians for Group III textures. Although there was some variation between subjects, there was little or no overlap of phases between groups in any given subject. These differences of 0.11-0.23 π radians within subjects but between groups are large in comparison with the reproducibility of response phases measured on separate occasions (e.g. this experiment and the “propagated decorrelation” experiment, or the data of Victor & Conte, 1989), which was typically 0.04 π rad. The effective latency difference for a 2.12 Hz response corresponds to a 25 msec lag for Group II textures compared to Group I textures, and a 54 msec lag for Group III textures compared to Group I textures. Differences in dynamics of responses to the various textures will be further investigated in the next section.
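The conversion from phase difference to effective latency quoted above can be checked with a short calculation. This is a minimal Python sketch; the helper name `phase_lag_to_msec` is hypothetical, and the only inputs are the group-average phases and the 2.12 Hz response frequency given in the text.

```python
def phase_lag_to_msec(phase_diff_pi_rad, freq_hz):
    """Convert a phase difference, given in units of pi radians, into the
    equivalent time shift (msec) of a sinusoid at freq_hz."""
    cycles = phase_diff_pi_rad / 2.0      # 2 pi radians make one full cycle
    period_msec = 1000.0 / freq_hz
    return cycles * period_msec

FREQ_HZ = 2.12  # response frequency quoted in the text

# Group-average phases (units of pi radians) taken from the text.
lag_group2 = phase_lag_to_msec(1.05 - 0.94, FREQ_HZ)  # Group II vs Group I
lag_group3 = phase_lag_to_msec(1.05 - 0.82, FREQ_HZ)  # Group III vs Group I
print(round(lag_group2, 1), round(lag_group3, 1))
```

With the rounded group means this gives roughly 26 and 54 msec, in line with the 25 and 54 msec lags stated above.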

VEP responses: principal components analysis. Inspection of the antisymmetric component and the pattern-dependent symmetric subcomponent of the VEP suggested that responses to distinct isodipole textures differed in dynamics, as well as in overall size. In order to determine whether the dependence of VEP dynamics on pattern type was systematic, we pursued a principal components analysis (Cattel, 1978) of the pattern-dependent VEP waveforms. The pattern-dependent VEP response to the texture $T_G(1, 1)$ is the sum of the antisymmetric component (e.g. Fig. 5B) and the pattern-dependent symmetric subcomponent (e.g. Fig. 5C). To begin the principal components analysis, we represent this response R(G) by a vector of its Fourier components. In order to improve signal-to-noise, we consider only the first eight Fourier components (Gutowitz, Zemon, Victor & Knight, 1986). Thus:

$$R(G) = [R_1(G), R_2(G), \ldots, R_8(G)] \tag{11}$$

where $R_n(G)$ is a complex number which represents the amplitude and phase of the nth Fourier component of the pattern-dependent response R(G). We now attempt to represent the responses R(G) as linear combinations of a small number H of “principal components” $P_h$:

$$R(G) = \sum_{h=1}^{H} w_h(G)\, P_h \tag{12}$$

The weights $w_h$ are normalized by:

$$\sum_{G} |w_h(G)|^2 = 1. \tag{13}$$

If the representation (12) succeeds for H = 1, then all pattern-dependent responses R(G) can be regarded as scalar multiples of a common response waveform $P_1$. However, if there is a systematic dependence of dynamics on pattern type, then at least two principal components $P_1$ and $P_2$ will be required for the representation (12). The weights $w_1(G)$ and $w_2(G)$ will then indicate the contribution of each principal component $P_1$ and $P_2$ to the pattern-dependent response R(G). In general, we found that two principal components (H = 2) were required for a good fit to the data. A test of the significance of the second principal component is its consistency across subjects, which we will examine below.

Unfortunately, even with H fixed at the value 2, the representation (12) is not unique (Cattel, 1978). The representation (12) determines the H-dimensional subspace which most nearly contains (in a least-square sense) the responses R(G). This best-fitting subspace is unique. However, the choice of basis vectors $w_h(G)$ within that subspace is not uniquely determined. A change of basis vectors $w_h(G)$ induces a linear transformation among the principal components $P_h$. Thus, the particular values $w_h(G)$ and the particular waveforms $P_h$ cannot be regarded as having a direct interpretation. For this reason, we do not further analyze the individual values $w_h(G)$, but instead we analyze algebraic “invariants” constructed from them (Gutowitz et al., 1986). These invariants I(G, G') are functions of two texture types (G and G'), and are defined by:

$$I(G, G') = w_1(G)\, w_2(G') - w_2(G)\, w_1(G'). \tag{14}$$

The determinant-like nature of I(G, G') guarantees that its value is independent of a change in basis vectors, up to a common scale factor (Gutowitz et al., 1986). The invariant has an intuitive interpretation: its magnitude expresses

Fig. 6. Contour maps of the invariants I(G, G'). The height of the surface corresponding to a pair of textures G and G' indicates the extent to which they represent distinct linear combinations of the first two principal components. Each contour line represents 0.1 unit, and tickmarks point downhill. Part A: invariants calculated from data of Fig. 5 (subject JV). Part B: average of invariants across all subjects. Axes index pattern type: 1 = standard, 2 = cross, 3 = zigzag, 4 = oblong (Group I); 5 = bright triangle, 6 = dark triangle (Group II); 7 = tee, 8 = wye, 9 = foot, 10 = el (Group III).


the extent to which the responses to textures G and G' fail to be scalar multiples of each other.

In Fig. 6A, we show a contour map of the invariant I(G, G') derived from the pattern-dependent VEP components of Fig. 5. The diagonal symmetry of the plot is due to the identity I(G, G') = -I(G', G). The largest absolute value of I(G, G') occurs when G is the bright triangle texture and G' is the zigzag texture. I(G, G') is essentially zero when either one of the two textures is from Group III. This is simply a consequence of the fact that these textures produce only very small pattern-dependent responses, and consequently their weights $w_h$ are close to zero.

A similar set of invariants was constructed for the other five subjects. To avoid biasing the average in favor of those subjects whose overall VEP amplitudes were large, data from individual subjects were weighted by the reciprocal of the overall pattern-dependent amplitude. Figure 6B shows the cross-subject average of the invariants. The major features seen in the individual subject (Fig. 6A) again appear. The largest values of the invariant are seen when one of the textures is a Group I texture and the other is a Group II texture, and very small values are found when either of the textures is a Group III texture. Since the invariant I(G, G') indicates the difference between the dynamics of responses to textures G and G', we conclude that there are significant differences in dynamics between Groups I and II textures. As seen from the height of the invariant plots, these differences are two to three times as large as differences within textures of Group I, and four or more times as large as differences within textures of Group II.

We now turn to an examination of the consistency of the first two principal components across subjects. For any pair of subjects $s_1$ and $s_2$, we begin with the representations in terms of principal components:

$$R_{s_1}(G) = \sum_{h=1}^{H} w_{h,s_1}(G)\, P_{h,s_1} \tag{15}$$

with the analogous representation for subject $s_2$.

We form the dot-product of weights from each pair of principal components, $D_{h_1,h_2}(s_1, s_2)$:

$$D_{h_1,h_2}(s_1, s_2) = \sum_{G} w_{h_1,s_1}(G)\, w_{h_2,s_2}(G). \tag{16}$$

For any pair of subjects, the matrix of dot-products $D_{h_1,h_2}$ represents the extent to which the weights $w_{h_1,s_1}(G)$ for principal component $h_1$ from subject $s_1$ correspond to the weights $w_{h_2,s_2}(G)$ for principal component $h_2$ from subject $s_2$. If the weights correspond exactly, then $D_{h_1,h_2} = 1$. If there is no correlation among the weights, then $D_{h_1,h_2} = 0$. As pointed out above, the particular values of the weights $w_{h,s}$ depend on the (arbitrary) choice of linear combinations for the principal components $P_h$. Thus, for two or more principal components, values of the individual weights (and hence of $D_{h_1,h_2}$) cannot be interpreted directly. However, we again circumvent this problem by the construction of an invariant which is independent of linear transformations among the two principal components. This invariant is the determinant of the matrix of dot-products $D_{h_1,h_2}(s_1, s_2)$. It will be denoted $X^{[2]}(s_1, s_2)$, and is given by:

$$X^{[2]}(s_1, s_2) = D_{1,1}(s_1, s_2)\, D_{2,2}(s_1, s_2) - D_{1,2}(s_1, s_2)\, D_{2,1}(s_1, s_2). \tag{17}$$
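The two determinant-like invariants, I(G, G') of equation (14) and $X^{[2]}(s_1, s_2)$ of equation (17), can be illustrated with a small numerical sketch. The Python code below assumes real-valued weights and uses made-up orthonormal weight vectors over four texture types; the function names and the example numbers are hypothetical and are not data from this study. A rotation of the basis within each subject's plane is used as one example of an allowed change of basis.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pair_invariant(w1, w2, g, g2):
    """Equation (14): I(G, G') = w1(G) w2(G') - w2(G) w1(G')."""
    return w1[g] * w2[g2] - w2[g] * w1[g2]

def cross_invariant(wa, wb):
    """Equation (17): determinant of the 2 x 2 matrix of dot-products between
    the two weight vectors of subject a and the two of subject b."""
    d11, d12 = dot(wa[0], wb[0]), dot(wa[0], wb[1])
    d21, d22 = dot(wa[1], wb[0]), dot(wa[1], wb[1])
    return d11 * d22 - d12 * d21

def rotate(w, theta):
    """Re-express a pair of weight vectors in a rotated basis of the same plane."""
    c, s = math.cos(theta), math.sin(theta)
    w1, w2 = w
    return ([c * a + s * b for a, b in zip(w1, w2)],
            [-s * a + c * b for a, b in zip(w1, w2)])

# Hypothetical orthonormal weights over four texture types for two subjects.
subj_a = ([0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5])
subj_b = ([0.5, 0.5, 0.5, 0.5], [0.7, -0.1, 0.1, -0.7])

x2 = cross_invariant(subj_a, subj_b)
x2_rot = cross_invariant(rotate(subj_a, 0.7), rotate(subj_b, -1.3))
i01 = pair_invariant(*subj_a, 0, 1)
i01_rot = pair_invariant(*rotate(subj_a, 0.7), 0, 1)
```

In this example the two planes share their first direction and differ by an angle whose cosine is 0.8 in the second, so `x2` is 0.8; both `x2` and `i01` are unchanged when the bases are rotated, which is the property that makes them interpretable where the individual weights are not.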

Analogous cross-invariants $X^{[n]}(s_1, s_2)$ may be constructed from any number n of principal components extracted from two subjects, as the determinant of the matrix of dot-products $D_{h_1,h_2}(s_1, s_2)$. For a single principal component, the corresponding invariant $X^{[1]}(s_1, s_2)$ is simply the dot-product $D_{1,1}(s_1, s_2)$ itself.

This procedure has a geometric interpretation. A set of weights $w_h(G)$ may be regarded as a unit vector in a response space, which expresses how heavily principal component h contributes to responses to each of the textures. Let us first consider the case in which there is only one significant principal component [H = 1 in equation (12)]. If the weights corresponding to the first principal components from two subjects $s_1$ and $s_2$ are similar, then the vector $w_{1,s_1}(G)$ and the vector $w_{1,s_2}(G)$ will point in similar directions. The quantity $X^{[1]}(s_1, s_2) = D_{1,1}(s_1, s_2)$ is the cosine of the angle between the weight vector $w_{1,s_1}(G)$ and the weight vector $w_{1,s_2}(G)$. It thus expresses the similarity between the first principal components in subjects $s_1$ and $s_2$. When there is more than one significant principal component, the situation is more complex. The representation (12) identifies an H-dimensional subspace which contains the responses R(G). The invariant (17) is the cosine of the dihedral angle between the response

subspaces for subjects $s_1$ and $s_2$. Thus, it is equal to 1 when the subspaces coincide, and equal to zero when they are perpendicular. We note that although the subjects' response subspaces are uniquely determined by the representation (12), the particular weights are not uniquely determined. However, the value of the invariant (17) is independent of the choice of weights, since it is an intrinsic property of the subspaces themselves.

Values of the invariants $X^{[1]}(s_1, s_2)$ and $X^{[2]}(s_1, s_2)$ for all pairs of subjects are displayed in Table 4. The invariants $X^{[1]}$ range from 0.922 to 0.994. Values close to 1 indicate close correspondence across subjects; values close to 0 indicate a lack of correspondence. The statistical significance of the calculated values for $X^{[1]}$ can be determined from the distribution of values that would be obtained for random weight vectors $w_{1,s_1}$ and $w_{1,s_2}$. By this measure, all 15 values are significant at P = 0.01 (critical value for $X^{[1]}$ is 0.701).

Values of the invariants $X^{[2]}$ range from 0.429 to 0.881. We consider two measures of statistical significance. The naive measure is derived from the distribution of values of $X^{[2]}$ that would be obtained from random weight vectors $w_{1,s_1}$, $w_{2,s_1}$, $w_{1,s_2}$, and $w_{2,s_2}$ (subject to the condition that weights for each subject are orthonormal). By this measure, all 15 values of $X^{[2]}$ are significant at P = 0.01 (critical value for $X^{[2]}$ is 0.420). This indicates that the response planes for any pair of subjects are more aligned than would be expected by chance alone. However, it does not really indicate that the second principal component is significant across subjects, since the nonrandom alignment of response planes may simply be due to the alignment of the first principal component. To address the significance of the second component per se, we calculated another distribution of values of $X^{[2]}$. This distribution assumed that the first pair of weights $w_{1,s_1}$ and $w_{1,s_2}$ are equal, and the second pair $w_{2,s_1}$, $w_{2,s_2}$ are random. It thus allows for testing of the incremental significance of the second principal component. By this measure, eight of the 15 values of $X^{[2]}$ are significant at P = 0.01 (critical value for $X^{[2]}$ is 0.730), and 12 of the 15 values of $X^{[2]}$ are significant at P = 0.05 (critical value for $X^{[2]}$ is 0.616). This more stringent criterion is the one used for indicating significance in Table 4, and will be the criterion used subsequently.

Table 4. Invariants $X^{[1]}(s_1, s_2)$ and $X^{[2]}(s_1, s_2)$ for all pairs of subjects ($s_1$, $s_2$). Values of $X^{[n]}$ close to 1 indicate close correspondence between the weights of the n-term principal components representations (15) for the two subjects. Significant correspondences at P = 0.05 are indicated by *, and at P = 0.01 are indicated by **.

                                   Subject s1
Subject s2               JV        MC        SU        GG        CI
MC      X[1]          0.946**
        X[2]          0.835**
SU      X[1]          0.987**   0.954**
        X[2]          0.462     0.640*
GG      X[1]          0.934**   0.922**   0.981**
        X[2]          0.429     0.716*    0.794**
CI      X[1]          0.970**   0.984**   0.941**   0.943**
        X[2]          0.624*    0.881**   0.841**   0.769**
RB      X[1]          0.974**   0.988**   0.957**   0.953**   0.994**
        X[2]          0.569     0.653*    0.830**   0.752**   0.837**

In an analogous fashion, cross-subject consistency between the first three principal components would be manifest by a non-random distribution of the third invariant $X^{[3]}$, which is defined as a 3 × 3 determinant similar to equation (17). To assess incremental significance of the third principal component, we calculated a distribution of values of $X^{[3]}$ assuming that the first two pairs of weights $w_{h,s_1}$ and $w_{h,s_2}$ (h = 1 and 2) determine the same plane, and the third pair $w_{3,s_1}$, $w_{3,s_2}$ are random. This analysis revealed no evidence of significance of the third invariant $X^{[3]}$: one value out of 15 was significant at P = 0.05, with a critical value for $X^{[3]}$ of 0.651.

The analysis described above was also performed for the antisymmetric component alone (e.g. Fig. 5B), and the pattern-dependent symmetric subcomponent alone (e.g. Fig. 5C). For the antisymmetric component, cross-subject invariants $X^{[1]}$ ranged from 0.920 to 0.990, and all were significant at P = 0.01. The cross-subject invariants $X^{[2]}$ ranged from 0.271 to 0.866. By the more stringent test of significance described above, 6 of the 15 values were


significant at P = 0.01 and 10 were significant at P = 0.05. For the symmetric subcomponent, cross-subject invariants $X^{[1]}$ ranged from 0.028 to 0.944. Eight of the 15 values were significant at P = 0.01 and twelve were significant at P = 0.05. The cross-subject invariants $X^{[2]}$ ranged from 0.022 to 0.688. By the more stringent test of significance, none of the 15 values were significant at P = 0.01 and only two were significant at P = 0.05.

Thus, we conclude that for these textures, the pattern-sensitive VEPs are described by a two-parameter family of responses (12). Across subjects, the near-unity values of the invariants $X^{[1]}$ and $X^{[2]}$ (Table 4) indicate a similarity in weights $w_1$ and $w_2$ with which the two principal components combine to produce a response. Both principal components are manifest in the antisymmetric component of the response, but only one principal component is present in the symmetric subcomponent. In a qualitative sense, the requirement for two principal components, rather than a single component, is the difference between responses to Groups I and II textures. This is seen from the peak of the invariant I(G, G') when one texture is from Group I and the other is from Group II (Fig. 6).

VEP responses: propagated and sporadic decorrelation. The results described so far were obtained from fully-correlated textures $T_G(1, 1)$, or from textures with sporadic decorrelation $T_G(1, a)$. We now compare these responses to VEPs elicited by textures $T_G(a, 1)$ with propagated decorrelation. The purpose of this comparison is to estimate the spatial scale of the interactions that generate the pattern-sensitive response. The estimate is based on the relationship between the correlation properties of the textures $T_G(1, a)$ and $T_G(a, 1)$. Both textures have the same degree of local correlation a. Textures $T_G(1, a)$ with sporadic decorrelation only ($c_{prop} = 1$, $c_{spor} = a < 1$) have preserved long-range correlation structure. Textures $T_G(a, 1)$ with propagated decorrelation only ($c_{prop} = a < 1$, $c_{spor} = 1$) have no long-range correlation [...] range, then responses to a texture $T_G(1, a)$ with preserved long-range correlation should be substantially larger than the responses to a texture $T_G(a, 1)$ whose long-range correlations diminish rapidly with distance. As described in detail in Victor and Conte (1989), a quantitative comparison of responses obtained with sporadic and propagated decorrelation leads to an estimate of the distribution and overall spatial scale D of nonlinear interactions which generate the VEP.

For three subjects (JV, MC, RB) and three textures (standard, oblong, and bright triangles), we measured antisymmetric VEP components at five levels of sporadic and propagated decorrelation. In all cases, as local correlation a decreased to zero, responses decreased more rapidly for textures $T_G(a, 1)$ with propagated decorrelation than for textures $T_G(1, a)$ with sporadic decorrelation. We applied the method of Victor and Conte (1989) to interpret this finding in terms of the form and spatial scale of long-range interactions. For all textures, a one-dimensional exponential weighting of spatial interactions provided a better fit than did two-dimensional exponential weightings, or one- and two-dimensional Gaussian weightings, or uniform weightings. The spatial scale D of the best-fitting weightings is shown in Table 5. Although a threefold range of spatial scales D is found across subjects and across texture types, there are no striking trends. The variation of estimated scale D across subjects is of a similar magnitude as the variation across texture types within a subject. This variability does not represent experimental uncertainty: estimates of spatial scale recorded on separate occasions over one year apart agreed within 8% (values in parentheses in Table 5).

DISCUSSION

We begin by summarizing the present experimental results in the context of our previous studies. On qualitative grounds, this will permit us to exclude a variety of models for our findings, and will suggest an architecture for models which might succeed. We then present one simple example of such a model, and discuss its salient features.

Table 5. The spatial scale D (min) of interactions for three isodipole textures. D is calculated from the comparison of responses to textures $T_G(1, a)$ with sporadic decorrelation with responses to textures $T_G(a, 1)$ with propagated decorrelation, with 4 min checks. Values in parentheses were obtained in a previous study (Victor & Conte, 1989), and are included to illustrate reproducibility.

Texture type
JV
Standard
(E)
Oblong Bright triangle
11.5 14.5
MC
RB
13.4
13.9 12.9 7.9

In previous studies (Victor, 1985; Victor & Zemon, 1985), we showed that alternation between two isodipole textures, the even texture [here, $T_{\mathrm{standard}}(1, 1)$] and the odd texture [here, $T_{\mathrm{standard}}(-1, 1)$], generated a robust VEP. The statistics of the isodipole textures implied that this VEP and the salient perceptual differences between these textures were generated by a highly nonlinear neural mechanism. Such a mechanism must have a formal order of at least four, and must include nonlinear interactions between four distinct points in space. In principle, summation of neural signals followed by a nonlinearity of high order (such as rectification) possesses the requisite complexity. However, quantitative analysis excluded this relatively simple mechanism (Victor, 1985), and, by implication, mechanisms composed of linear superpositions of such subunits.

In later studies (Victor & Conte, 1989), we explored VEP and psychophysical responses to the standard texture constructed with a range of check sizes and degrees of local and global correlation structure. This allowed us to test additional models within the linear-nonlinear framework, in which the linear stage was more elaborate than simple spatial pooling. Models with these more sophisticated linear stages (such as an elongated bar, edge-detector, or center-surround profile) could not account for several qualitative features of the responses to the standard even [$T_{\mathrm{standard}}(1, 1)$] and odd [$T_{\mathrm{standard}}(-1, 1)$] textures. However, this analysis did suggest a possible framework for a successful model: an array of sufficiently complex nonlinear subunits over a wide area, whose outputs were combined in a manner which was exponentially decreasing with distance. At any position in visual space, the spatial scales of these subunits appeared to vary over a 4-8-fold range.

Summary of current results

In this study, we examined VEP and psychophysical responses to a variety of texture classes, which shared the isodipole property but differed in the choice of recursion relation, or “glider” (Table 1). This allowed us to explore which local features of the stimulus were responsible for the perception of pattern and the production of a pattern-dependent VEP response. The main qualitative finding is that textures based on some gliders (Groups I and II) led to rapid psychophysical detection of pattern (Fig. 3, Table 2) and a large pattern-dependent VEP (Figs 4 and 5, Table 3). Textures based on other gliders (Group III) were more difficult to detect psychophysically and produced much smaller pattern-dependent VEPs.

We divided the textures that were readily distinguishable into two groups based on the size of the glider (Group I: 4 checks; Group II: 3 checks). Although psychophysical detection and VEP amplitudes were similar in both groups, two lines of evidence pointed to physiological differences between responses to these groups. There was a consistent difference in VEP phase, with Group I responses leading Group II responses by about a twentieth of a cycle (Table 3), corresponding to a latency shift of 25 msec. Principal components analysis showed that VEP responses could be adequately reconstructed as a sum of two waveforms. The requirement for two waveforms was primarily due to differences between Group I responses and Group II responses (Table 4).

Other components of the analysis failed to reveal as clear a pattern of consistency across subjects. These included the rank-order of psychophysical and VEP responses within Group I and Group II (Tables 2 and 3), and the spatial scale of interactions as determined from VEP responses to textures with propagated decorrelation (Table 5).

Relationship of VEP and psychophysical studies

We have pursued the psychophysical studies in parallel with the VEP studies in order to be sure that the neural interactions measured in the VEPs are relevant to pattern vision. However, psychophysical studies do not provide as detailed and direct a measure of response dynamics as do the VEP studies (e.g. Tables 3 and 4). Additionally, the VEPs provide a bridge to studies that can only be carried out in the experimental animal. In preliminary studies (Purpura & Victor, unpublished results), isodipole VEPs which share the main features of the responses presented here have been recorded epicortically over V1 of the macaque. Thus, in modeling the neural circuitry underlying texture responses, we should focus on processes likely to be present in striate cortex.


Implications for cortical processing

We now turn to consideration of models for early visual processing that account for these findings. The major result that we seek to account for is that textures based on some gliders lead to immediate perceptual salience and a large VEP, and others do not. Although a detailed analysis reveals subtle differences between Groups I and II responses, the VEP responses and psychometric curves are similar (Figs 3-5). For this reason, we feel justified in attempting to account for the perceptual salience of textures in both classes with a single model, although we will later discuss possible mechanisms underlying the more subtle differences between Groups I and II responses. As discussed above, a successful model to account for discrimination of these isodipole textures must be highly nonlinear. In Victor and Conte (1989), we suggested a framework consisting of an array of sufficiently complex local nonlinearities, whose outputs are combined by a second nonlinear stage. As a preliminary step, we will consider potentially plausible alternative kinds of models whose structure is unrelated to this framework. These models may be excluded on the basis of qualitative features of failure to separate the readily-distinguishable textures (Groups I and II) from those that were not readily-distinguishable (Group III). Then, we turn to a simple, physiologically-plausible example of the two-nonlinearity model, and discuss how our data permit parameters of the model to be refined. A role for information content? Perception of a visual pattern implies perception of redundancy; the existence of a pattern implies that one portion of the visual texture places constraints on a second portion. Thus, textures in which structure is evident may be thought of as having a lower information content (i.e. greater redundancy) as perceived by the visual system. 
To what extent is the visually perceived information content simply related to the usual formal notion (Shannon, 1948) of information content, as applied to the isodipole textures? The Shannon information content of a region R of a visual texture T is given by a sum over all possible allowed configurations $\lambda$ of the texture:

$$I[R, T] = -\sum_{\lambda} p(\lambda) \log p(\lambda) \qquad (18)$$

where "log" is the logarithm to the base 2. An example will clarify the relationship of low information content to pattern. For the random texture T(0, 0), all of the $2^{ij}$ ways of coloring the checks of an i x j region are equally likely. Thus, each texture configuration has probability $2^{-ij}$, and $I[R, T(0, 0)] = ij \log 2$. In contrast, for the fully-correlated even texture $T_{standard}(1, 1)$, the configuration of an i x j region is determined by the configuration of the first row and the first column. Thus, there are only $2^{i+j-1}$ configurations, each of which is equally likely. It follows that

$$I[R, T_{standard}(1, 1)] = (i + j - 1) \log 2.$$
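The configuration count used above can be checked by brute force for small regions. The following is a minimal sketch, not part of the original analysis: the function name `count_even_configurations` is hypothetical, and the code assumes that the even (standard) texture is exactly the set of colorings in which every 2 x 2 block contains an even number of bright checks, all such colorings being equally likely.

```python
from itertools import product
from math import log2


def count_even_configurations(i, j):
    """Count colorings of an i x j region in which every 2 x 2 block
    contains an even number of bright (1) checks."""
    count = 0
    for bits in product((0, 1), repeat=i * j):
        # Cut the flat tuple of checks into i rows of j checks each.
        grid = [bits[r * j:(r + 1) * j] for r in range(i)]
        if all(
            (grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) % 2 == 0
            for r in range(i - 1)
            for c in range(j - 1)
        ):
            count += 1
    return count


if __name__ == "__main__":
    for i, j in [(2, 2), (3, 3), (3, 4)]:
        n = count_even_configurations(i, j)
        # With equally likely configurations, the information is log2(count) bits.
        print(i, j, n, log2(n), i + j - 1)
```

For each region size tried, the base-2 logarithm of the count equals i + j - 1, whereas the random texture would give ij bits. The enumeration grows as $2^{ij}$, so it is only practical for small regions.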

Thus, the information content of a patch of a random texture is proportional to the number of checks it contains, while the information content of a patch of the fully-correlated standard texture is proportional to the number of checks in its first row and first column. For any glider G, the configuration of a region of a fully-correlated texture $T_G(1, 1)$ is completely determined by the random assignment of check polarities to one or more initial rows and columns. After assignment to a sufficiently large "initial set" (Gilbert, 1980), the interior of the texture is determined. Thus, the information content of a region of the texture is proportional to the minimum number of checks that must be specified. Table 6 lists how many rows and columns must be specified to determine textures based on each kind of glider. As seen from Table 6, the hypothesis that textures with greater redundancy have more salient structure than textures with less redundancy is not supported. The least-redundant texture (oblong) has salient structure, while two of the most redundant textures (tee and el) have relatively inapparent structure.

Table 6. Properties of selected isodipole textures and their gliders. The model responses were calculated from the model of Fig. 7, with parameters N = 6, h = 0.6, r = 0.5, $r_a$ = 0.5, $r_b$ = 0.5, $\theta$ = 90 deg, as described in the text. The bright and dark triangle textures, each constructed from the same triangle glider, share all properties listed.

Texture type   Order of   Sum of squared   Maximum squared   Symmetry   Information     Response of
               glider     distances        distance                     content         proposed model
Group I
  Standard     4          8                2                 8-fold     1 row, 1 col    0.0348
  Cross        4          16               4                 8-fold     2 rows          0.0292
  Zigzag       4          12               5                 2-fold     1 row, 1 col    0.0311
  Oblong       4          20               5                 2-fold     1 row, 2 cols   0.0297
Group II
  Triangle     3          4                2                 2-fold     1 row           0.0289
Group III
  Tee          4          11               4                 2-fold     1 row           0.0242
  Wye          4          16               5                 2-fold     2 rows          0.0264
  Foot         4          15               5                 1-fold     1 row, 1 col    0.0279
  El           4          14               5                 1-fold     1 row           0.0253

A role for symmetry?

The information-theoretic analysis shows that not all spatial correlations are equally relevant to the visual perception of structure, or redundancy. Symmetry is a global feature readily recognized by the visual system. We considered the possibility that symmetry of the glider, and hence of the texture statistics, is the crucial characteristic which separates the textures with salient structure from those without salient structure. Symmetry properties of the gliders are listed in Table 6. Two of the textures with the most salient structure are generated by gliders with the 8-fold symmetry of the square (standard and cross), while two of the textures without salient structure are generated by gliders without symmetry (foot and el). However, examination of the remaining gliders shows that symmetry per se is not the crucial feature which determines the percept of structure. Some gliders with two-fold symmetry generate textures with readily-apparent structure (oblong and triangle), while others do not (tee and wye).

We also note that it is possible to manipulate symmetry by changing the lattice upon which the texture is constructed. For example, the square lattice may be replaced with a hexagonal one. This reduces the symmetry of the standard and cross gliders from 8 to 4-fold, while it increases the symmetry of the wye glider from 2 to 6-fold. Informal observations show that the qualitative percept of presence or absence of visual structure is not changed by this manipulation.

A role for glider size?

Let us consider the possibility that the overall size of the glider (which provides the basic constraint that defines the texture) determines whether a statistical constraint is detected visually. Two measures of the size of a glider are listed in Table 6: the sum of the squares of the distances among all cells of the glider, and the square of the maximum distance between two cells of the glider. As seen in Table 6, neither measure separates the textures with salient structure from the textures without salient structure.
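The two size measures are simple to compute once the cells of a glider are given as lattice coordinates. The sketch below is illustrative only: the dictionary `GLIDERS` and the function `size_measures` are hypothetical names, and the cell coordinates are an assumption, taken from the exponents of the Group I generating functions (each term $x^a y^b$ read as a cell at (a, b)).

```python
from itertools import combinations

# Assumed cell coordinates (x, y), in units of one check, for the Group I gliders.
GLIDERS = {
    "standard": [(0, 0), (1, 0), (0, 1), (1, 1)],
    "cross": [(1, 0), (0, 1), (2, 1), (1, 2)],
    "zigzag": [(0, 0), (0, 1), (1, 1), (1, 2)],
    "oblong": [(0, 0), (1, 0), (0, 2), (1, 2)],
}


def size_measures(cells):
    """Return (sum of squared distances over all pairs of cells,
    maximum squared distance between two cells)."""
    squared = [
        (ax - bx) ** 2 + (ay - by) ** 2
        for (ax, ay), (bx, by) in combinations(cells, 2)
    ]
    return sum(squared), max(squared)


if __name__ == "__main__":
    for name, cells in GLIDERS.items():
        print(name, size_measures(cells))
```

With these coordinates the sums of squared distances are 8, 16, 12 and 20 and the maxima are 2, 4, 5 and 5, in agreement with the Group I entries of Table 6.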

Density of correlations in glider iterates

Since neither glider symmetry nor size determines which textures are visually salient, we are compelled to examine more subtle properties of the textures. Since the textures are generated by repeatedly applying a recursion rule specified by a glider, we turn to properties of these glider iterates. Consider the standard texture. Its defining rule constrains the number of bright checks in a 2 x 2 region to be even. However, since the number of bright checks in an overlapping 2 x 2 region must also be even, it follows that the number of bright checks at the corners of a 2 x 3 region is also even. Thus, iteration of the local recursion rule induces a relationship between the states of checks at distances greater than the extent of the glider. For the even texture, appropriate iteration of the recursion rule shows that the number of bright checks at the corners of an (I + 1) x (J + 1) rectangle is always even.

This geometric property corresponds to an algebraic property of generating functions of regions and gliders. As discussed in Methods, the states of checks in a set S are correlated if (and only if) the generating function $\mathcal{S}(x, y)$ (equation 7) corresponding to S is divisible (mod 2) by the generating function $\mathcal{G}(x, y)$ corresponding to the glider G. Thus, the four-membered sets S whose checks are correlated correspond to the four-term polynomials $\mathcal{S}(x, y)$ which are divisible by $\mathcal{G}(x, y)$. As seen from the example following equation (8), for the standard texture, all polynomials of the form $\mathcal{S}(x, y) = (1 + x^I)(1 + y^J)$ indeed have $\mathcal{G}_{standard}(x, y)$ as a factor.
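The long-range consequence of the local 2 x 2 rule can be checked numerically. The sketch below is an illustration under stated assumptions, not the stimulus-generation code used in the experiments: `even_texture` and `all_rectangle_corners_even` are hypothetical names, and the texture is assumed to be built by choosing the first row and first column at random and filling the interior so that every 2 x 2 block has an even number of bright checks.

```python
import random


def even_texture(rows, cols, rng):
    """Random first row and column; interior filled by the 2 x 2 even-parity rule."""
    tex = [[0] * cols for _ in range(rows)]
    for c in range(cols):
        tex[0][c] = rng.randint(0, 1)
    for r in range(1, rows):
        tex[r][0] = rng.randint(0, 1)
        for c in range(1, cols):
            # The fourth check of each 2 x 2 block is fixed by the other three.
            tex[r][c] = tex[r - 1][c - 1] ^ tex[r - 1][c] ^ tex[r][c - 1]
    return tex


def all_rectangle_corners_even(tex):
    """True if the four corners of every axis-aligned rectangle hold an even
    number of bright checks."""
    rows, cols = len(tex), len(tex[0])
    for r1 in range(rows):
        for r2 in range(r1 + 1, rows):
            for c1 in range(cols):
                for c2 in range(c1 + 1, cols):
                    corners = tex[r1][c1] + tex[r1][c2] + tex[r2][c1] + tex[r2][c2]
                    if corners % 2:
                        return False
    return True


if __name__ == "__main__":
    texture = even_texture(12, 12, random.Random(0))
    print(all_rectangle_corners_even(texture))
```

Only 2 x 2 blocks are constrained during construction, yet the corner-parity test passes for rectangles of every size, which is the geometric counterpart of the divisibility statement above.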


Thus, for the standard texture, there are many four-membered sets of checks substantially larger than the glider whose states are correlated. More precisely, the number of fourth-order correlations within a distance D of a particular check is proportional to $D^2$. This property is common to the other Group I textures as well, although the proportionality constant varies from texture to texture. One way to see this is to note that correlated sets correspond to polynomials which are divisible by the glider's generating function (equation 10), and the generating functions of the other Group I textures are algebraically related to $\mathcal{G}_{standard}$:

$$\mathcal{G}_{cross}(x, y) = x + y + x^2 y + x y^2 = y \, \mathcal{G}_{standard}(xy, x/y) \qquad (19)$$

$$\mathcal{G}_{zigzag}(x, y) = 1 + y + xy + x y^2 = \mathcal{G}_{standard}(xy, y) \qquad (20)$$

$$\mathcal{G}_{oblong}(x, y) = 1 + x + y^2 + x y^2 = \mathcal{G}_{standard}(x, y^2). \qquad (21)$$
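The substitutions in equations (19)-(21) can be verified mechanically. In the sketch below, which is illustrative and uses hypothetical names (`substitute`, `shift`, `G_STANDARD`), a polynomial with coefficients mod 2 is represented as a set of exponent pairs (a, b) standing for $x^a y^b$. It is assumed that $\mathcal{G}_{standard}(x, y) = 1 + x + y + xy$, consistent with the 2 x 2 glider.

```python
def substitute(poly, x_image, y_image):
    """Replace x by the monomial x_image and y by the monomial y_image.
    Monomials are exponent pairs; repeated terms cancel (arithmetic mod 2)."""
    result = set()
    for i, j in poly:
        term = (i * x_image[0] + j * y_image[0], i * x_image[1] + j * y_image[1])
        result ^= {term}  # symmetric difference implements addition mod 2
    return frozenset(result)


def shift(poly, dx, dy):
    """Multiply a polynomial by the monomial x**dx * y**dy."""
    return frozenset((i + dx, j + dy) for i, j in poly)


G_STANDARD = frozenset({(0, 0), (1, 0), (0, 1), (1, 1)})  # 1 + x + y + xy

# Equation (19): y * G_standard(xy, x/y)
cross = shift(substitute(G_STANDARD, (1, 1), (1, -1)), 0, 1)
# Equation (20): G_standard(xy, y)
zigzag = substitute(G_STANDARD, (1, 1), (0, 1))
# Equation (21): G_standard(x, y**2)
oblong = substitute(G_STANDARD, (1, 0), (0, 2))

if __name__ == "__main__":
    print(sorted(cross), sorted(zigzag), sorted(oblong))
```

Each result is a four-term polynomial whose exponents match the left-hand sides of equations (19)-(21), for example {(1, 0), (0, 1), (2, 1), (1, 2)} for the cross glider.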

On the other hand, Group III textures have far fewer 4-element sets which are correlated. For these textures, the asymptotic number of sets of size D or less appears to be $\log D$ (tee, wye, or el) or $\log D^2$ (foot). The logarithmic dependence follows from the fact that for all four-element gliders, polynomials of the form $[\mathcal{G}(x, y)]^{2^m}$ contain only four nonzero terms (mod 2), and all are necessarily divisible by $\mathcal{G}$. For textures in Group II, the asymptotic number of such sets appears to be greater than that of any of the Group III textures. This may be seen from the fact that all polynomials of the form $[\mathcal{G}_{triangle}(x, y)]^{2^m}$ have three elements, so that superpositions of the corresponding sets and their translates (with single checks in common) produce many 4-element correlated sets of size D. The difficulty in making these statements rigorous lies not in providing lists of 4-element sets which are correlated, but in proving that the lists are exhaustive. For the tee, foot, and el textures, the asymptotic estimates above may be rigorously proven by tedious counting arguments; for the other textures, no such proof is as yet available.

A candidate model

We draw two conclusions from the above analysis: (i) the visual system only detects pattern in textures with a high degree of mathematical redundancy if the spatial organization of the correlation structure is appropriate; and (ii) a model which distinguishes a salient correlation structure from one which is much less visually salient must exploit correlations over a distance substantially greater than the glider itself. Previously (Victor & Conte, 1989), we had concluded that models which contain a local nonlinearity followed by a second nonlinearity of wider spatial extent might account for responses to the standard texture. We now present an example of such a model, and show that it accounts for our current experimental results.

Figure 7 illustrates the spatial organization of a computational element of the proposed model. Signal processing begins with linear spatial summation within a circular Gaussian profile of radius r. That is, the output of this component of the model is determined by convolving the sensitivity function

$$s(x, y) = \frac{1}{\pi r^2} \exp[-(x^2 + y^2)/r^2] \qquad (22)$$

with the stimulus luminance. In this and all other discussion of the model, length parameters (such as r) are in units of check size, and stimulus contrast is taken to be unity. Note that the Gaussian profile (22) is normalized to unit volume, so that its response to a stimulus of unit contrast must lie between -1 and 1.

We postulate that the outputs of the Gaussian profiles are combined in antagonistic pairs in a linear fashion. As shown in Fig. 7, the centers of the Gaussians of each pair are separated from each other by a distance $2r_a$. The result of this summation is half-wave rectified. Thus, each pair of linearly-summing regions constitutes a nonlinear subunit (Hochstein & Shapley, 1976) which is a primitive edge detector whose output ranges from 0 to 2. The maximum output occurs when one profile of the pair is entirely within a bright region, and the other profile is entirely within a dark region. These subunits are positioned along a line, separated from each other by a distance $r_b$, and oriented at an angle $\theta$ with respect to this line. The last processing stage consists of arithmetic averaging of responses across N such subunits, followed by a threshold. Since the individual subunit outputs are guaranteed to be nonnegative, this threshold must be set at a value strictly greater than zero to have any effect.


Fig. 7. A diagram of the spatial organization of the proposed model, superimposed on an example of the standard texture. Responses from circularly-symmetric linearly-summing regions are combined in an antagonistic fashion. The net response of each pair of regions is half-wave rectified. The rectified signal is averaged across pairs of regions and subjected to a thresholding operation. r is the radius of a linear-summing region, $2r_a$ is the separation of two summing regions within an antagonistic pair, and $r_b$ is the separation of summing regions along the longitudinal axis. The number of pairs of regions is N, the threshold per pair of regions is h, and the angle between the region pairs and the longitudinal axis is $\theta$. In this example, r = 0.4, $r_a$ = 0.5, $r_b$ = 1.0, N = 6 and $\theta$ = 90 deg.
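One possible reading of the computational element of Fig. 7 is sketched below. This is not the simulation code used for the results reported here. All function names are hypothetical, and several choices are assumptions: the texture is a list of rows of check contrasts (+1 bright, -1 dark), each check occupies a unit square, the threshold is subtractive (output = max(0, mean - h)), and `phi` denotes the orientation of the longitudinal axis on the texture. The Gaussian profile with the unit-volume normalization is integrated exactly over each check using the error function.

```python
import math


def gaussian_response(texture, cx, cy, r):
    """Linear summation of check contrasts under a unit-volume Gaussian
    profile of radius r centred at (cx, cy), in units of one check."""
    total = 0.0
    for row, line in enumerate(texture):
        # Exact integral of the 1-D profile over the check's vertical extent.
        wy = 0.5 * (math.erf((row + 1 - cy) / r) - math.erf((row - cy) / r))
        for col, contrast in enumerate(line):
            wx = 0.5 * (math.erf((col + 1 - cx) / r) - math.erf((col - cx) / r))
            total += contrast * wx * wy
    return total


def subunit_response(texture, cx, cy, r, ra, angle):
    """Antagonistic pair of summing regions, centres 2*ra apart along
    `angle`, followed by half-wave rectification (output between 0 and 2)."""
    dx, dy = ra * math.cos(angle), ra * math.sin(angle)
    plus = gaussian_response(texture, cx + dx, cy + dy, r)
    minus = gaussian_response(texture, cx - dx, cy - dy, r)
    return max(0.0, plus - minus)


def element_response(texture, x0, y0, phi, n, r, ra, rb, theta, h):
    """Average n subunits spaced rb apart along a line of orientation phi,
    each pair oriented at theta to that line, then subtract the threshold h."""
    total = 0.0
    for k in range(n):
        offset = (k - (n - 1) / 2) * rb
        cx = x0 + offset * math.cos(phi)
        cy = y0 + offset * math.sin(phi)
        total += subunit_response(texture, cx, cy, r, ra, phi + theta)
    return max(0.0, total / n - h)


if __name__ == "__main__":
    # A vertical bright/dark edge: columns 0-9 bright, columns 10-19 dark.
    edge = [[1.0 if col < 10 else -1.0 for col in range(20)] for _ in range(20)]
    print(element_response(edge, 10.0, 10.0, math.pi / 2, 6, 0.5, 0.5, 0.5, math.pi / 2, 0.6))
```

Placements should stay well inside the texture array, because the sketch treats everything outside the array as zero contrast. Averaging such responses over many random positions and orientations on many texture examples would give a Monte Carlo estimate of the mean model response.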

We denote the threshold for the second nonlinear stage by h. Since subunit responses are averaged prior to this thresholding operation, h may be considered as the threshold per subunit. (We choose to average, rather than add, subunit responses so that the influence on model responses of changing the number of subunits N is not confounded with an effective change of the threshold.)

We now examine the dependence of the model response on its defining parameters. To simulate model responses, the computational element described above was placed in at least 10 random orientations and positions on each of at least 100 examples of the textures under consideration. Typically, these 1000 placements provided Monte Carlo estimates which were stable to within 5%. If greater variability was encountered, or if greater accuracy was required for comparisons, the Monte Carlo estimate was extended to include further placements. The average response over all such placements will be presented here.

Since the most important part of the model is the second nonlinear stage, we examine the effect of h first (Fig. 8A). The spatial parameters of the model are taken to be N = 6, r = 0.5, $r_a$ = 0.5, $r_b$ = 0.5, $\theta$ = 90 deg. In comparing responses to the random texture and the standard (even) texture, there are two regimes of behavior. For low values (0.0-0.2) of the threshold per subunit h, the response to the standard texture is less than the response to the random texture, and the difference is small (0.97 : 1). As the threshold parameter h increases beyond 0.4, responses to the standard texture exceed responses to the random texture by progressively larger ratios. However, the overall response size becomes progressively smaller. Thresholds in the range 0.4-0.8 provide standard : random response ratios of 1.1-1.6, with response attenuations (compared to h = 0) of 73-95%. The low-h regime is unattractive since there is only a slight difference between responses to standard and random textures. We therefore choose h = 0.6 for further exploration.

Fig. 8. Part A: simulation of model responses to the standard (even) and the random textures, as a function of the threshold parameter h. N = 6, r = 0.5, $r_a$ = 0.5, $r_b$ = 0.5, $\theta$ = 90 deg. Part B: simulation of model responses to the standard and the random textures, as a function of the number of subunits N. h = 0.6, r = 0.5, $r_a$ = 0.5, $r_b$ = 0.5, $\theta$ = 90 deg. Part C: simulation of model responses to the standard and the random textures, as a function of the orientation $\theta$. N = 6, h = 0.6, r = 0.5, $r_a$ = 0.5, $r_b$ = 0.5. (○) response to standard texture; (●) response to random texture.

Figure 8B shows model responses as a function of the number of subunits N. The threshold parameter h is kept fixed at 0.6 and the spatial parameters of the model are unchanged from Fig. 8A. When N is small (1 or 2), responses to standard and random textures are nearly equal (standard : random response ratio of 1.05 : 1 or less). As N increases, this response ratio grows while response size diminishes. For N = 16, the response ratio is 1.7 : 1, but overall response size has been attenuated by approx. 90%. The tradeoff of selectivity (large standard : random response ratio) for response size is a general feature of this model, and is not restricted to these specific values of N and h.

Intuitively, this may be understood as follows. One salient feature of the standard texture is that it contains a larger-than-chance number of extended contours. Because of this, when one subunit is well-stimulated, there is a greater chance that another one along the same line is also well-stimulated. The second nonlinearity (the threshold controlled by the parameter h) results in a cooperative interaction between the subunits: there is a net model response only when several of the subunits are simultaneously active. As h increases, a greater proportion of subunits must be simultaneously active, or the levels of activity in the subunits must be higher. As N increases, the simultaneous activity of subunits must extend over a greater range (since the threshold level is set at Nh).
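The cooperative effect of the threshold can be illustrated with a toy calculation that is deliberately much simpler than the model itself. In this sketch, every name is hypothetical and the setup is an assumption made for illustration: each of N subunits is either silent (0) or active (1), and the threshold is taken to be subtractive. Two stimulus ensembles with the same mean subunit activity are compared, one in which the subunits are always co-active and one in which they are independent.

```python
from itertools import product


def thresholded_mean(subunit_outputs, h):
    """Average the subunit outputs, then apply a subtractive threshold h."""
    mean = sum(subunit_outputs) / len(subunit_outputs)
    return max(0.0, mean - h)


def expected_response(patterns, h):
    """Mean output over a list of equally likely subunit activity patterns."""
    return sum(thresholded_mean(p, h) for p in patterns) / len(patterns)


N = 6
# Independent subunits: all 2**N on/off patterns are equally likely.
independent = list(product((0.0, 1.0), repeat=N))
# Fully co-active subunits: either all silent or all active.
aligned = [(0.0,) * N, (1.0,) * N]

if __name__ == "__main__":
    for h in (0.0, 0.6):
        print(h, expected_response(aligned, h), expected_response(independent, h))
```

With h = 0 both ensembles give the same mean output (0.5), since averaging alone is linear. With h = 0.6 the co-active ensemble gives about 0.20 and the independent ensemble about 0.044, so selectivity for co-activation is gained at the cost of overall response size, the same tradeoff seen in Fig. 8.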
The N = 6, h = 0.6 pair leads to a reasonable selectivity for the standard texture of 1.3 : 1. The overall response size is approx. 30% of the response at N = 1, and approx. 10% of the response at h = 0. The reduction in responsiveness due to the second nonlinearity appears comparable to the penalty that cortical units appear to pay for selectivity in other circumstances. For example, macaque cortical neurons have orientation tuning bandwidths ranging from approx. 5 to 60 deg, and typically have only a very small response to all but 10 to 50% of possible orientations (De Valois, Yund & Hepler, 1982).

Part C of Fig. 8 examines the effect of the angle $\theta$ between the edge detector axis and the line connecting subunit centers. For values of $\theta$ from 45 to 90 deg, there is relatively little effect on the standard : random response ratio. The minuscule response size for values of 15 deg or less makes determinations of the response ratio meaningless in this range. Thus, orientations $\theta$ other than 90 deg provide little if any increase in the standard : random response ratio, and orientations less than 30 deg produce an unreasonably large reduction in response size.

We now examine the response of this model with N = 6, h = 0.6, $\theta$ = 90 deg to the several classes of isodipole textures we have studied experimentally. As shown in the final column of Table 6, responses to the Group I textures are in the range 0.0292-0.0348, while responses to the Group III textures are in the range 0.0242-0.0279, and the response to the Group II textures (independent of polarity) is 0.0289. The response to the random texture (not listed in Table 6) is 0.0283. All Group I textures yield a larger model response than the random texture, and all Group III textures yield a smaller response than the random texture. Within Group I, the response to the cross texture is the smallest, and the response to the standard texture is the largest. The Group II response is larger than the response to a random texture, but smaller than the Group I responses. Thus, this receptive-field model indeed separates the readily-discriminable textures from those that are not.

Essential features and limitations of the model

In the data presented, we chose only a fixed set of values for the summing radius r and the spatial parameters $r_a$ and $r_b$ of the model. The value r = 0.5 for the summing radius was chosen so that the summing diameter matched the check size, and the spacing parameters were chosen so that typical summing circles overlapped only partially, and typically fell in neighboring checks. For values of r, $r_a$ and $r_b$ within a factor of two or so of these choices, we found similar separation of responses to textures in the different groups for appropriate choices of N and h, although model responses were generally smaller. For example, separating the subunits by $r_b$ = 1.0 checks rather than $r_b$ = 0.5 checks yielded responses to the Group I textures in the range 0.0096-0.0150, responses to the Group III textures in the range 0.0074-0.0086, a response to the Group II textures of 0.0107, and a response to the random texture of 0.0102 with N = 6, h = 0.6. Within groups, the rank-order of response size depended in a complex and sensitive manner on these parameters, except that usually, the standard texture produced the largest Group I response, the cross texture produced the smallest Group I response, and the wye texture produced the smallest Group III response. In view of the variability of rank-ordering across subjects, it is difficult to use this information to refine the spatial structure of the model further.

The critically important feature of the model is the presence of two nonlinearities. The first nonlinearity is required so that the model's responses to edges will not cancel each other if the polarities of the edges are opposite. The specific nonlinear function is not important. We have chosen half-wave rectification, but full-wave rectification, squaring, or any nonlinearity with even-order components will suffice to prevent cancellation of responses to opposite-polarity edges. The second nonlinearity is required to generate a cooperative interaction between its local inputs. Removing the effect of the second nonlinearity (either by setting h = 0 or N = 1) eliminates the differential response to the structured textures. The second nonlinearity was chosen to be a threshold set at a level modestly higher than its typical input. This effectively provided a cooperative interaction between its inputs. It is the cooperativity, rather than the threshold itself, which is important for model behavior; other accelerating nonlinearities would serve as well.

The important spatial feature of the linear stage of the model is that it is high-pass, or relatively sensitive to edges. We chose a difference of offset Gaussians as the initial linear stage of the model to provide sensitivity to edges in a simple fashion, and to be able to study the importance of the orientation of this filtering stage to the model axis ($\theta$). Other than this difference, similar results would have been obtained for other filters which are high-pass, such as a center/surround difference of Gaussians. The linear subunits of the model were arrayed along a line because this arrangement provided the best fit to VEP responses to standard textures with sporadic and propagated decorrelation (Victor & Conte, 1989).

Aspects of the model which are incomplete

The model as it stands might provide an adequate explanation of responses at a given contrast level, but cannot account for responses over a range of contrasts. This is because the effects of the second nonlinearity become progressively weaker as contrast increases. Indeed, since the local nonlinearity (summation in circular regions followed by half-wave rectification) has a response which is proportional to stimulus contrast, scaling the contrast up by a factor c is equivalent to scaling the threshold down by a factor c. In descriptive terms, the selectivity which was conferred by the thresholding operation can be defeated simply by presenting a stimulus with sufficiently high contrast. Contrary to this model prediction, pattern-specific responses to isodipole textures do not decrease (either in absolute or relative size) as contrast increases (Victor & Conte, 1987). This difficulty is not unique to this model, but would be shared by any model which gains selectivity through a cooperative interaction. A reasonable extension of the model which solves this problem is that the threshold is adjusted according to the global contrast level (e.g. through lateral inhibitory interactions that are not specific to stimulus features).

We have not attempted to model temporal aspects of processing. It does not suffice to regard the model's output as a scalar, to which appropriate dynamics which generate the VEP waveform may be attached. Model responses to Group I and Group II textures are larger than responses to the random texture, while responses to Group III textures are smaller than responses to the random texture. If the VEP simply represented the output of the spatial model shaped by a temporal filter, one would predict that Group III responses would be opposite in phase to Group I responses, rather than much smaller but of a similar phase, as we have found.
A more complex temporal model (perhaps including dynamical components at either or both of the postulated nonlinear stages) is therefore required.

We have also not accounted for the subtle but definite difference in the dynamics of responses to different textures. It is unclear whether this difference can be accounted for simply by adding the appropriate dynamics to the proposed model. An alternative and more attractive explanation rests on the statistical properties that distinguish Group II textures from Group I textures. A Group II texture contains figures of a particular contrast polarity (bright or dark, but not both) on a random-appearing background. Interchange of a Group II texture with the random texture may therefore yield a VEP because of asymmetries between "on" and "off" pathways (Zemon, Gordon & Welch, 1988). However, Group I and random textures cannot probe on/off asymmetries: inverting the contrast of a Group I or a random texture merely produces another example of the same texture. Thus, the difference between Group I and Group II responses may well be due to the existence of neural mechanisms driven by Group II textures that are not driven by the random or Group I textures. This is suggested by two pieces of evidence: (i) the analysis of the invariant f(G, G') demonstrates that the need for a second principal component was primarily due to a comparison of Group I with Group II responses; (ii) model responses for Group II textures are smaller than those for Group I textures, but VEP and psychophysical responses are similar in size.

Relationship to cellular studies of cortical information processing

Despite this model's incompleteness in several respects, it raises basic questions concerning the computations of early vision. Models for simple cell receptive fields are typically linear, or linear with a single stage of rectification at the stage of spike generation (Movshon et al., 1978a; Camarda et al., 1985; Spitzer & Hochstein, 1985a; Jones & Palmer, 1987). The complex cell receptive field has been modelled as multiple nonlinear subunits arrayed on a line, whose outputs are summed linearly (Spitzer & Hochstein, 1985b). We find that a second nonlinearity and a stage of spatial pooling are crucial to account for the difference in response to structured and random textures. As seen in Fig. 8A, removal of the second nonlinearity (setting h = 0) eliminates the difference in response to the structured and random textures. As seen in Fig. 8B, removal of the final stage of spatial pooling (setting N = 1 or N = 2) also eliminates this difference. Thus, if the single-nonlinearity concept of simple and complex receptive field structure is accurate, we can infer that the computations we have been probing take place at a later stage, perhaps by cooperative combination of simple or complex cell responses.


The problem with this line of reasoning is that the presence of the second nonlinearity in the current model only makes a qualitative difference for intrinsically two-dimensional stimuli, such as the textures we have used. The effects of the second nonlinearity would not be very evident if the test stimuli used were one-dimensional stimuli, such as gratings and bars, even if presented at many orientations. This can be seen without detailed modeling, as follows. Oriented along the preferred axis of the cell, a bar or grating would excite all the subunits in synchrony, and the two nonlinearities would collapse into a single threshold. Oriented along a nonpreferred direction, such a stimulus would evoke little response from the first stage of the model, and again the effects of the nonlinearity would not be evident. Thus, it is possible that the two-nonlinearity network needed to account for our VEP and psychophysical results is present at the level of the simple and complex cell, but that most studies based on traditional stimuli do not reveal its presence. The possibility of a cooperative interaction among subunits of a complex cell is raised by the work of Movshon et al. (1978b) based on responses to asynchronously-presented pairs of line segments; our model requires a cooperative interaction between synchronously-presented elements as well. If indeed the second nonlinearity resides early in striate circuitry, then the connections between subunits may be related to the long-range excitatory and orientation-specific connections in striate cortex (Ts'o, Gilbert & Wiesel, 1986).

Relation to models of texture segmentation

Our psychophysical task is discrimination of sequentially-presented homogeneous textures, rather than one of texture segregation. But since early vision mechanisms that are involved in texture discrimination are likely related to those involved in texture segregation, we consider the relationship of our model to models proposed for texture segregation. Beck, Sutter and Ivry (1987) interpreted the cancellation of opposing changes in size and contrast as evidence that texture segmentation is based on the output of spatial frequency channels. The present work, as well as the studies of Julesz and coworkers (Caelli & Julesz, 1978; Caelli, Julesz & Gilbert, 1978; Julesz, 1981), reveals that local features support texture discrimination even under circumstances in which the outputs of spatial frequency channels are statistically identical. Thus, models based on spatial frequency channels cannot account for our results.

Bergen and Adelson (1988) discuss a class of models for texture segmentation based on local energy measures. In these models, local energy is computed by rectification of the output of localized spatial filters. Detection of texture differences is based on the regionally summed outputs of these local filters. Chubb and Landy (1990) elaborate on this model by considering a single nonlinear stage of more arbitrary characteristics. But since these models lack a second nonlinear stage, they are unlikely to explain our findings without further elaboration. Such elaboration may be found in the models of Caelli (1985) and Voorhees and Poggio (1988), who incorporate local energy detectors in a more complex framework which includes long-range nonlinearities. In these models, the primary purpose of the additional nonlinearities is to account for processes such as filling-in and boundary generation, which are considered to follow initial feature extraction. Though these spatial nonlinearities are different from the second nonlinear stage that we propose, it is interesting to note that they (or similar processes) appear necessary to account for aspects of the initial analysis of form as well.

Acknowledgements: This work was supported in part by grants EY 1428 and EY7977.

REFERENCES Beck, J., Sutter, A. & Ivry, R. (1987). Spatial frequency channels and perceptual grouping in texture segregation. Computer 299-325.

Vision, Graphics and Image

Processing,

37,

Bergen, J. R. & Adelson, E. H. (1988). Early vision and texture perception. Nature, London, 333, 363-364.

Caelli, T. (1985). Three processing characteristics of visual texture segmentation. Spatial Vision, 1, 19-30.

Caelli, T. M. & Julesz, B. (1978). On perceptual analyzers underlying visual texture discrimination: Part I. Biological Cybernetics, 28, 167-175.

Caelli, T. M., Julesz, B. & Gilbert, E. N. (1978). On perceptual analyzers underlying visual texture discrimination: Part II. Biological Cybernetics, 29, 201-214.

Camarda, R. M., Peterhans, E. & Bishop, P. O. (1985). Spatial organization of subregions in receptive fields of simple cells in cat striate cortex as revealed by stationary flashing bars and moving edges. Experimental Brain Research, 60, 136-150.

Cattell, R. B. (1978). The scientific use of factor analysis. New York: Plenum Press.

Chubb, C. & Landy, M. S. (1990). Orthogonal distribution analysis: A new approach to the study of texture perception. In Landy, M. S. & Movshon, J. A. (Eds), Computational models of visual processing. Cambridge, Mass.: MIT Press.

Desimone, R., Schein, S. J., Moran, J. & Ungerleider, L. G. (1985). Contour, color and shape analysis beyond the striate cortex. Vision Research, 25, 441-452.

De Valois, R. L., Albrecht, D. G. & Thorell, L. G. (1978). Cortical cells: bar and edge detectors, or spatial frequency filters? In Cool, S. J. & Smith, E. L. (Eds), Frontiers in visual science (pp. 544-556). Berlin: Springer.

De Valois, R. L., Yund, E. W. & Hepler, N. (1982). The orientation and direction selectivity of cells in macaque visual cortex. Vision Research, 22, 531-544.

Diaconis, P. & Freedman, D. (1981). On the statistics of vision: the Julesz conjecture. Journal of Mathematical Psychology, 24, 112-138.

Emerson, R. C., Citron, M., Vaughn, W. J. & Klein, S. (1987). Nonlinear directionally selective subunits in complex cells of cat striate cortex. Journal of Neurophysiology, 58, 33-65.

Farnsworth, D. (1957). The Farnsworth-Munsell 100-Hue test for the examination of color discrimination: Manual. Baltimore, MD: Munsell Color Company.

Gilbert, E. N. (1980). Random colorings of a lattice on squares in the plane. SIAM Journal of Algebra and Discrete Methods, 1, 152-159.

Gutowitz, H., Zemon, V., Victor, J. D. & Knight, B. W. (1986). Source geometry and dynamics of the visual evoked potential. Electroencephalography and Clinical Neurophysiology, 64, 308-327.

Hochstein, S. & Shapley, R. M. (1976). Linear and nonlinear spatial subunits in Y cat retinal ganglion cells. Journal of Physiology, London, 262, 265-284.

Hubel, D. H. & Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology, London, 195, 215-243.

Jones, J. P. & Palmer, L. A. (1987). The two-dimensional spatial structure of simple receptive fields in cat striate cortex. Journal of Neurophysiology, 58, 1187-1211.

Julesz, B. (1981). Textons, the elements of texture perception, and their interactions. Nature, London, 290, 91-97.

Julesz, B., Gilbert, E. N. & Victor, J. D. (1978). Visual discrimination of textures with identical third-order statistics. Biological Cybernetics, 31, 137-140.

Milkman, N., Schick, G., Rossetto, M., Ratliff, F., Shapley, R. & Victor, J. D. (1980). A two-dimensional computer-controlled visual stimulator. Behavioral Research Methods and Instrumentation, 12, 283-292.

Morrone, C., Burr, D. C. & Maffei, L. (1982). Functional implications of cross-orientation inhibition of cortical visual cells--I. Neurophysiological evidence. Proceedings of the Royal Society, London B, 216, 335-354.

Movshon, J. A., Thompson, I. D. & Tolhurst, D. J. (1978a). Spatial summation in the receptive fields of simple cells in the cat’s striate cortex. Journal of Physiology, London, 283, 53-77.

Movshon, J. A., Thompson, I. D. & Tolhurst, D. J. (1978b). Receptive field organization of complex cells in the cat’s striate cortex. Journal of Physiology, London, 283, 79-99.

Regan, D. (1989). Human brain electrophysiology. New York: Elsevier.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Systems Technical Journal, 27, 379-423.

Spitzer, H. & Hochstein, S. (1985a). Simple- and complex-cell response dependences on stimulation parameters. Journal of Neurophysiology, 53, 1244-1265.

Spitzer, H. & Hochstein, S. (1985b). A complex-cell receptive-field model. Journal of Neurophysiology, 53, 1266-1286.

Szulborski, R. G. & Palmer, L. A. (1990). The two-dimensional spatial structure of nonlinear subunits in the receptive fields of complex cells. Vision Research, 30, 249-254.

Ts’o, D. Y., Gilbert, C. D. & Wiesel, T. N. (1986). Relationships between horizontal interactions and functional architecture in cat striate cortex as revealed by cross-correlation analysis. Journal of Neuroscience, 6, 1160-1170.

Victor, J. D. (1985). Complex visual textures as a tool for studying the VEP. Vision Research, 25, 1811-1827.

Victor, J. D. (1988). Models for preattentive texture discrimination: Fourier analysis and local feature processing in a unified framework. Spatial Vision, 3, 263-280.

Victor, J. D. & Conte, M. M. (1987). Visual evoked potentials elicited by simple and complex textures: Distinct components with similar scalp topographies. In Barber, C. & Blum, T. (Eds), Evoked potentials III: The Third International Evoked Potentials Symposium (pp. 183-189). Boston: Butterworths.

Victor, J. D. & Conte, M. M. (1989). Cortical interactions in texture processing: scale and dynamics. Visual Neuroscience, 2, 297-313.

Victor, J. D. & Conte, M. M. (1990). Motion mechanisms have only limited access to form information. Vision Research, 30, 289-301.

Victor, J. D. & Zemon, V. (1985). The human visual evoked potential: Analysis of components due to elementary and complex aspects of form. Vision Research, 25, 1829-1842.

Von der Heydt, R., Peterhans, E. & Baumgartner, G. (1984). Illusory contours and cortical neuron responses. Science, 224, 1260-1262.

Voorhees, H. & Poggio, T. (1988). Computing texture boundaries from images. Nature, London, 333, 364-367.

Zemon, V., Gordon, J. & Welch, J. (1988). Asymmetries in ON and OFF visual pathways of humans revealed using contrast-evoked cortical potentials. Visual Neuroscience, 1, 145-150.

APPENDIX

Asymptotic Analysis of Number of Order-4 Correlations

For any particular glider, correlations of order 4 exist on any set $S$ whose generating function $\tilde{S}(x, y)$ is divisible by the generating function of the glider, $\tilde{G}(x, y)$. Here we analyze the asymptotic behavior of the number of such sets $S$, as a function of the distance $D$ spanned by the set, for the gliders of Group I. We content ourselves with estimating the form of the leading term of dependence on the distance $D$, and omit details needed to calculate lower-order terms and numerical factors.

The simplest case to consider is that of the standard glider, which has a generating function

$$\tilde{G}_{\mathrm{standard}}(x, y) = (1 + x)(1 + y). \tag{A1}$$

We have seen (example following equation 8) that all polynomials of the form $\tilde{S}(x, y) = (1 + x^a)(1 + y^b)$ have $\tilde{G}_{\mathrm{standard}}$ as a factor. We now show that these are the only four-term polynomials whose lowest-order term is 1 that have $\tilde{G}_{\mathrm{standard}}$ as a factor.
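A short worked factorization may make the divisibility claim concrete. It is a sketch under the assumption that the exponents, labeled $a$ and $b$ here, are positive integers; the geometric-sum identity is standard algebra and is not a step quoted from the text. Modulo 2, the cross terms cancel in pairs:

$$1 + x^a = (1 + x)\left(1 + x + \cdots + x^{a-1}\right) \pmod 2,$$

and likewise for $1 + y^b$, so that

$$(1 + x^a)(1 + y^b) = (1 + x)(1 + y)\left(\sum_{k=0}^{a-1} x^k\right)\left(\sum_{l=0}^{b-1} y^l\right) \pmod 2.$$

The product of the two sums plays the role of the quotient, which shows that the standard glider's generating function $(1 + x)(1 + y)$ is a factor.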


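Assume that $\tilde{P}(x, y)$ is a four-term polynomial with $\tilde{G}_{\mathrm{standard}}(x, y)$ as a factor. We note that:

$$\tilde{G}_{\mathrm{standard}}(x, 1) = 0 \pmod 2. \tag{A2}$$

Since $\tilde{P}(x, y) = \tilde{G}_{\mathrm{standard}}(x, y) \cdot \tilde{Q}(x, y) \pmod 2$ for some quotient generating function $\tilde{Q}$, it follows that:

$$\tilde{P}(x, 1) = 0 \pmod 2. \tag{A3}$$

According to the definition of the generating function (equation 7), equation (A3) expands to

$$\sum_{(i_k, j_k) \in P} x^{i_k} = 0 \pmod 2. \tag{A4}$$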

Equation (A4) is an identity between polynomials in $x$, with coefficients interpreted modulo 2. It can only hold if the coefficients of $x^i$ are all even. This means that the number of checks $(i_k, j_k)$ in every row is an even number. Thus, any nonempty set $P$ whose generating function is divisible by $\tilde{G}_{\mathrm{standard}}$ must have at least two columns that contain checks. A similar analysis, starting with $\tilde{G}_{\mathrm{standard}}(1, y) = 0 \pmod 2$, leads to the conclusion that $P$ must have at least two rows that contain checks. It follows that if $P$ contains only four checks, each occupied row and each occupied column must hold exactly two of them, so these checks must be at the corners of a rectangle. Assuming that the lower-left corner of this rectangle is at $(0, 0)$, the generating function must be of the form $(x^a + 1)(y^b + 1)$.

It is now straightforward to count the number of four-element sets within a distance $D$ that have nonzero fourth-order interactions. This is simply the number of rectangles whose dimensions are both less than $D$, and is proportional to $D^2$.

The other Group I textures may be handled in an identical fashion. This follows from the observation that the other Group I gliders are a kind of linear transformation of the standard glider. In terms of generating functions, this corresponds to an algebraic relationship between the other Group I gliders’ generating functions and $\tilde{G}_{\mathrm{standard}}$ (equations 19-21). Because of these algebraic relationships, the divisibility arguments that applied to $\tilde{G}_{\mathrm{standard}}$ apply to the other generating functions as well, and the number of sets $S$ whose generating functions are divisible by these generating functions also grows in proportion to $D^2$. The proportionality constant depends on the shape of the glider, and the shape of the area of the texture under consideration. For typical regions, the proportionality constant is largest for G,,, and smallest for e,,.
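The rectangle argument for the standard glider can be checked by brute force on small grids. The following Python sketch is illustrative only and is not taken from the paper: the function names (`divisible_by_standard_glider`, `anchored_four_sets`, `is_rectangle`) and the parameter `max_side` are hypothetical, a set of checks is represented by its exponent pairs $(i, j)$, and the sketch counts sets anchored at $(0, 0)$ with side lengths of at most `max_side`, which is one possible reading of "within a distance $D$". Divisibility is tested by ordinary polynomial long division modulo 2, cancelling the leading term $x^i y^j$ against $x^{i-1} y^{j-1}(1 + x)(1 + y)$.

```python
from itertools import combinations


def divisible_by_standard_glider(checks):
    """Return True if the generating function of `checks` is divisible
    by (1 + x)(1 + y) with coefficients taken modulo 2.

    `checks` is an iterable of (i, j) exponent pairs; addition modulo 2
    corresponds to the symmetric difference of the exponent sets.
    """
    remainder = set(checks)
    while remainder:
        # Leading term in lexicographic order of the exponents.
        i, j = max(remainder)
        if i == 0 or j == 0:
            # The leading term is not a multiple of x*y: no division possible.
            return False
        # Subtract x^(i-1) y^(j-1) * (1 + x)(1 + y), working modulo 2.
        remainder ^= {(i, j), (i - 1, j), (i, j - 1), (i - 1, j - 1)}
    return True


def is_rectangle(checks):
    """Four distinct checks occupying exactly two rows and two columns."""
    rows = {i for i, _ in checks}
    cols = {j for _, j in checks}
    return len(checks) == 4 and len(rows) == 2 and len(cols) == 2


def anchored_four_sets(max_side):
    """List the four-check sets containing (0, 0), confined to the square
    0..max_side in each direction, that pass the divisibility test."""
    origin = (0, 0)
    others = [(i, j)
              for i in range(max_side + 1)
              for j in range(max_side + 1)
              if (i, j) != origin]
    found = []
    for trio in combinations(others, 3):
        candidate = {origin, *trio}
        if divisible_by_standard_glider(candidate):
            found.append(candidate)
    return found


if __name__ == "__main__":
    for max_side in range(1, 6):
        sets = anchored_four_sets(max_side)
        # Every divisible four-check set should be a rectangle.
        assert all(is_rectangle(s) for s in sets)
        print(max_side, len(sets), max_side ** 2)
```

Under these assumptions the second and third printed columns should agree, since a rectangle anchored at the origin is fixed by its two side lengths; this is the quadratic growth described above. The exhaustive search is only practical for small grids and is meant as a check on the reasoning, not as a way to compute the proportionality constants for the other Group I gliders.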