Vision Res. Vol. 33, NO. 516, pp. 609-426,
0042~6989/93 $6.00 + 0.00 Copyright 0 1993 Pergamon Press Ltd
1993
Printed in Great Britain. All rights reserved
Structural Testing of Multi-Input LinearNonlinear Cascade Models for Cells in Macaque Striate Cortex LOWELL D. JACOBSON,*
JAMES P. GASKA,* HAI-WEN CHEN,* DANIEL A. POLLEN*
Received 6 January 1992; in revised form 5 August 1992
Structural testing methods based on experimental white noise stimulus-response data were used to evaluate multi-input linear-nonlinear (LN) cascade models for simple and complex cells in macaque striate cortex. An LN structural test index, based on white noise ~~a~on, was developed and fotmd to be suitable for classifying cells as simple vs complex. In particular, classification results based on the LN structural test index were similar to classification results based on a traditional modulation index derived from cell responses to drifting sinewave gratings. Judging from their stnMural test indices, complex cells deviated more strongly from LN behavior than did simple cells. Yet, even with simple cells, on average, only about 60% of the first- and second-order wbite noise stimulus-response relation was consistent with LN behavior. Just two of thirteen simple cells studied had an LN consistency level that exceeded 80%. Similar results were found in tests for consistency with an LNL model which includes an additional linear post-filter. We conclude that a conventional multi-input LN network mode1may be a useful approximation to the response behavior of some simple cells. However, even during steady state stimulus conditious, subcortical and/or cortical nonlinear&s other than a static output nonlinearity play a very significant role in shaping the responses of most simple cells in the macaque striate cortex. Striate cortex
Simple cell
Complex
cell
Spatial filter
Temporal
filter
White noise
Nonlinear
models
Macaque
1. INTRODUCTION An info~ativ~ means of describing the input~utput (I-O) relationship of visual neurons employs blockstructured network models that consist of dynamic linear (L) and static non-linear (N) elements whose interconnection reflects the structure of the underlying physical network. For example, simple cells in the striate cortex of mammals are often modeled by an LN cascade system-i.e. a linear spatiotemporal filter followed-by a static threshold nonlinearity (Movshon, Thompson & Tolhurst, 1978a; Andrews & Pollen, 1979). Similarly, complex cells are often modeled by a summation of parallel LN channels whose linear spatiotemporal filters differ in spatial phase or position (Movshon, Thompson & Tolhurst, 1978b; Pollen & Ronner, 1983; Spitzer & Hochstein, 1985). The LN cascade model and the parallel LN channel model are respectively shown in Fig. l(a, c). Of course, an LN model offers an over-simpli~ed view of the general photic response properties of simple cells. For example, such a model fails to account for the adaptation of simple cell responses to the prevailing *Department School,
of Neurology, University of Massachusetts Worcester, MA 01655, U.S.A.
Medical 609
mean luminance and contrast of the stimulus. The failure of an LN model to account for well-known adaptational phenomena is acoeptable insofar as the LN model was originally proposed to describe the I-O relationship of simple cells for a limited range of mean luminance and contrast. However, there is a body of research that suggests that even a state-dependent LN model may have serious shortcomings. Much of this evidence comes from experiments which demonstrate that the response of a cell to a near optimal stimulus can be partially suppressed by superimposing additional stimulus components. For example, there are many studies which demonstrate that the response of striate neurons to a stimulus which covers the classic receptive field cam be suppressed when stimuli outside the classic receptive field are also presented to the cell (Maffei & Fiorentini, 1976; Nelson & Frost, 1978; Hammond & Mackay, 1981; De Valois, Thorell & Albrecht, 1985; Gilbert & Wiesel, 1990). In addition to the suppressive effects produced by extending the spatial characteristics of a stimulus there are many studies which use compound gratings to demonstrate suppr~sion when nonoptimal grating frequency or orientation components are added to the stimulus (De Valois & Tootell, 1983; Bonds, 1989). Although these suppressive effects are often named after the
610
LOWELL D. JACOBSON CI al.
0
(4
l l
FIGURE 1. Block-structured nonlinear feedforward models for simple and complex cells. (a) LN structure, (b) LNL structure, and (c) parallel LN structure. Each model has n time-varying inputs, denoted by 2, (z), . , Z,(t). Each input represents the temporal luminance function presented to a single subregion of a visual cortical neuron’s two-dimensional spatial receptive field. The single output from each model represents the temporal response of the neuron. In the present experiments, this is the celi’s time-var~~ rate of spike discharge. Each L-block denotes a linear filtering subsystem and each N-block denotes a static nonlinearity (see text for definition). Nate that the L-blocks can have either one input or many inputs depending on whether or not they directly receive the visual stimulus as input. The LN structure in (a) is shared by many models for simple cells which apply a linear s~tio~rn~r~ filter (L) to the visual stimulus followed by a threshold nonhnearity (N). The parallel LN structure in (c) is shared by many models for complex cells which suppose that multiple LN mechanisms are linearly summr:d to produce a complex cell’s autput.
stimulus that was used to produce them (i.e. side-stopping, surround suppression, cross-frequency suppression or cross-o~entation suppression), it is not clear at present if these effects are produced by distinct mechanisms or are different manifestations of a single relatively nonspecific suppressive m~hanism. The tuning of the suppression is, in general, far broader in space and spatial frequency than the excitatory tuning evoked by a single stimulus. These findings are difficult to incorporate into an LN model because, with the exception of subthreshold stimuli, a stimulus that produces no excitation or inhibition when presented alone cannot influence the response of an LN system when presented in superposition with other stimuli. In addition, some suppressive effects have been shown to be independent of the spatial phase of the grating stimulus used to produce the suppression (Bonds, 1989). Such spatial phase independent suppression cannot be accounted for by an LN model. Dean, ToIhurst and Walker (1982) showed that the amplitude of the response to a high temporal frequency stimulus increased when a low temporal frequency component was added to the stimulus. This result was recently extended by Reid, Victor and Shapley (I992) who used stimuli whose contrast was modulated in time by a single sinusoid or the sum of eight sinusoids. Reid er al. also showed that the phase of the fundamental response of simple cells measured using a single sinusoid differed from the phase response measured using the sum of sinusoids. This data clearly rejects the general LN model because a static nonlinearity can only affect the amplitude and not the phase of the funda~ntal response.
While the studies discussed above demonstrate specific stimulus conditions where the LN model will fail, they cannot be readily generalized to give a measure of overall LN failure of a neuron for a larger range of interacting stimuli which differ in their spatial and temporal positions. In this study we use a spatiotemporal white noise stimulus and a multi-input structural testing technique (Korenberg, 1973a, b; Chen, Jacobson & Gaska, 1990) to quantify the likelihood that the filtering properties of a given neuron in the striate cortex of the macaque monkey could arise from a multi-input LN cascade. The I-O relationship of a nonlinear visual neuron studied with spatiotemporal white noise can be summarized as a Wiener (or Wiener-like) functional expansion of the system output in terms of its input (Wiener, 1958; French, 1976; Marmarelis & Marmarelis, 1978; Klein & Yasui, 1979; Schetzen, 1980). The kernel functions which appear in the expansion are generally estimated by cross-correlating the stimulus and response functions (Lee & Schetzen, 1965). The first-order cross-correlation functions have recently been used by a number of researchers to investigate linear filtering by visual neurons. For example, spatiotemporal white noise and firstorder cross-correlation functions have been used to study color processing in the lateral geniculate nucleus (Reid & Shapley, 1990), the detailed spatial receptive field profiles of cortical simple cells (Jones & Palmer, 1987), and binocular filtering and representation by cortical simple cells (DeAngelis, Ohzawa & Freeman, 1991). Second-order cross-correlation functions have been used to study nonlinear ~ont~butions to motion
MULTI-INPUT
LN CASCADE MODELS
selectivity by cortical simple and complex cells (Emerson, Citron, Vaughn & Klein, 1987; Emerson, Korenberg & Citron, 1992), and binocular filtering by cortical complex cells (Ohzawa, Freeman & DeAngelis, 1990). The random and broadband nature of a white noise stimulus permits one to characterize nonlinear response interactions between many localized stimuli with different receptive field positions and temporal delays. For example, the second-order cross-correlation function, obtained when using two-dimensional white noise stimuli, represents a generalization of the two-bar interaction data obtained by Movshon et al. (1978b). The correct interpretation of first- and second-order kernels depends on the network structure of the cell under study. If a valid block structured model can be established for a cell, then there are many techniques which employ the Wiener kernels of the system as the basis for estimating the parameters of the linear and in the model (Korenberg, elements nonlinear 1973a, b, 1982,1985; Korenberg & Hunter, 1986; Hunter & Korenberg, 1986; Marmarelis & Marmarelis, 1978; Billings & Fakhouri, 1978, 1981, 1982; Billings, 1980; Schetzen, 1986; Chen, Ishii & Suzumura, 1986; Chen et al., 1990). For example, if the nonlinear photic I-O relation of a visual neuron is consistent with a multiinput LN description, then such techniques can be used to estimate the cell’s spatiotemporal pre-filter (L) and static output nonlinearity (N). In this paper, we examine the relationship between the first- and second-order cross-correlation functions obtained from striate neurons in order to derive a single number that characterizes their overall deviation from LN behavior. The testing procedure used in these experiments is insensitive to the parameterization of the linear (L) and static nonlinear (N) components of the LN model. This is analogous to using the principal of superposition to test a system for linearity insofar as no assumptions are required concerning the parameterization of the transfer function of the system under test. The general approach used to test multi-input LN models in the present paper can also be employed to test more complicated block-structured models (Chen et al., 1990). Such methods could therefore lead to the development of a taxonomy of functional cell types (in the striate cortex of primates) that is based on the compatibility of neuronal stimulusresponse characteristics with various block-structured nonlinear models. 2. METHODS 2.1. Surgical and anesthetic procedures Experiments were carried out on nine male macaque monkeys (Macaca fascicularis) weighing 2.5-5.0 kg. On experimental days an animal was first lightly anesthetized and immobilized with ketamine (10-l 5 mg/kg) and an i.v. cannula was inserted into a leg vein. Brevital (methohexital sodium) (3 mg/kg) was then injected for brief but deep general anesthesia. The larynx was
611
sprayed with a local anesthetic (lidocaine), and an endotracheal tube coated with 2% lidocaine gel was inserted. The animal was then allowed to rest and a baseline ECG was taken to determine the resting heart rate. Using a doppler ultrasonic device, the resting systolic blood pressure was measured over a peripheral leg artery. Local anesthetics were then applied at pressure points. The animal was then respired on a mixture of 70% nitrous oxide and 30% oxygen for light anesthesia and analgesia at a rate resulting in pC0, levels of 4.&5.0%. An additional dose of sodium pentobarbitone (1 mg/kg) was given before the animal was placed in the stereotactic apparatus. The animal was not paralyzed at this point nor until contact lenses could be placed and all surgical procedures completed so that any sign of potential distress could be observed and immediately eliminated by i.v. injection of sodium pentobarbitone given in increments of 1 mg/kg as needed. The ECG was monitored and additional pentobarbitone was injected whenever the heart rate significantly exceeded the resting baseline rate. Body temperature was automatically restricted to normal limits by a thermostatically controlled heating pad. The scalp was injected with a long acting local anesthetic (Marcaine) and a scalp incision was then made. A trephine hole was made over the expected location of the lunate sulcus which could sometimes be recognized through the dura. A small durotomy was made to expose the lunate sulcus. A small plastic ring of about 3 mm in height was placed over the trephine hole and attached to the bone with dental acrylic cement. After an electrode was directed towards the cortex, the bottom of the chamber was filled with a previously heated 4% agar in artificial cerebrospinal fluid. The gel was applied when its temperature dropped to 41°C and was then allowed to cool and harden. The gel served both to dampen cortical pulsations and to provide a heat shield when bone wax just above its melting point was poured over the agar to seal the chamber. After surgery was completed, paralysis was initiated using gallamine triethiodide (2.0 mg/kg i.v.) and Pavulon (pancuronium bromide) at a dose of 0.2 mg/kg. Paralysis was then maintained throughout the experiment by the continuous infusion of pancuronium bromide (0.2 mg/kg/hr) in 5% dextrose and saline. A small catheter was then inserted into the bladder to prevent bladder distension. Sufentanil (sufenta) was then used to provide anesthesia and analgesia during the recording sessions. Brevital (3 mg/kg) was injected intravenously and the nitrous oxide was discontinued. A pre-sufentanil measurement of heart rate and systolic blood pressure was taken. Within 5 min after discontinuing the nitrous oxide, a loading dose of sufentanil (48 pgg/kg) was injected in divided doses. Initially one-quarter of the loading dose was injected and the systolic blood pressure was immediately monitored. Sometimes a precipitous drop in systolic blood pressure to below 90 mm of mercury occurred in which case a normal systolic blood pressure
612
LOWELL
D. JACOBSON
was usually restored following infusion of normal saline, and if not by injection of a small dose of ephedrine sulfate. Once the blood pressure had stabilized the remainder of the loading dose was injected and the systolic pressure was remeasured. (As long as i.v. infusion was adequate to maintain hydration as measured by a steady urinary output, blood pressure remained stable through the course of the 334 day experiment.) Following the loading dose, sufenta was infused at a rate of 24 pgg/kg/hr, or more if there was a significant increase in heart rate. 2.2. Optics Atropine sulfate was used for cycloplegia and mydriasis. Gas permeable hard contact lenses were selected by retinoscopy, and artificial pupils (3 mm dia) were then placed over each lens. Refraction was then corrected to +D by adjusting trial lenses. The positions of the fovea and optic disc were back-projected and marked on a tangent screen. Contact lenses were removed for about 4 hr late each night and the corneas were irrigated with a wetting solution and the lids lightly taped shut. The next morning atropine sulfate was again applied and the contact lenses were replaced. 2.3. Extracellular recording Tungsten and parylene microelectrodes (Bak Electronics) with exposed tip sizes, usually from 6-12 pm in length, were used. Standard techniques of amplification, filtering and spike discrimination were employed. Digital pulses from a window discriminator representing the occurrence of action potentials were processed by an IBM PC/AT. During early experiments, the spike data were stored to the PC/AT, whereas during later experiments, the data were immediately transferred to a 4D/l20 GTX work-station (Silicon Graphics, Inc.) for on-line processing and storage. Some studies were also recorded on analog tape when it appeared that multiple spikes might be discriminable during post-experiment analysis. 2.4. Histology During the experiments, electrolytic lesions were made using currents of 3-5 PA for 3-5 set to mark specific recording sites along the electrode track. At the end of the experiment, additional deeper marker lesions were made using currents of 10 PA for 30-60 sec. When the experiment was completed, the animal was given an overdose of sodium pentobarbitone and perfused intracardially with a phosphate buffered saline solution followed by a fixative with 2% paraformaldehyde and 1.5% glutaraldehyde. The brain was then placed in successively more concentrated sucrose in saline solutions until it sank and was then sectioned in 30 micron slices. Alternate sections were stained for Nissl substance with cresyl violet or for cytochrome oxidase using the method of Wong-Riley (1979). The histology confirmed that all cells studied were located in cortical area V 1.
et al
2.5. Visual stimulation
In the early experiments, drifting bar and grating stimuli were presented with a Tektronix 608 monitor with a P4 phosphor whose X, Y, and Z inputs were controlled by a Picasso image generator (Innisfree, Inc.)-frame rate 100 Hz, and spatiotemporal white noise stimuli were presented on a Model 1310 13 in. RGB monitor (Electrohome, Inc.) attached to a Series 100 image processing board (Imaging Technology, Inc.)frame rate 60 Hz noninterlaced, 5 12 h x 240 v resolution. A single IBM PC/AT controlled both image generation systems. The two image display screens were alternately moved into the animals line-of-sight when changing from traditional bar and grating stimuli to white noise stimuli and vice versa, remapping the neuron’s receptive field after each swap. During later experiments, all types of stimuli were presented on a single 19 in. RGB monitor attached to a 4D/l20 GTX workstation (Silicon Graphics, Inc.)-frame rate 60 Hz noninterlaced, 1280 h x 1024 v resolution. Nonlinearities in the intensity-response curves of the RGB monitors were measured with a photometer (Minolta, Inc.) and compensated for via software (Electrohome) or gamma correction hardware (Silicon Graphics). The spatial extent of the receptive field of each studied neuron was initially determined using bar stimuli presented either with a projector, drawing the receptive fields manually onto a tangent screen, or during later experiments, on a computer controlled RGB monitor, using a mouse to indicate an oriented bounding rectangle whose center was computed and saved. (When a single monitor was used, the saved receptive field map was used to automatically center all subsequent stimuli on the neuron’s receptive field.) Next, one-dimensional interleaved grating orientation, spatial frequency, and temporal frequency studies were performed and processed on-line to obtain d.c. and fundamental response curves. Quantitative drifting bright and dark bar studies as well as counter-phase grating studies were also performed on some cells. We next performed one to several white noise stimulus studies on each cell. Monocular white noise stimuli consisted either of an (unoriented) two-dimensional array of squares, or an (oriented) one-dimensional array of bars. All one-dimensional stimuli were rotated to match the neuron’s preferred orientation. The width of the squares (or bars) was selected so that 4-6 squares would sample one period of a grating at the neurons preferred spatial frequency. This generally provided a reasonable compromise between the spatial resolution vs the stimulus energy delivered within the spatial-frequency passband of the neuron. The two-dimensional grids generally included anywhere from 12 x 12 = 144 squares up to 24 x 24 = 576 squares. The number of squares was almost always sufficient to fully cover the neuron’s receptive field. The lower range of total squares was generally used when the grid squares were large-a consequence of performance limitations in the graphics hardware. Many neurons were also studied with binocular white noise stimuli. Taking advantage of the eyes’
MULTI-INPUT
natural divergence during paralysis, binocular stimuli were presented as a pair of grids on a single monitor. One grid was centered on the left-eye receptive field, the other on the right-eye receptive field of the neuron. Using a single screen was never a problem since the receptive fields never overlapped and could always be arranged to transect the same screen. The luminance values of the squares in the white noise stimuli were independently modulated at a rate of either 60 or 30 Hz. Each square could take on one of N distinct luminance values. The probabilities of occurrence of the luminance values were either equal (ternary stimulus, N = 3), or approximated a Gaussian distribution truncated at two SD (discrete Gaussian stimulus, N = 64). Intensity levels were always placed at equal steps of intensity and arranged to provide the largest possible contrast. The only exception to this was the occasional use of ternary stimuli with reduced contrast that matched that of the truncated Gaussian stimulus. 2.6. General analysis procedures Response data from each grating study were processed into tuning curves for the d.c. and fundamental components of the response. Peaks and bandwidths were manually measured from the tuning curves. The amplitude of the d.c. and fundamental response components at the tuning peak were used to compute modulation indices for the purpose of classifying cells as simple vs complex. Data from each white noise study were used to compute first- and second-order cross-correlation functions between the recorded neural spike train and the regenerated white noise stimulus. The mean spike rate was subtracted from each neural spike train before computing the cross correlations to eliminate the large main diagonal component that would otherwise appear in the second-order cross-correlation functions (e.g. see the Discussion on p. 164 of Marmarelis & Marmarelis, 1978). The first-order cross-correlation functions were computed over the full spatial support of the stimulus and for delays ranging from 0 to 200 msec and were saved in three-dimensional (x, y, t) files. Second-order cross-correlations were computed over a restricted spatial and temporal region that included the full signal support region observed in the first-order cross-correlation were saved in and six-dimensional (X19~I,tl,~2,~2,~2)
613
LN CASCADE MODELS
files.
Cross-correlation functions were viewed interactively using a custom software application (LOOK) that displays data arrays with arbitrarily many dimensions. If the signal component in a second-order cross-correlation function was observed to extend to the edges of the computed region, then the spatial and/or temporal region was expanded and the cross-correlation recomputed. Once a second-order cross-correlation function was obtained that included the entire signal support region, the LOOK application was used to crop the function to a six-dimensional rectangular region that fully included the signal but eliminated extra-signal regions that consisted only of noise. The coordinates of the
resulting spatiotemporal support region of each crosscorrelation function were stored in an associated file. The amplitude of the main diagonal of the secondorder cross-correlation functions computed from ternary white noise studies were scaled up by a factor of four, relative to the off-diagonal regions, to obtain a secondorder ternary kernel (cf. Marmarelis & Marmarelis, 1978, p. 202). No special scaling was required for first-order cross-correlation functions, nor for the second-order cross-correlation functions computed from Gaussian white noise studies. Next, we obtained estimates of the noise corrupting the second-order cross-correlation functions. This was done by computing, for each white noise study, an additional second-order cross-correlation over the same spatial region and number of delays as before, but for a range of delays in which the response preceded the stimulus. The resulting function will be referred to in later sections as the noncausal second-order cross-correlation function. Physical considerations dictate that the signal component of the noncausal cross-correlation function be identically zero. The noise corrupting the onand off-diagonal regions of the second-order cross-correlation functions from ternary white noise studies were given separate treatment owing to the scaling operation described above. 3. RESULTS In this section, we test neurons in the striate cortex for consistency with a multi-input LN cascade model. The LN tests specifically employ the first and second-order (photic) stimulus-response cross-correlation functions of each visual neuron studied. When we claim to be testing a cell, it should be understood that we are testing the entire network that underlies the photic response properties of the cell. This, of course, includes the combined action (and interaction) of many retinal, LGN, and cortical neurons. 3.1. Example of LN test procedure The LN structural testing procedure is illustrated in this section for one simple cell. In particular, a necessary condition that must be satisfied by a multi-input LN system is introduced and applied in a procedure that tests the simple cell for consistency with an LN structure. The first- and second-order cross-correlation functions of the cell are illustrated together with the multi-dimensional terms and residual that appear in the LN model testing relation. The cell was studied using an achromatic monocular ternary white noise stimulus that consisted of a 19 x 19 grid of squares whose luminances were randomly and independently changed at a 60 Hz frame rate. Each square subtended a visual angle of 0.206 deg both horizontally and vertically, resulting in a total stimulus size of 3.9 x 3.9 deg. The total duration of the study was 20.8 min, during which a total of 8560 spikes were recorded, yielding an average response rate of 6.86 spikes per sec.
614
LOWELL (A) 1st Order
3D
(C)
et al.
CC
72
2nd Order
D. JACOBSON
6D
prediction
(D) 2nd Order
residual
FIGURE 2. Functions that appear in the LN test for a simple ceil. (A) First-order cross-correlation, (B) second-order cross-correlation, (C) LN second-order prediction, and (D) LN second-order residual. The first-order cross-correlation in (A) has three independent dimensions (x,~, t) which are simultaneously shown in jilm-strip image format. The corresponding 3-to-2 (function-to-image) dimension mapping is defined by the three-dimensional Dimension Map (or DiMup) depicted to the left below (A). The other three functions shown in (B, C, D) each have six independent dimensions (x, , y,, T,, x2,y,, T*) which are simultaneously shown in a nested-mosnic image format. The 6-to-2 (function-to-image) dimension mapping is defined by the six-dimensional DiMap depicted to the right below (A) in the figure. For further description of the functions and their dimensional mappings, see the text.
The first-order cross-correlation function computed from the stimulus-response data is depicted in Fig. 2(A). It is a function of three dimensions (x, y, r). Two of these dimensions (x,y) specify the visual field position of an arbitrary spatial input in the 19 x 19 stimulus grid, and the third dimension (5) specifies the associated correlation time lag. A film strip format is used in the figure to depict the first-order cross-correlation function. Namely, the three-dimensional data are displayed as a one-dimensional sequence of abutted two-dimensional gray scale images that depict (x, y) sections through the correlation function at increasing time lags ranging from 0 to 184 msec. This format is indicated by the three-dimensional Dimension Map (or DiMup) that is shown to the left below Fig. 2(A). The smaller scaled axes in the DiMap indicate the more rapidly varying dimensions, [x and y in Fig. 2(A)], while the larger axes in the DiMap indicate less rapidly varying dimensions, [r in Fig. 2(A)]. DiMaps will be especially helpful in understanding the format of displays of second-order cross-correlation functions as described below. The second-order cross-correlation function computed from the same stimulus-response data is depicted in Fig. 2(B). It is a function of six-dimensions (x, , y, , z, , x2, y,, tZ). Four of these dimensions, (x,, yl)
and (x,, yZ), specify the visual field positions of an arbitrary pair of spatial inputs in the 19 x 19 stimulus grid, and the other two dimensions, r, and r2, specify their respective correlation time lags. A recursive mosaic format is used in Fig. 2(B) to simultaneously depict all six dimensions of the second-order cross-correlation function. The six-dimensional DiMap that describes the mapping of these six dimensions to the two-dimensional display is shown to the right below Fig. 2(A). In particular, note that the most rapidly varying dimensions along the horizontal and verticai directions of the figure are x, and y, , respectively. The next most rapidly varying dimensions along the horizontal and vertical directions are x2 and y,, respectively. Finally, the least rapidly varying dimensions along the horizontal and vertical directions are ri and r2, respectively. Now it is hypothesized that the steady-state response behavior of this neuron can be accurately modeled as a multi-input LN system as in conventional models for simple cells. This hypothesis is readily tested by evaluating the relationship between the cell’s first- and secondorder cross correlation functions. In particular, if a multi-input system has an LN structure, then its firstand second-order cross-correlation functions, both assumed to be nonzero, must satisfy the following
MULTI-INPUT
615
LN CASCADE MODELS
order prediction function, cr& is the variance of the noncausal second-order cross-correlation function (see below), and a& is the space- and time-dependent variance of the LN second-order prediction function (see -lCLN~,(X,,Y,,~,)~,(X2,y*,~*)=O (1) Appendix). Note that for this analysis, the second-order cross-correlation, LN prediction, and variance of LN where Q2(x, , y, , T, , x2, y2, z2) is the second-order stimu- prediction functions are collapsed to one-dimensional lus-response cross-correlation function, @,(x, y, z) is the vectors. The factor kLN that minimized equation (2) was first-order stimulus-response cross-correlation function, estimated using a computer program which solved the and k,, is a constant. This equation states that the normal equation of the model (Press, Flannery, second-order cross-correlation function of a multi-input Teukolsky & Vetterling, 1988). LN system must be proportional to the second-order As is evident from equation (2), this analysis requires outer product of the first-order cross-correlation func- an estimate of the squared measurement error, tion with itself. The latter six-dimensional (outer a& + ktN a& at each data point. The first error term product) function will hereafter be referred to as the LN a& is a measure of the noise in the estimated secondsecond-order prediction. order cross-correlation function. It is a constant and The LN second-order prediction for the simple cell is equals the variance of the noncausal cross-correlation shown in Fig. 2(C). It is depicted using the same function (see Methods). The second error term k&a&, dimensional format as the real second-order cross-correis a measure of the noise in the LN second-order lation function in Fig. 2(B). Any similarities and prediction function after it has been scaled by the differences can be readily observed. regression constant KLN. The form of a& is derived in A linear regression was performed to determine the the Appendix. Its magnitude is shown to vary with the scale factor kLN that minimizes the mean square differ- spatial and temporal indices of the six-dimensional space ence in equation (1). The six-dimensional residual that over which the LN prediction is defined. Although the results from subtracting the scaled LN second-order scaled LN prediction error kt,at,, was included in our prediction from the real second-order cross-correlation reported analyses, it was small when compared with the function is shown in Fig. 2(D). Note that the two second-order cross correlation error, and we found that functions in Fig. 2(B, C) appear quite similar and that it could also have been excluded without significantly the residual in Fig. 2(D) is small. Therefore, the first- and altering our results. second-order cross-correlation functions of the simple The general linear least squares procedure described cell in this example approximately satisfy the relationabove yields a x2 statistic with expected value k and ship in equation (l), which must be satisfied by a variance 2k, where k = N - 1 is the number of degrees multi-input LN system. of freedom (independent data samples) in the secondThe LN test results from this simple cell, as well as all order cross-correlation data. Using this information, we other cells in our sample, are next quantified in terms of computed a statistic, hereafter referred to as Z,,, two principal statistics, Z,, and RLN. The first of these defined by statistics (Z,,) is used to test, for each cell, the null Xi&u,- k hypothesis that the first- and second-order cross-corre(3) LN /;;; ’ lation functions are consistent with an LN structural model. We will observe that Z,, is quite sensitive to the where xisidua,, previously defined in equation (2) is the signal vs noise content of the underlying correlation chi-squared statistic associated with the LN seconddata. In effect, the likelihood of passing the LN test order residual. ZrN is therefore defined as the normalized decreases precipitously as the fidelity of a cell’s corredifference between the actual and expected values of the lation data improves. The second statistic (RLN) shows chi-squared statistic. Z,, is a standardized score, and is greatly reduced sensitivity to signal-to-noise changes and distributed normally with a mean of zero and a standard is used to characterize the degree to which a cell deviates deviation of 1.0 for neurons which conform to an LN from LN behavior. model. We also computed a second statistic, hereafter referred . . 3.2. LN test statmtrc Z,, to as Zsignal,defined by The scale factor, kLN, that minimized the square difference between the second-order cross-correlation (4) function and the LN second-order prediction in equation (1) was estimated using a general linear least where squares model. The goodness-of-fit, or merit function, for this model is (5)
relationship (Korenberg, 1973a, b, 1982; Victor, Shapley & Knight, 1977; Chen et al., 1990):
z
XLdual
=
5
i= I
yi-k,xi
2 a& + kt,atN,>
7
(2)
where yi and xi are respectively samples of the secondorder cross-correlation function and the LN second-
=
and where the yi are samples of the second-order crosscorrelation function, a& is the variance of the noncausal second-order cross-correlation function, and k is the number of independent degrees of freedom as defined
616
LOWELL D. JACOBSON et a/
above. Zsignalis a measure of the second-order cross-correlation signal strength in standard deviation units. The expected value of Zsignalis 0.0 when there is no signal, and increases monotonically with increasing signal-to-noise ratio of the cross correlation function. Figure 3 plots Z,, against Zsigna,for all cells in our sample. While there are some points with ZLN values near zero, as expected for an LN system, there is a strong tendency for Z,, to increase with Zsignal.One might therefore suspect that all cells would exhibit large Z,, values, inconsistent with an LN model, provided that the signal-to-noise ratios of their estimated first- and secondorder cross-correlation functions could be further raised. We tested this hypothesis by reanalyzing our data using only those regions of the second-order cross-correlation function where the signal was the strongest. These regions were found by first estimating the space-time varying envelope of the second-order cross-correlation function. We then discarded all regions of the crosscorrelation function where its envelope did not exceed 3 times the standard deviation of the noise-previously estimated from the noncausal second-order crosscorrelation function. The results from the LN tests performed on these masked data sets are shown, using logarithmic axes, in Fig. 4 (open symbols) along with the results from the original unmasked data (solid symbols). In general, the masking operation caused both Z,, and Zsignalto increase. Therefore, as suspected, an improvement in the signal-to-noise ratio of the secondorder cross-correlation data leads to substantial increases in Z,, . The lowest value for Z,, in the masked data set is about 7.0. The probability that a system with an LN structure would yield a Z,, value this high or greater by chance is < 10P12. The resulting high Z,, values lead to the conclusion that an LN structural model provides an incomplete description of the filtering properties of all cells in our sample. It is important to 70 60 --
. .
.
z
d
i 10 i
t 10 1
10
100
1000
Zsignal FIGURE 4. Scatter plot of Z,, vs Zs,sn., for masked and unmasked data. The unmasked data from Fig. 3 is redrawn using logarithmic axes (solid circles). The open circles show the results obtained when Z,, and Z Elgna,were computed using only the regions of the second-order cross-correlation function where the envelope exceeded 3 times the standard deviation of the noise. Note: two Z,, values which were < I.0 were set equal to 1.0.
emphasize that, because our conclusions hold true for every possible parametrization of the spatiotemporal pre-filter (L) and static nonlinearity (N) in the LN structural model, every instance of a multi-input LN system can therefore be rejected as a complete model for each cell in our sample. Note, however, that our rejection of the LN model for all cells is based on a very strict test. As the signal-tonoise of the cross-correlation functions improves, the Z,, statistic would reject an LN model for cortical cells even when a large part of the I-0 relationship of the cells can be accounted for by an LN model. In the following section, we develop a new index that measures the percentage of the relationship between the first and second-order cross correlation functions of a cell that can be accounted for by an LN model. 3.3. Relative cancellation index RLN
The strong dependence of the Z,, statistic on the signal-to-noise ratio of the second-order cross-corre. lation confounds the interpretation of Z,,. That is, a z 4o -. . . high Z,, value signifies both a deviation from LN d -. . 30 behavior and a high signal-to-noise ratio. In this section, we develop a new statistic which is much less dependent 20 -. on the signal strength of the second-order cross-correl **. lation function. The new statistic more closely reflects 10 -a@ .** . the intrinsic filtering properties of each cell and offers a . . . l* useful index for the degree to which a cell deviates from . I 0-l LN behavior. The new index will be obtained by normal0 10 20 30 40 50 60 70 izing the statistic, Z,, , with respect to the signal strength Zsignal of the stimulus-response cross-correlation data. FIGURE 3. Scatter plot of Z,, vs Zslgna,. These measures are A closer examination of Fig. 4 reveals two groupings respectively defined in equations (3) and (4) in the text. Z,, is used to of data points that roughly cluster along different slopes. test the null hypothesis that the first and second-order cross-correlation For the first group, ZLNis approximately equal to Zsiynai. functions of a cell are consistent with an LN structural model. Ztis,,*, Hence, for these points, the LN second-order prediction is a measure of the signal strength in the estimated second-order was almost completely ineffective at canceling the signal cross-correlation. See the text for details. 50 --
l
MULTI-INP~
617
LN CASCADE MODELS
in the second-or&r cross*~~elation function. The second group of data points shows partial cancellation of the second-order cross-correlation function, a result which is at least partially consistent with an LN structure. The amount of cancellation was quantified by computing a relative cancellation index, RLN, defined by
3.4. Comparison of R,
and grating mo~iat~on indices
The quantitative measure that is most commonly used to classify cells as either simple or complex is the rnod~~at~on index. T&s index is derived from the response of a cell to a drifting sinewave grating whose spatial frequency, temporal frequency and orientation fall at the peak of the cell’s passband. In the m5du~at~on index is defined by particular, ZigA zLN @I &,= Z - ~O~~,~,where ~1~~~“~~~ is the ampliF 1 stimtljus ~(FO~ti~“l”~ signal tude of the fundamental response component at the where Z,, and Zsigna, were previously defined in temporal frequency of the drifting grating stimulus, equations (3) and (4), respectively. For a noise free Fostimutus is the d.c. response component to the grating system, RLN ranges from 0.0 (no cancellation) to 1.0 stimulus, and FO,,,, is the d.c. response to a uniform field (complete cancellation). For a stochastic system, RLN with mean luminance equal to that of the grating values above 1.0 can result when Z,, is negative. stimulus. By convention, cells with modulation indices The relative cancellation index, RLN, is plotted against below 1.0 are classified as complex, and cells with Zsignalin Fig. 5 using the unmasked data set. Notice that modulation indices above 1.0 are classified as simple R,, is not well correlated with Zsignal.Moreover, mask(DeValois, Albrecht & Thorell, 1982; Skottun, DeValois, ing the data did not significantly change the average RLN Grosof, Movshon, Albrecht & Bonds, 1991). values (mean RLN ~masked = 0.335, mean RLN Figure 6 plots the relative ~an~llation indices, RLNof masked = 0.344; t = 0.48; d.f. = 32; P > 0.10). The the cells in our sample vs the corresponding modulation masking procedure did, however, significantly change indices, As seen in the plot, there is general agreement the average &, values (mean Z,, unmasked = 17.99, between the classification schemes based on the two mean Z,, masked = 58.5; t = 5.99; d.f. = 32; P < 0.001). indices. In particular, cells with R,, values > 0.3 tend to Therefore, unlike the statistic ZLNy the relative cancellahave modulation indices > 1.0, and cells with R,, valtion index R,, is rather insensitive to the cropping and ues < 0.3 tend to have modulation indices c 1.0. Only 4 signal strength of the second-order cross-correlation of 32 cells (open symbols) are differently classified by the function. two schemes. Even though the LN structure does not fully account for the filtering of any striate cell in our sample, we can 1.2 nonetheless use the relative cancellation index, RLN, to I 1 * divide our cell sample into two groups that differ by the f degree to which they are inconsistent with LN behavior. Such a division is indicated in Fig. 5 by a dotted line at 0.8 a threshold RLN = 0.3 which vertically segregates the t a At data points into two groups. It is hypothesized that these 0.6 two groups correspond to simple and complex cells. We ’ l * . 1. evaluate this hypothesis in the next section by comparing 1. . . 0.4 d I classification based on the RLN statistic with a more t ---_---L----------traditional classification scheme, l
3i
1.2 f 0
t
I, 2. e
0
0.5
I
I I
$3
1
1.5
2
2.5
Modulation Index
l
0.4
.=
l l
-------------------
.
r:
0
0
.;
l
20
40
l
., . 60
I
SO
Zsignal FIGURE 5. Scatter plot of RLNvs it&,&. The dashed line at I& = 0.3 divides the ceils into two groups that differ in the degree to which they deviate from LN behavior.
FIGURE 6. Scatter plot of II,, vs m~ulation index. Neurons with modulation indices > 1.O(vertical dashed line) are typically classified as simple, while those with modulation indices below 1.O are typically classified as complex. We observe in the plot that neurons with RtN indices > 0.3 fhorixontal dashed line) tend to have modulation indices > 1.0, while neurons with R,, indices < 0.3 tend to have modulation indices < 1.0. These observations therefore establish the feasibihty of using R,, indices to classify cells as simple (R,, s 0.3) vs complex (I?, < 0.3) based upon their responses to white noise stimulation. Two neurons (open triangles) were classified as simple using the RLNindex that would be classified as complex using the modulation index. Two neurons (open squares) were classified as complex using the RLN index that would be classified as simple using the modulation index. Solid symbols represent neurons that have the same class& cation in both schemes. Solid squares remesent data from ‘“linear nonoriented” cells, that is, cells that were not seiective for stimulus orientation and had modulation indices > 1.0.
618
LOWELL
D. JACOBSON
As is evident from Fig. 6, most complex cells, (so classified by their modulation indices), have R,, values below 0.3. Since an RLN value near 1.O is expected for an LN structure, it is clear that the filtering performed by all complex cells in our sample cannot be well described by an LN model. This is the expected result since it is well known that complex cells must combine parallel nonlinear channels, and hence, would be poorly described by a single multi-input LN cascade. The results obtained with simple cells are more interesting. First, we note that some simple cells, (so classified by their modulation indices), had RLN values that were close to 1.0. If a criterion of, say R,, > 0.8, is used, then the filtering properties of about 15% (2 of 13) of the simple cells in our sample would appear to be reasonably consistent with a multi-input LN structure. However, the remaining 85% of simple cells in our sample have R,, values that, despite being larger than those of complex cells, are still well below 1.0. Therefore, the majority of simple cells studied seem to be rather poorly modeled by a multi-input LN network structure. The results from two cells which were not selective for stimulus orientation and had modulation indices > 1.0 are shown using solid squares in Fig. 6. More data is needed to determine if the LN model is more consistent with “linear nonoriented” cells than with orientation selective simple cells. 3.5. Factors affecting LN test results In this section, we examine some possible explanations for the degree to which most simple cells fail to be consistent with an LN network structure. 3.5.1. Statistical model. One possible explanation for the failure of most simple cells to pass the LN test derives from the observation that our statistical model is incomplete. Namely, when computing Z,,, we assumed a constant value for the variance of the noise that corrupts our estimate of the second-order cross-correlation function. It has, in fact, been shown that the true estimation error varies with the amplitude and time delay of the cross correlation functions (Marmarelis & Marmarelis, 1978, Chap. 7). Therefore, it is possible that we significantly underestimated the true measurement error, causing an overestimation of Z,,, and hence erroneous rejection of the LN model. Two factors suggest, however, that an improved statistical model would not significantly alter our results. First, for several cells that came nearest to satisfying the LN prediction, we directly analyzed how the variance functions cross-correlation of their second-order
et al.
changed as a function of the absolute amplitude of their corresponding second-order predictions. In these analyses, the noise variance was not found to vary significantly with the signal level. Second, as discussed below, direct examination of the second-order cross-correlation and the LN second-order prediction shows that statistical refinements are insufficient to explain the failure of the LN model. 3.5.2. Nonseparability of second-order cross -corre lationfunctions. Figure 7(a-c) depicts 4D subsections of the second-order cross-correlation functions and corresponding LN second-order prediction functions of three cells that were classified as simple using both modulation index and relative cancellation index (RLN) criteria. In these figures, only the spatial indices of the second-order cross-correlation and LN prediction functions vary; the temporal indices z, and r2 are both fixed at the delay that produced the highest second-order cross-correlation signal strength. In Fig. 7(a), the actual second-order crosscorrelation and the LN second-order prediction are seen to be quite similar. The strong similarity is confirmed by the corresponding R,, value of 1.06. In Fig. 7(b), the similarity between the second-order cross-correlation and the LN prediction is less pronounced, and the R,, value is reduced to 0.64. Finally, Fig. 7(c) shows results from a cell in which the second-order cross-correlation and the LN prediction are among the least similar from our sample of simple cells, yielding an R,, value of 0.50. Recall that the second-order cross-correlation function of a multi-input LN system must be proportional to the outer product of the first-order cross-correlation function with itself [equation (l)]. Therefore, the secondorder cross-correlation function must be x, - s? and y, -y? separable as is true of the LN prediction functions as illustrated in Fig. 7. However, an examination of the second-order cross-correlation functions in Fig. 7 reveals that as R,, decreases, the amount of elongation parallel to the x, E x2 and y, E y, diagonals tends to increase. Such elongation is incompatible with the separability requirement of an LN model and largely accounts for the low R,, values of these cells. There are several possible explanations for the observed diagonal elongation. However, before discussing these, it is helpful to compare the second-order crosscorrelation functions of simple cells with those of complex cells. In particular, the second-order crosscorrelation functions of complex cells generally exhibit a high degree of diagonal elongation as illustrated for one complex cell in Fig. 8. For complex cells, the diagonal elongation establishes that, within the receptive field of
FIGURE 7 (facing page). Second-order cross-correlation functions (left column) and LN second-order predictions (right of the original six-dimensional column) for three different simple cells. Four-dimensional (x, ,y,, x2, y2) subsections the highest second-order functions are shown in each case, obtained by fixing r, = r2 at the delay that produced cross-correlation signal strength. The 4D-to-2D (function-to-image) mapping is defined by the DiMap at the top of the figure. The R,, and modulation indices are indicated for each cell. Each LN prediction function is plotted so as to emphasize its X, vs xr and y, vs y, separability (see text). Note that the subzones of the corresponding second-order cross-correlation functions show various degrees of diagonal elongation (nonseparability) which is greatest for those simple cells with smaller R,, values. Such elongation implies a partial loss of sensitivity to spatial phase information, and as illustrated in Fig. 8, the elongation is even more pronounced in the second-order cross-correlation functions of complex cells.
MULTI-INPUT
2nd Order CC
LN CASCADE MODELS
LN Prediction
619
CC
LN Prediction
FIGURE 8. Second-order cross-correlation function (left) and LN second-order prediction (right) for a complex cell. Data are shown using the same format as in Fig. 7. Note that the strong diagonal elongation evident in the second-order cross-correlation function is incompatible with the separability of the LN second-order prediction function, leading therefore to a small R,, value, The strong diagonal elongation implies that the cell is most sensitive to the relative, rather than the absolute. positions of paired stimulus spots. This is a well-known response property of complex cells that results from the convergence of parallel nonlinear spatia1 filtering channels.
2nd Order
MULTI-INPUT
621
LN CASCADE MODELS
the cell, responses to pairs of point stimuli depend primarily on a measure of their relative positions [(x,, y2) - (x, , y, )] and less on a measure of their absolute positions [(x2, yz) + (x, , y, )]. Such response dependence of complex cells on relative rather than absolute position was previously demonstrated by Movshon et al. (1978b) in two-bar interaction studies. The second-order cross-correlation data in Figs 7 and 8 can be viewed as a complete two-point interaction study in which all possible interactions within and around the receptive field are examined. The loss of absolute position sensitivity exhibited by complex cells has provided the major support for models in which complex cells summate parallel LN mechanisms with differing spatial positions or phases (Movshon et al., 1978b; Pollen & Ronner, 1983; Spitzer & Hochstein, 1985). Given the above information about complex cells, one might suppose that the diagonal elongation observed in the second-order cross-correlation functions of simple cells also signifies the involvement of parallel nonlinear channels in establishing the responses of individual simple cells. However, before pursuing this possibility in the Discussion section, we first consider two methodological explanations for the diagonal elongation that can be rejected experimentally. 3.5.3. Eye movements. The first- and second-order cross-correlation functions are time-averaged characterizations of a system and, in the present experiments, are computed from a white noise stimulus-response record that lasts from several minutes to up to about 1 hr. Our interpretation of such cross-correlation data rests upon the assumption that the system is time-invariant over the full duration of the study. Of course, a system that is time-invariant with respect to a retinal coordinate frame will, in the presence of eye movements, be time varying with respect to a fixed external display coordinate frame. In particular, eye drift would cause the linear prefilter of an LN system to be centered at successive positions in the display coordinate frame. Unfortunately, time-averaging, such as that implied by cross-correlation, destroys the ability to distinguish between successive vs simultaneous positions of a receptive field. As a result, the cross-correlation functions of an LN system with a prefilter that changes its receptive field position are indistinguishable from the cross-correlation functions of a system with parallel LN channels. We therefore devised a method to test the hypothesis that eye movements might explain the failure of many simple cells to better fit the predictions of the LN model. This was done by computing ten partial estimates of the first-order cross-correlation function, computed over successive, independent epochs of the white noise stimulus-response record. For each partial estimate, we determined the x- and y-position shifts that maximized the correlation between the partial estimate and the total first-order cross-correlation function. The square root of the summed squares of the x- and y-position shifts was then used as a measure of eye displacement during that epoch of the stimulus-response record. Finally, the overall severity of eye drift during a cell study was
characterized by the average of the eye displacement estimates across all ten epochs. The average eye drift measurements were small for cells included in this study (mean = 0.083 stimulus blocks, standard deviation = 0.129). As stated in the Methods section, stimulus block size was chosen during the experiment so that there were 4-6 blocks per period of the preferred spatial frequency of the cell as measured using sine wave gratings. Thus, the mean of the average eye drift of simple cells in our sample was approx. 0.0275 cycles of a grating at a cell’s preferred spatial frequency. In addition, the small eye drifts that were measured were not correlated with the R,, values obtained from these cells (r2 = 0.0005; t = 0.08; d.f. = 13; P > 0.10). These quantitative results are consistent with our further observation that, if elongation was evident in the total secondorder cross-correlation function, then it was also apparent in the partial estimates of its second-order cross-correlation function. The above analysis rules out the most common eye stabilization problem, i.e. slow eye drift, as an explanation for the failure of simple cells to satisfy the LN test. However, because the partial estimates of the first-order cross-correlation functions were typically computed over a l-3 min epoch, a jitter about a mean eye position with a temporal period less than this would not be detected. If eye position jitter were present, we would expect that a jitter of constant magnitude would reduce the R,, values of cells tuned to high spatial frequencies more than cells tuned to low spatial frequencies. However, no correlation was found between RLN and the high 50% cutoff spatial frequency of the cells in our sample (r* = 0.0001; t = 0.04; d.f. = 13; P > 0.10). 3.5.4. Linear post-jilter. If an LN cascade structure is an incomplete description of our results, can a structure composed of an LN cascade followed by an additional linear post-filter account for the data? Because our test was formulated for an arbitrary multi-input LN structure, additional linear filters before the static nonlinearity would not affect the results. However, an LN structure with a linear filter after the static nonlinearity, would not, in general, pass the LN test. A multi-input LN cascade with an additional post-linear filter is referred to as an LNL structure [Fig. l(b)]. Although we don’t realistically expect to observe a linear post-filter following the static output nonlinearity of simple cells, a post-filter may effectively be imposed upon the collected data by our practice of binning spikes over intervals of one stimulus frame (either & or hsec). A necessary condition for a multi-input LNL system is as follows:
s
~z(x,,Y,,*I,~~,Yz,~~)~z~
-
s
km.,,
where @p,k~9~)
~,(x,,y,,r,)~,(x,,y,,z,)dz,=O
and ~2(x,,~I,~I,x2,~2,~2)
(7)
are the
first- and second-order stimulus-response cross-correlation functions, and k,,, is a constant. This test is a
LOWELL
622
D. JACOBSON
minor variation of the conventional test for LNL systems (cf. Korenberg, 1973a, 1982; Chen et al., 1990). Note that this LNL model test is similar to the test used for the LN model, [see equation (l)], except that the LNL test integrates over one of the temporal lag dimensions. If a simple cell’s first- and second-order cross correlation functions are nearly consistent with an LN model, we should also expect to observe near agreement with an LNL model since the former is a special case of the latter. If, however, a cell has a post-linear filter with a time constant for a few tens of milliseconds, then its R,, value will diminish, whereas an index based on the LNL test should remain high. We therefore computed a new index, denoted R,,, , which is similar to the R,, index defined by equation (6), but which, rather than using the LN residual, uses instead the LNL residual defined by the difference in equation (7). A scatter plot which compares R,,, to R,, for the cells we studied is shown in Fig. 9. There is almost no difference between the two statistics, and therefore an LNL model does not describe simple cell filtering much better than the LN model. 4. DISCUSSION
We have investigated the first- and second-order stimulus-response cross-correlation functions of cells in the striate cortex of the macaque monkey to determine if their relationship is consistent with a multi-input linear-nonlinear (LN) cascade model. The LN cascade model includes, as special cases, all conventional models for simple cells which consist of a spatiotemporal linear filter followed by a static rectification nonlinearity. The LN model testing procedure used in this paper, and hence also the results discussed below, are indifferent to the parametric form of the linear (L) and static nonlinear (N) components of the LN structural model. For example, the LN test procedure will give equivalent results for LN systems with different static nonlinearities
/
l/ /
7
/
/
/
/’
‘.
I
I
0 0
0.2
0.4
0.6
0.8
1
1.2
RLNL FIGURE 9. Scatter plot of RLNL vs R,,.The dashed line shows the unit slope line. The regression statistics for the best fitting line are as follows: slope = 1.02, y intercept = -0.00003, t = 21.48, d.f. = 25, P
el al.
including biased or unbiased rectification, half-squaring, or sigmoidal functions. The LN test procedure quantifies the discrepancy between the actual and LN-predicted second-order crosscorrelation functions of each cell by computing a modelspecific index RLN [equation (6)]. Given noise-free cross-correlation estimates, the index R,, would vary from 0.0, when the functional terms in the corresponding necessary relationship [equation (l)] are mutually orthogonal and the relationship is therefore violated, to 1.0, when the functional terms in the corresponding necessary relationship are linearly dependent and the relationship is therefore satisfied. More generally, given noisy cross-correlation estimates, a small RLNindex value (near 0.0) implies a large deviation from LN behavior, whereas a large index value (near 1.0) is expected for a cell that is well described by an LN structural model. The RLNindex was shown to be loosely associated with the modulation index used to classify cells based on their responses to drifting sinewave gratings (Fig. 5). In particular, an R,, threshold of about 0.3 divided our cell sample into two groups which correspond closely to the simple and complex groups obtained using a conventional modulation index with a threshold of 1.0. Therefore, the R,, index can serve as the basis for a robust structurally derived technique for classifying cells as simple vs complex. The rather weak correlation (r* = 0.53) of the R,, and modulation indices, apparent in Fig. 5, results from three factors. First, the R,, index summarizes a cell’s average deviation from LN behavior over the full range of spatial and temporal frequencies to which the cell is sensitive, whereas the modulation index is specific to a particular (peak) orientation, spatial frequency, and direction. Second, the R,, index is based on the response to a noise stimulus with a broadband temporal spectrum whereas the modulation index is measured from the response to a stimulus with a narrowband temporal spectrum (a single drifting grating). This stimulus distinction may be important because the time constant of cat simple cells is affected by a dynamic nonlinear process that is sensitive to the temporal bandwidth of the stimulus (Reid et al., 1992). Third, while the R,, index is indifferent to the form of the static output nonlinearity of an LN system, the modulation index depends on the threshold and shape of this nonlinearity. The degree to which different simple cells deviate from LN behavior is well characterized by their respective RLN values which ranged from 0.37 to 1.06. The average RLN value for cells classified as simple (using R,, > 0.3 as a criterion) was 0.60, which means that, on average, about 60% of the relationship between the first- and secondorder cross-correlation functions of simple cells can be accounted for by an LN structure. Therefore, although an LN model provides a good approximation to the steady-state I-O properties of some simple cells, most simple cells in our sample deviate substantially from LN behavior. Improved models are therefore required to describe the filtering properties of most simple cells. The
MULTI-INPUT
623
LN CASCADE MODELS
-
R,(t)
(W FIGURE 10. Block-structured shunting feedback models for simple cells. (a) Single-channel LN model with shunting dynamic feedback (an example of an L-DN model), (b) multi-channel LN model with shunting dynamic feedback (an example of a multi-channel L-DN model). The interpretation of inputs, outputs, L-blocks and N-blocks is the same as in Fig. 1. A DN-block denotes an arbitrary single-input/single-output dynamic nonlinear subsystem.
present data suggests that alternative models should incorporate at least two features. First, the nonlinear mechanisms causing deviation of simple cells from LN behavior must have relatively short time constants. This conclusion is supported by the observation that the first- and second-order cross-correlation functions of cells in our sample do not exhibit significant energy at latencies exceeding about 200 msec. In addition, the broadband noise stimuli used in these experiments would not cause significant temporal variations in the activation of mechanisms with time constants on the order of seconds. Second, alternative models should account for the characteristic way in which the second-order crosscorrelation functions of simple cells deviate from an LN prediction. Namely, they exhibit a diagonal elongation with respect to space-space interactions [e.g. Fig. 7(c)]. Such diagonal elongation is one of the hallmarks of a system that includes parallel nonlinear channels, and is a highly conspicuous feature of the second-order cross-correlation functions of complex cells (Fig. 8). The diagonal elongation observed
with simple cells, though less pronounced than that of complex cells, might also arise from multiple nonlinear channels. For example, if cells in the lateral geniculate nucleus (LGN) exhibit substantial rectification in their outputs, then they would provide multiple nonlinear inputs to simple cells. The displacements of the spatial filters of such rectified LGN neurons would lead to elongation of the space-space interactions in the second-order crosscorrelation functions of simple cells. There is evidence in support for a “push-pull” component of simple cell receptive field organization in which ON and OFF center LGN input is combined both additively and subtractively (Palmer & Davis, 1981; Ferster, 1988; Tolhurst & Dean, 1990; Lui, Gaska, Jacobson & Pollen, 1992). If LGN cells are half-wave rectified and the strength of the additive and subtractive inputs are equal, then such a process could completely cancel the effects of rectification of the LGN inputs to simple cells. However, if these conditions are not met, then some of the effects of subcortical rectification would be evident in the filtering of simple cells.
624
LOWELL D. JACOBSON et al
Multiple interacting nonlinear channels that affect simple cell responses could also rise within the cortex itself. For example, the short latency filter properties of many cortical simple cells may not be well modeled as deriving exclusively from an isolated feedforward LN channel, but rather from a larger nonlinear network of mutually interacting LN channels. This view of cortical processing is supported by nonlinear suppression studies (Bonds, 1989) as well as direct investigations of cell-tocell response coupling (Ts’o & Gilbert, 1988). Nonetheless, before concluding that the diagonal elongation observed with simple cells is derived from multiple nonlinear channels, we must first examine more complicated single channel nonlinear models. A single channel nonlinear model is here defined as a model that includes only one multi-input/single-output (spatiotemporal) linear filter. For example, the multi-input LN model is a single channel model in which the single output of the spatiotemporal linear filter (L) is transformed by a static nonlinearity (N). The general class of single channel nonlinear models is obtained by allowing the output of the multi-input pre-filter to be transformed by an arbitrary single-input dynamic nonlinear (DN) subsystem. Such a linear-dynamic nonlinear system will be referred to as an L-DN system. An example of an L-DN simple cell model is shown in Fig. 10(a). It consists of an LN structure that is elaborated by adding a dynamic feedback mechanism which uses shunting inhibition to modify the gain of the cell. Space-space diagonal elongation can occur in the second-order cross-correlation functions of a multiinput L-DN model when the spatiotemporal prefilter (L) is space-time inseparable. However, diagonal elongation is strictly precluded for a multi-input L-DN model when the spatiotemporal prefilter is space-time separable. This latter result is true irrespective of the complexity of the dynamic nonlinear subsystem, as can be readily shown using the spatiotemporal separability condition together with equation (9.65) in Marmarelis and Marmarelis (1978). We observe, however, that the second-order cross-correlation functions of simple cells can exhbit space-space elongation [e.g. the cell of Fig. 7(c)] even when their first-order cross-correlation functions (not shown) are space-time separable, precluding therefore an L-DN model for such simple cells. Moreover, for all cells in our sample, including those with space-time inseparable pre-filters, we were able to preclude an LNL model [an L-DN model in which the dynamic nonlinearity (DN) is an NL cascade]. Thus, it is unlikely that L-DN models will be adequate to describe the present results. Rather, models must be considered in which the steady-state filtering properties of simple cells are derived from multiple converging or interacting LN (or L-DN) channels. One specific modeling approach that shows considerable promise is the simple cell network model of Heeger (1991) which incorporates fast feedback from a population of simple cells to cooperatively normalize their responses. A block diagram of one realization of his model is shown in Fig. 10(b). It is a multi-channel
generalization of the single channel L-DN model shown in Fig. 10(a). Heeger’s model accounts for the observed suppression of a reference stimulus within the passband of a simple cell by a second test stimulus placed within or outside the cell’s passband. Such fast acting suppressive effects, mediated by parallel nonlinear channels, might also help to explain the deviation from LN behavior observed in the present study. In future work, we will attempt to quantify the consistency (or lack thereof) of this and other multi-channel nonlinear models with the white noise stimulus-response cross correlation data of simple cells. REFERENCES Andrews, B. W. & Pollen, D. A. (1979). Relationship between spatial frequency selectivity and receptive field profile of simple cells. Journal of Physiology, London, 287, 163-176. Billings, S. A. (1980). Identification of nonlinear systems-a survey. Proceedings of IEE, 127, 272-285. Billings, S. A. & Fakhouri, S. Y. (1979). Identification of a class of nonlinear systems using correlation analysis. Proceedings of IEE, Correspondence,
125, 691-695.
Billings, S. A. & Fakhouri, S. Y. (1981). Identification of nonlinear systems using correlation analysis and pseudorandom inputs. Inrernational Journal of Systems Science, I I, 261-279. Billings, S. A. & Fakhouri, S. Y. (1982). Identification of systems containing linear dynamic and static nonlinear elements. Automatica, 18, 16-26. Bonds, A. B. (1989). Role of inhibition in the specification of orientation selectivity of cells in the cat striate cortex. Visual Neuroscience, 2, 41-55.
Chen, H.-W., Ishii, N. & Suzumura, N. (1986). Structural classification of nonlinear systems by input-output measurements. International Journal of Systems
Science,
17, 741-774.
Chen, H.-W., Jacobson, L. D. & Gaska, J. P. (1990). Structural classification of multi-input nonlinear systems. Biological Cybernet its, 63, 341-357.
Dean, A. F., Tolhurst, D. J. & Walker, N. S. (1982). Non-linear temporal summation by simple cells in cat striate cortex demonstrated by failure of superposition. Experimental Brain Research, 45, 4566458.
DeAngelis, G. C., Ohzawa, I. & Freeman, R. D. (1991). Depth is encoded in the visual cortex by a specialized receptive field structure. Nature, 352, 156-159. De Valois, K. K. & Tootell, R. B. (1983). Spatial-frequency-specific inhibition in cat striate cortex cells. Journal of Physiology, London, 336, 3599376.
De Valois, R. L., Albrecht, D. G. & Thorell, L. G. (1982). Spatial frequency selectivity of cells in the macaque visual cortex. Vision Research,
22, 545-560.
De Valois, R. L., Thorell, L. G. & Albrecht, D. G. (1985). Periodicity of striate-cortex-cell receptive fields. Journal of the Optical Society of America A,2, 1115-l 123. Emerson, R. C., Korenberg, M. J. & Citron, M. C. (1992). Identification of complex-cell intensive nonlinearities in a cascade model of cat visual cortex. Biological Cybernetics, 66, 291-300. Emerson, R. C., Citron, M. C., Vaughn, W. J. & Klein, S. A. (1987). Nonlinear directionally selective subunits in complex cells of cat striate cortex. Journal of Neurophysiology, 58, 33-65. Ferster, D. (1988). Spatial opponent excitation and inhibition in simple cells of the cat visual cortex. Journal of Neuroscience, 8, 1172-l 180.
French, A. S. (1976). Practical nonlinear system analysisby Wiener kernel estimation in the frequency domain. Biological Cybernetics, 24, 111-119. Gilbert, C. D. & Wiesel, T. N. (1990). The influence of contextual
stimuli on the orientation selectivity of cells in the primary visual cortex of cat. Vision Research, 30, 168991701.
MULTI-INPUT
Hammond, P. & MacKay, D. M. (1981). Modulatory influences of moving textured backgrounds on responsiveness of simple cells in feline striate cortex. Journal of Physiology, London, 319, 431-442. Heeger, D. J. (1991). Nonlinear model of the neural responses in cat visual cortex. In Landy, M. & Movshon, J. A. (Eds), Computafional models of visual processing, Cambridge, Mass: MIT Press. Hunter, I. W. & Korenberg, M. (1986). The identification of nonlinear biological systems: Wiener and Hammerstein cascade models. Biological
Cybernetics,
55, 135-144.
Jones, J. P. Kc Palmer, L. A. (1987). The two-dimensional spatial structure of simple receptive fields in cat striate cortex. Journal of Neurophysiology, 58, 1187-1211. Klein, S. & Yasui, S. (1979). Nonlinear systems analysis with nonGaussian white stimuli: General basis functionals and kernels. IEEE Transactions,
Information
Theory, 25, 495-500.
Korenberg, M. J. (1973a). Identification of biological cascades of linear and static nonlinear systems. Proceedings of MidwestSymposium on Circuit Theory,
18, l-9.
Korenberg, M. J. (1973b). Cross-correlational analysis of neural cascades. Proceedings of the Annual Rocky Mountain Bioengineering Symposium,
I, 47-52.
Korenberg, M. J. (1982). Statistical identification of parallel cascades of linear and non linear systems. IFAC Symposium Identification and System
Parameter
Estimation,
Estimation,
Schetxen, M. (1986). Differential models and modeling. International Journal of Control, 44, 157-179.
Skottun, B. C., De Valois, R. L., Grosof, D. H., Movshon, J. A., Albrecht, D. G. & Bonds, A. B. (1991). Classification of simple and complex cells on the basis of response modulation. Vision Research, 31, 1079-1086.
Spitxer, H. & Hochstein, S. (1985). A complex-cell receptive field model. Journal of Neurophysiology, 53, 1266-1286. Tolhurst, D. J. & Dean, A. F. (1990). The effects of contrast on the linearity of spatial summation of simple cells in the cat’s striate cortex. Experimental Brain Research, 79, 582-588. Ts’o, D. Y. &Gilbert, C. D. (1988). The organization of chromatic and spatial interactions in the primate striate cortex. Journal of Neuroscience, 8, 1112-1727. Victor, J. D., Shapley, R. M. & Knight, B. W. (1977). Nonlinear analysis of retinal ganglion cells in the frequency domain. Proceedings of the National
I, 421-426.
Korenberg, M. & Hunter, I. W. (1986). The identification of nonlinear biological systems: LNL cascade models. Biological Cybernetics, 55,
Lennie, P., Krauskopf, J. & S&r, G. (1990). Chromatic mechanisms in the striate cortex of macaque. Journal of Neuroscience, IO, 6499669. Liu, Z., Gaska, J. P., Jacobson, L. D. & Pollen, D. A. (1992). Intemeuronal interaction between members of quadrature phase and anti-phase pairs in the cat’s visual cortex. Vision Research, 32, 1193-I 198.
Maffei, L. St Fiorentini, A. (1976). The unresponsive regions of visual cortical receptive fields. Vision Research, 16, 113l-l 139. Marmarelis, P. Z. & Marmarelis, V. Z. (1978). Analysis ofphysiological systems: The white noise approach. New York: Plenum Press. Movshon, J. A., Thompson, I. D. & Tolhurst, D. J. (1978a). Spatial summation in the receptive fields of simple cells in the cat’s striate cortex. Journal of Physiology, London, 283, 53-77. Movshon, J. A., Thompson, I. D. & Tolhurst, D. J. (1978b). Receptive field organization of complex cells in the cat’s striate cortex. Journal London, 283, 79-99.
Nelson, J. I. & Frost, B. J. (1978). Orientation-selective inhibition from beyond the classic visual receptive field. Brain Research, 139, 359-365.
Palmer, L. P. & Davis, T. L. (1981). Receptive-field structure in cat striate cortex. Journal of Neurophysiology, 46, 260-295. Pollen, D. & Ronner, S. (1983). Visual cortical neurons as localized spatial frequency filters. IEEE Transactions on Systems, Man, and Cybernetics,
13, 907-916.
Press, W. H., Flannery, B. P., Teukolsky, S. A. & Vetterling, W. T. (1988). Numerical recipes in C: The art of scientific computing. Cambridge: Cambridge University Press. Ohzawa, I., Freeman, R. D. & DeAngelis, G. C. (1990). Stereoscopic depth discrimination in the visual cortex: Neurons ideally suited as disparity detectors. Science, 249, 1037-1041. Reid, R. C. Br Shapley, R. M. (1990). Spatial and temporal characteristics of cone inputs to macaque LGN cells as mapped by pseudorandom stimuli. Investigative Ophthalmology and Visual Science (Suppl.),
of Science,
U.S.A.,
74, 3068-3072.
are indebted to Dr Peter Lennie for providing us with information about sufentanil anesthesia prior to his publication (Lennie, Krauskopf & Sclar, 1990). This work was supported by NIH grant EY05156 and Air Force contract AFOSR-89-0247.
Acknowledgements-We
125-134. L.-x, Y. W. & Schetzen, M. (1965). Measurement of the Wiener kernels of a nonlinear system by cross-correlation. International Journal of Control, 2, 237-254.
of Physiology,
Academy
Wiener, N. (1958). Nonlinear problems in random theory. New York: MIT Press and Wiley. Wong-Riley, M. T. T. (1979). Changes in the visual system of monocularly sutured and enucleated cat demonstrated with cytochrome oxidase histochemistry. Brain Research, 171, 1 l-28.
I, 580-585.
Korenberg, M. J. (1985). Identifying noisy cascades of linear and static nonlinear systems. IFAC Symposium Identification and System Parameter
625
LN CASCADE MODELS
31, 429.
Reid, R. C., Victor, J. D. I.%Shapley, R. M. (1992). Broadband temporal stimuli decrease the integration time of neurons in cat striate cortex. Visual Neuroscience, 9, 39-45. Schetxen, M. (1980). The Volterra and Wiener theories of nonlinear systems. New York: Wiley.
APPENDIX Noise Statistics
of the LN Prediction
The LN second-order
prediction was defined in equation (1) as the outer product of a cell’s first-order cross-correlation function with itself. However, owing to the finite duration of a stimulus-response study, one can only obtain an estimate of the true first-order cross-correlation function of a cell. Therefore, in practice, the computed LN prediction is corrupted by noise whose statistics depend on the estimated first-order cross-correlation and on the nonlinear outer product operation. The mean and variance of the LN prediction are needed to compute the LN test statistic Z,, of a cell [equation (3)] and these statistics are therefore derived below. In the present experiments, the estimate of a cell’s first-order cross-correlation function, denoted by k, is a three-dimensional discretely sampled function which we shall regard as a random function I;(x,,Yi,r,)=h(x,,Y,,7k)+n(x,,yj,t~),
(8)
where the estimated first-order cross-correlation has been expressed as the sum of the true cross-correlation (h) plus a random error function (n),andwhere~,,y~andr,,O
+n(xi,,yj,,~k,Il[h(X~ry~,~~)+n(x~,y~,~~2)l.(9) By detinition, the off-diagonal part of the LN prediction consists of samples of equation (9) where (xi,, yj,, rk,) # (x,, yfi, Tag) and the on-diagonal part of the LN prediction consists of all samples where
txi,Yja,Tkt) = Cxi,Y 9
9
9
?kr).
An arbitrary o14” -diagonal sample of the LN mediction will be denoted by (s,.+ n;)(s, % nz) where s, and s, are samples of the cell’s true first-order cross-correlation function, and n, and n2 are independent zero-mean Gaussian random variables with variances a:, and at, respectively. An arbitrary on-diagonal sample of the LN prediction will be denoted by (s + n)2, obtained by specializing the off-diagonal result with the substitution s = s, = s,, and n = n, = n2. Given the above definitions, we now derive the mean and variance of the LN prediction function.
626
LOWELL
D. JACOBSON
Mean h) The mean at an arbitrary is given by
off-diagonal
sample of the LN prediction
~,R=E{(s,+n,)(S*+n2)) and the mean at an arbitrary is given by
=stsz>
on-diagonal
(10)
sample of the LN prediction
~,,=E{(s+n)2}=sZ+.a,
(11)
where ui is the variance of the random (noise) variable n. Therefore, using definition (9) with results (10) and (11) the mean p,, of the random function associated with the LN prediction is the space-time-varying function I(LN(I,,
h(x,,, Y,,, tk,V (x,,, v,,, QJ off-diagonal i e,,
>Y,,9%,I +
on-diagonal
4
Variance (at,)
&
u~,=(S:+.r;)u~+u;t. Similarly, the variance prediction is given by
at an arbitrary
(14) on-diagonal
sample of the LN
= E{(s + n)4} -a;,
(12)
where h was previously defined to be the cell’s true first-order cross-correlation function, and CT:is the variance of the noise process (assumed additive and space invariant) that corrupts the experimental estimate of the first-order cross correlation function.
The variance at an arbitrary tion is given by
where, as previously defined, ui, and ua* are the variances of the zeromean random (noise) variables n, and nZ, and poR was derived in equation (10). We next invoke the previously stated assumption that the statistical properties of the noise function n(x,, y,, rk) that corrupts the first-order cross-correlation function are space-time invariant. Hence the random variables n, and n2 will have equal variances uz = ui, = ui,. Using this assumption in equation (13), the variance of an arbitrary off-diagonal sample from the LN prediction becomes
6:” = E{[(s + n)’ - &“12}
3Y,,3Tk,) x,, >Y,23%I zz
et ul
off-diagonal
= E{[(s, + n,)(s2 + n2) -
sample of the LN predic-
= 4s’ui,
(15)
where ui is the variance of the zero-mean random (noise) variable n, and pO” was derived in equation (11). Therefore, using definition (9) with results (14) and (15), the variance uiN of the random function associated with the LN prediction is the spaceetime-varying function u:N(X,,,y,,‘Tk,,X,?,y,2’~kl)
Pods
[h’(x,, . y,, , Q,)+~?x,~,
=E{[(s,+n,)(s,+n,)l*}-~~,
= i 4h’(x,,>L;,,rk,)u:
=E{[s,s*+s,n,+szn,+n,nz]‘}-(S,S~)* = s:E{n;} + s;E{n:} + E{n:}E{n;} = s2u2 I I,:+ s;u2n,+ CT2 n,u2“!’
=E{(s+n)4}-(s’+u;)~
(13)
y,,, Q~)]u~+u~
off-diagonal on-diagonal
(16)
where h was previously defined to be the cell’s true first-order cross-correlation function, and uf is the variance of the noise process (assumed additive and space-invariant) that corrupts the experimental estimate of the first-order cross-correlation function.