A model for the receptive field of retinal ganglion cells

A model for the receptive field of retinal ganglion cells

Neural Networks 49 (2014) 51–58 Contents lists available at ScienceDirect Neural Networks journal homepage: www.elsevier.com/locate/neunet A model ...

862KB Sizes 1 Downloads 157 Views

Neural Networks 49 (2014) 51–58

Contents lists available at ScienceDirect

Neural Networks journal homepage: www.elsevier.com/locate/neunet

A model for the receptive field of retinal ganglion cells Myoung Won Cho a,∗ , M.Y. Choi b a

Department of Global Medical Science, Sungshin Women’s University, Seoul 142-732, Republic of Korea

b

Department of Physics and Astronomy and Center for Theoretical Physics, Seoul National University, Seoul 151-747, Republic of Korea

article

info

Article history: Received 8 March 2012 Received in revised form 2 September 2013 Accepted 18 September 2013 Keywords: Receptive field model Visual information processing Sensory system

abstract Most retina ganglion cells have center–surround receptive fields, where the center may be either ON or OFF while the surround is the opposite. We clarify the functional roles of the receptive field structure, on the basis of the modern theory of natural data processing. It is suggested that the retina shares the principal mechanism and performance of image processing with a video codec in computers, where the antagonism in spatial or temporal receptive fields originates from the orthogonality condition between linear filters for optimal coding of visual signals. We also reveal what visual information is multiplexed across the discharges of an ensemble of ganglion cells. Our theory makes it possible to predict the crosscorrelations between ganglion cell spikes, which are optimized for LGN cells to respond accurately and quickly to their receptive fields. © 2013 Elsevier Ltd. All rights reserved.

1. Introduction The retina transmits visual signals from a neural population of 108 photoreceptors into the lateral geniculate nucleus (LGN) via 106 optic nerve fibers of the retinal ganglion cells (RGCs) (Field & Chichilnisky, 2007; Meister & Berry, 1999; Troy & Shou, 2002). The visual responses of individual visual neurons have been studied in detail, which raises both our understanding and debating about how visual information is processed and conveyed from the eye to the brain. An important issue in the early visual system is what the functional role of the center–surround cell receptive fields is. A common belief is that the receptive fields work as differential operators (Shanmugam, Dickey, & Green, 1979; Troy, 1993). The structure may allow ganglion cells to convey the information about discontinuities in the distribution of light falling on the retina; these often specify the edges of objects. The center–surround receptive fields are often modeled quantitatively as the difference of a Gaussian (DOG) function, which can have different functional characteristics depending on the ratio between parameters. The DOG function approximates well the Laplacian of a Gaussian (LOG) function, being near optimal for the task of revealing edges in images when the surround field has 1.6 times the spread of the center field. Note, however, that the data on RGC receptive fields give a factor larger than 1.6 (Marr, 1982; Marr & Hildreth, 1980). The idea of differential operators is also inconsistent with the



Corresponding author. Tel.: +82 2 920 7280; fax: +82 2 920 2091. E-mail addresses: [email protected] (M.W. Cho), [email protected] (M.Y. Choi). 0893-6080/$ – see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.neunet.2013.09.005

considerable cell responses to spatially uniform visual stimuli, thus giving rise to different views on the belief. A second proposal is that the function of the surround is to pool signals from receptors over a reasonably wide area and to predict the local average luminance. The prediction would help to maximize the signal amplitude in relation to noise (Atick & Redlich, 1990; Srinivasan, Laughlin, & Dubs, 1982; Tsukamoto, Smith, & Sterling, 1990). A third proposal is that the center–surround receptive fields are designed to eliminate correlations between the messages carried by individual retinal ganglion cells (Atick & Redlick, 1992). The different types of ganglion or LGN cell also make the problem more confusing (Li, 1992). Although both P and M cells in monkeys (or X and Y cells in cats) have center–surround receptive fields; however, P (or X) cells respond approximately linearly to inputs while M (or Y) cells respond linearly only to stimuli of low spatial frequencies. Further, P cells are color-opponent while most M cells show little spectral selectivity. Another important issue is that the correlations between two or more ganglion cell spikes are so strong. It was assumed that each ganglion cell responds to the stimulus within its receptive field and then transmits that information to the next visual stage independently of other ganglion cells; however, a series of experiments in various species show that nearby ganglion cells tend to fire together in synchrony over different time scales (Arnett, 1978; Mastronarde, 1989; Meister, Lagnado, & Baylor, 1995). It is often regarded that the redundancy in retinal coding may originate just from the natural correlations in impinged visual images or sharing electrical inputs from interneurons in the inner retina (Mukamel & Schnitzer, 2005; Ostijic, Brunel, & Hakim, 2009). Nevertheless, it seems that the internal circuitry in the retina is designed to enhance the synchrony between nearby ganglion cell firings more

52

M.W. Cho, M.Y. Choi / Neural Networks 49 (2014) 51–58

than the unavoidable factors do. The phenomenon is also contradictory to the ‘redundancy reduction’ hypothesis, which states that a visual neuron should remove correlations from an image to reduce redundancy in the spike train, thus increasing the efficiency of information coding (Barlow, 1961). A proposal about the functional role is that the synchrony targets to convey more information than the information available by treating them independently (Meister, 1996; Meister et al., 1995). For example, the activity of two ganglion cells, R1 (t ) and R2 (t ), can carry the three coefficients represented by the moving averages: ⟨R1 ⟩–⟨R1 R2 ⟩, ⟨R2 ⟩–⟨R1 R2 ⟩ and ⟨R1 R2 ⟩. It is proposed that the multineuronal codes may be used to increase the spatial resolution of visual signals. This paper addresses that the answer of the two issues could be entangled with each other. We interpret the functional role of the spatial and temporal receptive fields of visual neurons, on the basis of modern techniques of (visual) signal processing. It is argued that P/X cells may target to perform very efficient reduction of visual signals while M/Y cells may detach emergently irregular features in visual scenes based on the assumption that their receptive field follows the nature of low-pass and band-pass filters, respectively, in wavelet theory. Finally, we derive analytically the typical characteristics of the cross-correlations between RGC spikes based on the similarity between LGN and retina receptive fields and the properties of wavelet basis functions. 2. Preliminaries The linear–nonlinear (LN) model provides a convenient scheme to predict the responses of RGCs to visual stimuli or the reverse (Karklin & Simoncelli, 2011; Rodieck, 1965; Sakuranaga & Naka, 1985; Victor, 1987). In the model, discharges of a single ganglion cell are born of a nonlinear transformation for the convolution of visual stimuli through a linear filter. We consider the convolution process to associate with the transform coding, which is a type of data compression for natural data such as video signals, audio signals, or photographic images. The key idea of the lossy compression algorithm is to ignore less important coefficients after transforming data into another coordinate by convolving them through linear filters. While audio and still images can usually be compressed at the ratio 10:1 with imperceptible quality loss, the compression ratio of a lossy video codec, taking the value, e.g., 100:1, is almost always far superior to that for the audio and still-image equivalents (Mallat, 1999). It is noticeable that the compression ratio of a video codec is similar to the ratio of the number of photoreceptors to that of optic nerve fibers. The properties of a transform coding are determined mainly by its filters employed. In the original JPEG standard, an example of the transform coding, the two-dimensional discrete cosine transform (DCT) is used (Wallace, 1991). Most principal components take after the cosine basis functions with lower frequencies when a natural image is partitioned by rectangular window functions. If the image is partitioned by Gaussian functions, the principal components would follow the Garbor functions, used to model the receptive fields of simple cells in V1 (Daugman, 1980; Marcelja, 1980). Meanwhile, some formats such as the JPEG-2000 standard use the discrete wavelet transform (DWT) for processing natural data (Skodras, Christopoulos, & Ebrahimi, 2001). The DWT possesses excellent signal compaction properties for many classes of realworld signal while being computationally very efficient, and has been applied to various technical fields including image compression, image denoising, image enhancement, and pattern recognition (Mallat, 1999; Mertins, 1999). It is thus conceivable that the signal processing in the early visual areas may also relate to such a wavelet-based mechanism. Indeed the receptive field structure of a visual cell is often modeled as the LOG wavelet

function; the hierarchical processing in the early visual system is also reminiscent of a multiresolution analysis (MRA) of visual signals, based on DWT. Such use of operators of different sizes is necessary for detecting properly detailed features or edges because intensity changes occur on different scales in an image. Specifically, there are two different types of filter in the DWT, which are called scaling functions and wavelets, respectively. A scaling function and its corresponding wavelet(s) have the ability to decompose an empirical signal into different quantities. While the convolution with wavelets tends to detach irregularities, discontinuities, or fluctuations in the signal, that with scaling functions captures the remains which are regular or smoothly varying components in the signal. Scaling functions are sometimes referred to as averaging filters because of their low-pass nature in the frequency domain. Fig. 1 illustrates the Haar scaling function and wavelet, the simplest wavelet bases. It is shown that the product of the Haar scaling function (or wavelet) and a step function results in finite values during on timings (or at onset and offset timings) owing to its nature of the averaging (or differential) operator. Usually, wavelet bases yield similar convolution results for the step function. From this, we presume that the temporal filter of a P/X (or M/Y) cell may be modeled by a scaling function (or wavelet) (Enroth-Cugell & Robson, 1966). However often the spatial receptive fields of RGCs may be regarded as edge sharpening filters, i.e., wavelets, it is difficult to clarify their attributes based simply on their appearances, for the center–surround opposite structure is a common characteristic of scaling functions and wavelets. While Gaussian-like functions were used as low-pass filters for image processing in engineering (Adelson, Anderson, Bergen, Burg, & Ogden, 1984), DOG-like functions are adopted as more efficient ones, with the orthogonality condition considered. The inner product of filters with different centers may vanish when they have non-overlapping concerns like the translated versions of the Haar scaling (or a rectangular) function or have opposite signs in the center and surrounds. Such center–surround structure, which is found also in many wavelets, e.g., the LOG wavelet (cf. Fig. 3(B)), has its origin in the nature of differential operators. A wavelet ψ should have p vanishing moments, i.e., dt t n ψ(t ) = 0 for 0 ≤ n < p when it operates as differentiation of order p. Note that the center–surround structure provides a potential solution of ψ(t ) when it has vanishing zeroth and first moments. Here the number of vanishing moments is one of the essential characteristics of wavelets. For example, primal and dual wavelets of Cohen–Daubechies–Feauveau (CDF) 9/7, used in the JPEG-2000 standard, possess four vanishing moments. Both the CDF 9/7 scaling function and its wavelet also exhibit the center–surround structure, as shown in Fig. 2(A) and (B). 3. Receptive field model Suppose the response of retina P cell a at position r′a and time ta′ is described by the LN model: R(r′a , ta′ ) = R¯ + g c (ra , ta )





(1)

with baseline firing rate R¯ and gain function g, as in the leaky integrate-and-fire model. Here ra = (xa , ya ) specifies the receptive field center of cell a, and ta the reference timing. The difference between the firing timing ta′ and the reference timing ta is assumed to be the same. Under external (or target) stimulus S (r, t ) to the photoreceptor cell at position r = (x, y) and time t, in addition to ¯ the coefficient takes the form the base (or non-target) stimulus S, c (ra , ta ) =



 dr

dt A(r − ra )B(t − ta ) S (r, t ) − S¯ ,





(2)

M.W. Cho, M.Y. Choi / Neural Networks 49 (2014) 51–58

53

Fig. 1. (A) Haar scaling function φ(t ) and wavelet ψ(t ), together with their filtering results c (t ) = dt ′ φ(t ′ − ⌊t ⌋)S (t ′ ) and d(t ) = dt ψ(t ′ − ⌊t ⌋)S (t ′ ), where ⌊t ⌋ denotes the largest integer not greater than t, for the given signal S (t ). (B) Illustrations of typical responsibilities: (upper) sustained responsibility for a P (or X) OFF cell and (lower) transient responsibility for an M (or Y) OFF cell, which demonstrate that the temporal filters of P/X and of M/Y cells follow the nature of the scaling function and of the wavelet, respectively.





Fig. 2. One-dimensional wavelet bases: (A) CDF 9/7 scaling function and (B) wavelet, which are used in the JPEG-2000 standard. (C) DB4 scaling function and (D) negative of the Beta wavelet with (p, q) = (5, 17).

Fig. 3. (Upper) (A) Quincunx scaling function with eight vanishing moments and (B) negative of the LOG wavelet of a similar size. (Lower) Corresponding behaviors of c (ω),  the maximum value of dxdyA(x, y)S (x, y) with S (x, y) = sin(ωx + θ). Here the given wavelet bases for A(x, y) produce the black curves; the red curve results from the choice A(x, y) = A(+) (x, y) + rA(−) (x, y), where A(±) (x, y) is the positive/negative part of the quincunx scaling function, respectively. The value of r has been chosen to give the ratio WG /WR ≈ 2 in the representation of luminance Y by the weighted sum of linear RGB components: Y = WR R + WG G + WB B. (C) Spatial frequency tuning curves, drawn schematically, for a typical retina or LGN red–green neuron under external isoluminant (black) or luminant (red) stimuli. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

where A(r) and B(t ) represent spatial and temporal linear filters, respectively.

There are some grounds for modeling both linear filters A(r) and B(t ) by scaling functions, namely, F (r, t ) ≡ A(r)B(t ) deserves to be

54

M.W. Cho, M.Y. Choi / Neural Networks 49 (2014) 51–58

a three-dimensional (3D) scaling function: one of them based on the similarity between the receptive field of LGN and retina P cells. Suppose the response of LGN P cell b at position r′′b and time tb′′ is also described by RLGN (r′′b , tb′′ ) = R¯ LGN + g LGN c LGN (ra , tb ) ,





(3)

where the coefficient is given by the inner product between the receptive field of LGN cell b and the external stimulus: c LGN (rb , tb )

 =

 dr

dt ALGN (r − rb )BLGN (t − tb ) S (r, t ) − S¯ .





(4)

Ignoring nonlinearity in the gain function, we represent the coefficient of a LGN cell as a weighted sum of the coefficients of RGCs c LGN (rb , tb ) ≈



α(rb , tb ; ra , ta )c (ra , ta )

(5)

r a ,ta

since LGN cells may make their responses through kerneling (or convoluting with) signals from RGCs. With Eqs. (2) and (4) satisfied for arbitrary stimulation S, the receptive field of a LGN P cell should also be expressed as a weighted sum of the receptive fields of RGCs at different centers and obtain the form ALGN (r − rb )BLGN (t − tb )





α(rb , tb ; ra , ta )A(r − ra )B(t − ta ).

(6)

ra ,ta

Meanwhile, the receptive field of LGN P cells is larger than but similar to that of retina P cells; this is expressed as LGN

A

(r)B

LGN

(t ) ≈ A(σ

−1

r)B(ζ

−1

t ),

(7)

where σ and ζ correspond to the ratio of the size of LGN to that of the retina receptive field in the spatial and temporal spaces, respectively. Finally, we find the self-similarity of the receptive field structure, expressed in terms of the dilation equation A σ −1 (r − rb ) B ζ −1 (t − tb )





 





α(rb , tb ; ra , ta )A(r − ra )B(t − ta ).

(8)

r a ,ta

Functions satisfying such self-similarity, dubbed refinable in mathematics, play a fundamental role in wavelet theory as scaling functions. Other experimental observations disclose convincingly that the receptive field of a P/X retina cell carries the nature of scaling functions. First of all, as for the spatial filter A(r), the RGCs make responses when  they experience spatially uniform visual stimuli, namely, when dr A(r) does not vanish. It is the main difference between scaling functions and wavelets whether the zeroth moment vanishes or not. Furthermore, appearance of the spatial filter in the frequency domain manifests the low-pass nature. It would decay towards low frequencies if it had the nature of wavelets. Fig. 3 displays the distribution of a quincunx scaling function and negative of an LOG wavelet (Kovaěvić & Swelden, 2000), together with the corresponding responses. We choose the quincunx scaling function as a model for P cell receptive fields because such a non-separable wavelet basis has more isotropic structure than a separable wavelet basis or the two-dimensional (2D) CDF 9/7 scaling function. There are other non-separable 2D wavelet bases as substitutable models, which have more isotropic and smoothly varying structures. However the quincunx scaling function and the LOG wavelet may share the center–surround structure, they show distinctive distributions in the low-frequency regime. In the lower graphs, black and red curves represent roughly the responses of a P cell to isoluminant and luminant visual

stimuli, respectively, when its receptive field is described by the corresponding model function. For isoluminant stimuli, the results correspond nearly to the Fourier transform of the receptive fields. Such low-pass filtering and band-pass filtering are characteristics of scaling functions and wavelets, respectively. On the other hand, for luminance stimuli, the bipolar cells receiving signals from Mcones activate more sensitively than those from L-cones, which leads to an increase in responsibility at a considerable frequency. These results accord well with the response behaviors of retina and LGN P cells observed in experiment (Gegenfurtner & Kiper, 2003), the typical results of which are plotted schematically in Fig. 3(C). Similarly, the temporal filter B(t ) of a P cell can be modeled by a one-dimensional (1D) scaling function, in view of the fact that it makes sustainable responses to uniform stimuli. Meanwhile, for an M cell, the temporal filter should be modeled by a wavelet because it makes transient responses at stimulus onset and offset timings. In Fig. 2(C) and (D), we show the Daubechies 4 (DB4) scaling function (Daubechies, 1988) and the negative of the Beta wavelet (Bellil, Amar, & Alimi, 2003), as the models for the temporal filters of P and M cells, respectively. They have been selected from mathematically well-defined wavelet bases with asymmetric distributions, in comparison with experimental observations (Baccus & Meister, 2002; Kaplan & Benardete, 2001). We can estimate the functional roles of P cells in visual signal processing based on the assumption that A(r) and B(t ) possess the nature of scaling functions with ca = c (ra , ta ) corresponding to the scaling function coefficients: First, they could send an impinged visual signal to LGN cells compressively with little quality loss. The original visual signal can be reconstructed from the discharges of retina P cells in the following way: S (r, t ) ≈ S¯ +



c (ra , ta ) A(r − ra ) B(t − ta ),

(9)

ra ,ta

where  A(r) and  B(t ) are the duals of A(r) and B(t ), respectively, satisfying the orthogonality condition dr A(r − ra ) A(r − rb ) = δab  and dt B(t − ta ) B(t − tb ) = δab for grid points in the spatial and temporal spaces. Fig. 4 illustrates the reconstructed images from the coefficients of wavelet bases. While the reduced image by the Haar scaling function is obtained just by taking the average over individual 8 × 8 pixels, the image by the quincunx scaling function has a more smoothing combination between patches owing to overlapping between the translated versions of the scaling function. The quincunx scaling function attains a performance of image compression similar to that of the 2D CDF 9/7 scaling function. The JPEG-2000 standard usually represents a still image by multi-step MRA, composed of scaling-function coefficients on a large scale and wavelet coefficients on several scales being equal to or less than that of scaling-function coefficients, owing to the virtue in computation time. If retina P cell discharges convey only the scaling-function coefficients on a certain scale, they would be the zero-step MRA of visual signals; still the retina does not lose the virtue because of the parallel computation based on synaptic connections. Note that the resolution of L-step MRA depends not on the number L. Second, P cells could send the visual signals after reducing noise or enhancing some details. The data reduction by the retina could rather improve the quality of conveyed visual signals by eliminating the noisy part on small scales. In addition, the retina could increase the distinction of images, on the basis of the nonlinearity in the gain function g. Such noise reduction and edge enhancement by the retina, advantages in the application of DWT, would increase the performance of pattern recognition or image processing in a wider visual area. Third, it is possible to compute the scaling-function and wavelet coefficients on a larger scale by means of weighted summations of the retina P cell discharges, where the wavelet coefficients would represent edges, movements, etc. in visual signals (see Eq. (A.10) in the Appendix).

M.W. Cho, M.Y. Choi / Neural Networks 49 (2014) 51–58

55

Fig. 4. Synthesized images from (A) Haar and (B) quincunx scaling-function coefficients, with the compression ratio 64:1. (C) One-step MRA, based on the 2D CDF 9/7 wavelet bases. The images have been synthesized from the coefficients of the single scaling function and three wavelets with the compression ratio 256:1. While the scaling function leads to a reduced image (upper left) with the compression ratio, the individual wavelets reveal horizontal (upper right), vertical (lower left) and diagonal (lower right) details, respectively, on the scale. A reduced image with ratio 64:1, being similar to the image in (B), can be obtained from the direct summation of the four images according to the decomposition relation of wavelet bases.

4. Cross-correlations between RGC responses

where C0 measures the cross-correlations between independently firing ganglion cells, with

The receptive field model also discloses why RGCs have the tendency to fire in chorus. Suppose LGN cell b can estimate its coefficients based on a weighted sum of RCG discharges: cbLGN

=



Wba

 Ra − R¯ ,



where Wba is the synaptic strength from RGC a to LGN cell b. Substituting Eq. (5) into Eq. (10), we obtain the temporal profile of RGC discharges in the form



cb αba ca , W

(11)

b ,a

 is the dual (or the inverse) of W satisfying the orthogowhere W  cb Wba = δca . nality condition b W The temporal profile makes it possible to predict the crosscorrelations between two ganglion spikes according to Ra Rc = R¯ 2 + R¯









Gab + Gcb ⟨cb ⟩

b

+



Gab Gcd ⟨cb cd ⟩

(12)

b ,d

 with Gab ≡ c Wac αcb . Here, under random stimulation, ⟨cb ⟩ = ⟨c (rb , tb )⟩ would be almost constant and ⟨cb cd ⟩ = ⟨c (rb , tb )cd (rd , td )⟩ would mostly decrease monotonically with the distance |rbd | ≡ |rb − rd | and the difference |tbd | ≡ |tb − td |. From Eq. (8), the coefficient αac obtains the form 

αba =



drA σ −1 (r − rb )  A(r − ra )



 ×



dtB ζ −1 (t − tb )  B(t − ta ),





(13)

which converges towards A(σ −1 rac )B(ζ −1 tac ) with increasing σ and ζ , via the cascade algorithm (see the Appendix or Ref. Karoui & Vaillancourt, 1995). Meanwhile, it is possible for an LGN cell to respond to the weighted sum of RGC discharges, based on the synaptic strength Wba and on the exact relative firing timing (see Fig. 5(A)). Assuming ab ) is nearly the same for each RGC a being that Wba (or W the receptive field of LGN cell b, we obtain cross-correlations depending on the distance r between receptive fields and the relative firing timing t:

 C (r , t ) =

a ,c

δ(r − |rac |)δ(t − tac )⟨Ra Rc ⟩  = C0 + C1 g (r )ξ (t ), (14) δ(r − |rac |)δ(t − tac ) a ,c

g (r ) =

r,r′

δ(r − |r|)ALGN (r + r′ )ALGN (r′ ) 

(10)

a

Rc ≈ R¯ +



δ(r − |r|)

r

 r′

|ALGN (r′ )|2

(15)

and

 ξ (t ) =

t′

BLGN (t + t ′ )BLGN (t ′ )

 t′

|BLGN (t ′ )|2

.

(16)

Note that g (0) = ξ (0) = 1. In addition, the mean probability for observing synchronous firings within duration 1t reads C (r , 1t ) = C0 + C1 g (r )ξ (1t ),

(17)

 1t where X (1t ) ≡ (1t )−1 0 dtX (t ) represents the time average. Fig. 5(B) shows the behavior of C (r , 1t )/C0 with r, in case that A(r) is modeled by the quincunx scaling function in Fig. 3(A). It is found that the (mean) cross-correlation function decreases with r, reaching the minimum value below unity, and recovers eventually the value unity. The characteristics, especially the existence of the asynchronous region, are consistent with experimental observations (Meister,  1996; Meister et al., 1995), reproduced in Fig. 5(C). Note that dr′ A(r + r′ )A(r′ ) also makes a DOG-like function of r when A(r) is a DOG-like function. Finally, Fig. 5(D) exhibits C (r , t ) versus t for B(t ) modeled by the DB4 scaling function in Fig. 2(C). It also shows a decrease in correlations with t, existence of the asynchronous region, and recovery to the basal correlations. As shown in Fig. 5(E), similar characteristics were observed in experiments with bright stimuli while the asynchronous region becomes indistinct for weak stimuli (Field & Chichilnisky, 2007; Meister et al., 1995). Theoretically, characteristics of the cross-correlations depending on the distance between receptive fields and the relative firing timing are determined mainly by the form of ALGN (r) and that of BLGN (t ), respectively, while the natural correlations in external visual stimuli tend to enhance the synchrony. In particular, the asynchrony between partial ganglion cell spikes originates from the antagonistic behavior in the receptive fields of LGN cells. 5. Discussion In this paper, we have argued that the basis functions in DWT are adequate to model the spatial and temporal receptive

56

M.W. Cho, M.Y. Choi / Neural Networks 49 (2014) 51–58

Fig. 5. (A) Illustrations of independent and synchronous RGC firings. The receptive field of a LGN cell is determined by the weighted sum of RGC discharges, where the weight depends not only on the synaptic strength Wba but also on the exact spike timing. It is more proper for the LGN cell to make responses exactly and quickly when a group of spikes arrive within a short time interval. (B) Plot of C (r ; 1t )/C0 versus r for C1 ξ (1t )/C0 = 15. (C) Strength of correlations between two cells plotted versus the distance between their receptive fields, determined under periodic flash stimulation. The stimulus consisted of full-field illumination with white light, square-wave modulated with the period of 1 s. For each pair of cells, the correlation index expresses the observed rate of synchronous firings (i.e., successive firings of the spike delay within 1t = 0.02 s) divided by the rate expected for independent firings of the two cells. (D) Plot of C (r , t ) versus t for C0 = 4 and C1 g (r ) = 17. (E) Cross-correlation function between two ganglion cell spike trains under random flicker stimulation, performed at various mean intensities. Source: (C) and (E) Adopted from Meister et al., 1995. Reprinted with permission from AAAS.

fields of early visual neurons. A DOG function can work as if it were an averaging filter of low-pass nature or a differential operator of band-pass nature, depending on its parameters. In comparison, the wavelet bases, if employed by visual cells, provide a number of distinctive benefits in visual information processing. (1) The orthogonality condition of linear filters helps to restore the original signals, as shown in Eq. (9), even if the brain may not task necessarily the reconstruction process. Rather, the orthogonality condition helps visual neurons to increase the capacity of transmitted information through minimizing overlapped information between responses. To speak precisely, from the viewpoint of the independent component analysis (ICA), adoption of non-orthogonal basis functions is sometimes more suitable for extracting meaningful features, depending on the characteristics of signals. Nevertheless, orthogonal basis functions are selected mostly for visual signal processing in engineering, because of natural image statistics. (2) Distortion of the restored signal could be minimized in the presence of not only input noise but also output noise. It has been much discussed what filters are optimal under noisy environments for visual neurons or image compression algorithms. Such optimal filters are expected to minimize the noise sensitivity by maximizing mutual information between input and output signals (Karklin & Simoncelli, 2011; Mallat, 1999). Here a common solution is to assign thresholds (or quantization) of signals via processing through nonlinear filters after convolution with linear filters. Visual neurons are expected

to acquire such ability naturally from nonlinearity in the gain function. In particular, wavelet bases have advantages in noise reduction and edge enhancement due to their low-pass and band-pass nature. Another solution, suggested for compression algorithms, is based on optimizing bit (or code) allocation. It is also possible for retina cells to have linear filters of different scales and characteristics in order to encode visual signals more optimally; however, this possibility should be considered on the basis of more vast and detailed experimental results. (3) Joint activities of cell discharges convey naturally multiplexed signals according to the decomposition relation of wavelet bases (see the Appendix). This property would constitute the root of the enlarged receptive fields in LGN P cells or the feature extractions by visual cortical cells. We further estimate the cross-correlations between RGC discharges, with the help of the similarity between LGN and retina receptive fields and the properties of scaling functions, which show the observed peculiar characteristics of the cross-correlations. It was suggested that synchrony may function to increase the spatial resolution in transmitted visual signals (Meister, 1996; Meister et al., 1995); without the orthogonality between filters, however, supplementary signals would not guarantee the improvement of the resolution as much as the increased number of coefficients. The resolution is doubled more properly when multiplexed signals convey wavelet coefficients as well as scaling-function coefficients on the same scale. We have focused mainly on clarifying the functions of P/X cells, based on the simple LN model. The retina would work

M.W. Cho, M.Y. Choi / Neural Networks 49 (2014) 51–58

properly even if it has ganglion cells of a single type which transmit 3D scaling-function coefficients. With the size of receptive fields considered, most other types of ganglion cells may not increase the resolution of visual signals, namely, the visual information conveyed via the M pathway can be extracted from the fine details conveyed via the P pathway. Nevertheless, visual signals conveyed independently would assist the brain to recognize an emergent situation more quickly. The receptive fields of M/Y or some W cells are expected to be modeled as wavelets; however, it would be formidable to infer them directly because of the variety of applicable types and combinations of wavelets. For example, a separable 3D wavelet family has seven wavelets in three classes, which can detach moving spots (or edges of objects), moving lines, and moving areas, respectively. The number of feasible wavelet types grows even more, if we consider the non-separable 2D or 3D wavelet family and transforms of the coordinate system (x, y, t ). It is also difficult to apply the simple LN model to the functional behavior of other types of ganglion cell. Comparing with P/X cells, M/Y cells display more complex responses to external visual stimuli, which may be understood by synthesizing the responses of multiple subunits or filters. In consideration of their excessive activation or deactivation at onset and offset timings for temporally uniform stimuli, we expect that P/X cells may also possess an additional filter of the wavelet nature. There is an extended model, called the multi-filter linear–nonlinear model, which aims to explain such complex responses: There the neural response is determined by the nonlinear transformation of the summation over the outputs from all the filters (Gollisch & Meister, 2008). From the viewpoint of information coding, however, the multi-filter linear–nonlinear model has the problem that the individual convolution result cannot be extracted properly from its output. We hope to have an advanced model for the stimulus–response relationship in visual cells, which takes into consideration the possibility of multiplexed signaling based on the fine temporal profile or joint cell activities (McClurkin, Optican, Richmond, & Gawne, 1991; Meister, 1996; Meister et al., 1995; von del Malsburg, 1981).

(i) L2 (Rn ) is the direct sum of subspaces L2 (Rn ) = k∈Z Wk , i.e., the signal S (r) ∈ L2 (Rn ) can be represented by the summation



S (r) =

µ

k∈Z

(µ)

Tk (t ),

(µ)

where Tk (t ) =



(A.3)

(µ) a∈Ck

(µ)

(µ)

(µ)

(µ)

 (r) with d = ψ , S . Here dk,a ψ k,a k,a k,a 



(µ) (r) is the dual of ψ (µ) (r) satisfying the orthogonality condition ψ a k ,a  k,(µ)  (µ) ′ δa,a′ . The expansion becomes the continuous ψk,a , ψ = δ ′ ′ k , k k ,a

wavelet transform (CWT) of the signal. (ii) The subspace Vk contains low-pass signals and the bandwidth of the signals contained in Vk reduces with increasing m. They become a nested sequence of subspaces . . . ⊂ Vk+1 ⊂ Vk−1 ⊂ · · · ⊂ L2 (Rn ). The property leads to an approximation of the signal: S (r) ≈ Sk (r)

(A.4)

    with Sk (r) = a∈Gk ck,a φk,a (r) and ck,a = φk,a , S , where φk,a (r)  is the dual of φk,a (r) satisfying the orthogonality condition φk,a ,   φk,b = δab . The family of wavelet bases is orthogonal for φ(t ) =  (µ) (t ), and biorthogonal otherwise. (iii) Wk+1 φ(t ) and ψ (µ) (t ) = ψ is the complement of Vk+1 in Vk , i.e., Vk = Vk+1 ⊕ Wk+1 and Sk (r) =  (µ) Sk+1 (r) + µ Tk+1 (r). The recursive relation leads to the L-step 

multiresolution analysis (MRA) of the signal Sk (r) = Sk+L (r) +

L   k′ =1

µ

(µ)

Tk+k′ (r).

(A.5)

Here Sk (r), the projection of S (r) onto the subspace Vk , converges to S (r) with decreasing k regardless of L. From Vk′ ⊂ Vk and Wk′ ⊂ Vk for k′ > k, the scaling function and the wavelet on a certain scale can be expressed as weighted sums of the scaling functions on a lower scale in the following way:

φk′ ,b (r) =



H (k′ , b; k, a)φk,a (r)

(A.6)

G(µ) (k′ , b; k, a)φk,a (r)

(A.7)

a∈Gk

(µ)



ψk′ ,b (r) =

a∈Gk

(µ)

with H (k′ , b; k, a) ≡ ⟨φk′ ,b ,  φk,a ⟩ and G(µ) (k′ , b; k, a) ≡ ⟨ψk′ ,b ,

Acknowledgment This work was supported by the National Research Foundation of Korea through the Basic Science Research Program (grant numbers 2013R1A1A2012888 (MWC), 2009-0080791 and 20110012331 (MYC)).

 φk,a ⟩. The  coefficients satisfy the decomposition relation H (k′ , b; ′ ′′ ′′ ′ k, a) = c ∈Gk′′ H (k , b; k , c )H (k , c ; k, a) and G(k , b; k, a) =  ′ ′′ ′′ ′ ′′ > k. With filters c ∈Gk′′ G(k , b; k , c )H (k , c ; k, a) for k > k h and g , H (k + 1, b; k, a) and G(k + 1, b; k, a) can be written in the forms h(ra − Drb ) and g (ra − Drb ), respectively, for all k in the translation-invariant domain. Similarly, the dual scaling function and the dual wavelet can be written in the form

Appendix The DWT of a continuous signal S (r) ∈ L2 (Rn ) is calculated by convoluting it with two types of basis function:

  φk,a (r) = | det D|−k/2 φ D−k (r − ra )   (µ) ψk,a (r) = | det D|−k/2 ψ (µ) D−k (r − ra ) ,

(A.1) (A.2)

which are the scaled and translated versions of scaling function φ(r) and wavelet ψ (µ) (r), respectively. Here µ runs d − 1 elements for d-band wavelets or d-adic sampling, where d is given by the determinant of the dilation matrix D. D−k ra with a ∈ Gk (or a ∈ Ck ) runs the same grid points in Rn independently of k, where Gk (or (µ) Ck ) is the index set of translated scaling functions (or wavelets) on the scale k. The series of basis functions span two types of subspace:  (µ) (µ) Vk = span{φk,a |a ∈ Gk } and Wk = with Wk = span µ Wk (µ) {ψk,a |a

57



(µ)

∈ Ck }. These subspaces satisfy the following properties:



 φk′ ,b =

 H (k′ , b; k, a) φk,a

(A.8)

 G(µ) (k′ , b; k, a) φk,a

(A.9)

a∈Gk

(µ) ψ = k′ ,b

 a∈Gk

(µ) with  H (k′ , b; k, a) ≡ ⟨ φk′ ,b , φk,a ⟩ and  G(µ) (k′ , b; k, a) ≡ ⟨ψ , k′ ,b φk,a ⟩. The decomposition relations make it possible to compute the scaling-function and the wavelet coefficients on a certain scale from the scaling-function coefficients on a lower scale: ck′ ,b =



H (k′ , b; k, a)ck,a

(A.10)

G(µ) (k′ , b; k, a)ck,a

(A.11)

a∈Gk

(µ)

dk′ ,b =

 a∈Gk

with k′> k. Note that the equations are to be replaced by  ′  ck′ ,b = a∈Gk H (k , b; k, a)ck,a and so on, when φ and φ exchange

58

M.W. Cho, M.Y. Choi / Neural Networks 49 (2014) 51–58

the   with each other, i.e., Sk (r) =  roles of analysis and synthesis  a∈Gk ck,a φk,a (r) and ck,a = φk,a , S . The cascade algorithm allows us to construct the scaling function φ from a given filter h. A way to find the values of φ(r) is based on that the scaling function is a refinable function being an eigenfunction of the dilation operator. Eq. (A.6) at r = 0 leads to the relation

φ(−D−1 rb ) = | det D|1/2



Hba φ(−ra )

(A.12)

a∈G0

with Hba = h(ra − Drb ). From {D−1 rb |b ∈ G1 } = {ra |a ∈ G0 }, the values φ(−ra ) for a ∈ G0 can be obtained from the eigenvector of the matrix H with the eigenvalue | det D|−1/2 a∈G0 h(ra ). Then the values φ(−D−k rb ) with k > 0 are obtained from the recursive relation

φ(−D−k rb ) = | det D|1/2



Hba φ(−D1−k ra ).

(A.13)

a∈Gk−1

Another way is based on the assumption that φ may be the most dominant eigenfunction of h. From an arbitrary starting function φ (0) (r), the scaling function may be approached by the iteration

φk,b (r) ≈



Hk

 ba

φ0(0,a) (r),

(A.14)

a∈G0

(0)

whereas the choice φ0,a (r) = δ(r − ra ) leads to an approximation of the scaling function:

≈ φk,b (ra ) = | det D|−k/2 φ(D−k rab ). (A.15)   Here | det D|k/2 H k ba corresponds to αba in Eq. (13) when F (r) = φ(r) and D−k r = Z −1 r with Z = diag(σ , σ , ζ ) and r = (x, y, t ). Hk





ba

Note that some biorthogonal filters do not guarantee the stable construction of wavelet bases. Scaling functions and wavelets in high dimensions can be constructed conveniently as the products of 1D ones. For example, in one dimension, CDF 9/7 scaling functions and wavelets are obtained via the cascade algorithm with a set of biorthogonal filters {h, g ,  h, g } having nine or seven valid coefficients. They are called the wavelet family of biorthogonal 4.4 because the primal and the dual wavelets have four vanishing moments. Similarly, in two dimensions, the scaling function and three wavelets (note the  2 × 2 = 4-adic sampling with the dilation matrix D = 2 0

0 2

) are given by φ(x, y) = φ(x)φ(y), ψ (H ) (x, y) = φ(x)ψ(y),

ψ (V ) (x, y) = ψ(x)φ(y), ψ (D) (x, y) = ψ(x)ψ(y). The convolution with the wavelets ψ (H ) (x, y), ψ (V ) (x, y), and ψ (D) (x, y) detach horizontal, vertical, and diagonal details, respectively, in the visual signal. Fig. 4(C) shows the one-step MRA using the CDF 9/7 wavelet bases. Compared with a separable 2D wavelet basis, a non-separable 2D wavelet basis usually has more isotropic structure. Quincunx wavelet bases, for example, can   be constructed from the quincunx dilation matrix D =

1 1

1

−1

and the lifting scheme algorithm;

they are composed of one scaling function and one wavelet (i.e., dyadic sampling because | det D| = 2). While quincunx scaling functions can compress image data effectively, quincunx wavelets can detect sharp transitions in all directions. In particular, the lifting scheme is much more suitable for application to modeling the receptive field structure of visual cells because it allows us to build wavelet bases over grids in non-square-lattice structures or even over non-stationary grids in arbitrary domains. References Adelson, E. H., Anderson, C. H., Bergen, J. R., Burg, P. J., & Ogden, J. M. (1984). Pyramid methods in image processing. RCA Engineer, 29, 33–41.

Arnett, D. W. (1978). Statistical dependence between neighbouring retinal ganglion cells in gold fish. Experimental Brain Research, 32, 49–53. Atick, J. J., & Redlich, A. N. (1990). Towards a theory of early visual processing. Neural Computation, 2, 308–320. Atick, J. J., & Redlick, A. N. (1992). What does the retina know about neural scenes? Neural Computation, 4, 196–210. Baccus, S. A., & Meister, M. (2002). Fast and slow contrast adaptation in retinal circuitry. Neuron, 36, 909–919. Barlow, H. B. (1961). Possible principles underlying the transformation of sensory message. In W. A. Rosenblith (Ed.), Sensory communication (pp. 217–234). Cambridge, MA: MIT Press. Bellil, W., Amar, C.B., & Alimi, M.A. (2003). Beta wavelet based image compression. In International conference on signal, system and design, SSD03, Tunisia, Vol. 1 (pp. 77–82). Daubechies, I. (1988). Orthonormal bases of compactly supported wavelets. Communications on Pure and Applied Mathematics, 41, 909–996. Daugman, J. G. (1980). Two-dimensional spectral analysis of cortical receptive field profiles. Vision Research, 20, 847–856. Enroth-Cugell, Ch., & Robson, J. G. (1966). The contrast sensitivity of retinal ganglion cells of the cat. Journal of Physiology, 187, 517–552. Field, G. D., & Chichilnisky, E. J. (2007). Information processing in the primate retina: circuitry and coding. Annual Review of Neuroscience, 30, 1–30. Gegenfurtner, K. R., & Kiper, D. C. (2003). Color vision. Annual Review of Neuroscience, 26, 181–206. Gollisch, T., & Meister, M. (2008). Modeling convergent ON and OFF pathways in the early visual system. Biological Cybernetics, 99, 263–278. Kaplan, E., & Benardete, E. (2001). The dynamics of primate retinal ganglion cells. Progress in Brain Research, 134, 17–34. Karklin, Y., & Simoncelli, E. P. (2011). Efficient coding of natural images with a population of noisy linear-nonlinear neurons. Neural Information Processing Systems,. Karoui, A., & Vaillancourt, R. (1995). McClellan transformation and the construction of biorthogonal wavelet bases of L2 (R2 ). Computers & Mathematics with Applications, 29, 13–26. Kovaěvić, J., & Swelden, W. (2000). Wavelet families of increasing order in arbitrary dimensions. IEEE Transactions on Image Processing, 9, 480–496. Li, Z. (1992). Different retinal ganglion cells have different functional goals. International Journal of Neural Systems, 3, 237–248. Mallat, S. (1999). A wavelet tour of signal processing. San Diego: Academic Press. Marcelja, S. (1980). Mathematical description of the responses of simple cortical cells. Journal of the Optical Society of America A, 70, 1297–1300. Marr, D. (1982). Vision. San Francisco: W. H. Freeman. Marr, D., & Hildreth, E. (1980). Theory of edge detection. Proceedings of the Royal Society of London Series B, 207, 187–217. Mastronarde, D. N. (1989). Correlated firing of retinal ganglion cells. Trends in Neurosciences, 12, 75–80. McClurkin, J. W., Optican, L. M., Richmond, B. J., & Gawne, T. J. (1991). Concurrent processing and complexity of temporally encoded messages in visual perception. Science, 253, 675–677. Meister, M. (1996). Multineuronal codes in retinal signaling. Proceedings of the National Academy of Sciences of the United States of America, 93, 609–614. Meister, M., & Berry, M. J., II (1999). The neural code of the retina. Neuron, 22, 435–450. Meister, M., Lagnado, L., & Baylor, D. A. (1995). Concerted signaling by retinal ganglion cells. Science, 270, 1207–1210. Mertins, A. (1999). Signal analysis: wavelets, filter banks, time–frequency transforms, and applications. New York: John Wiley & Sons. Mukamel, E. A., & Schnitzer, M. J. (2005). Retinal coding of visual scenes—repetitive and redundant too? Neuron, 46, 357–359. Ostijic, S., Brunel, N., & Hakim, V. (2009). How connectivity, background activity, and synaptic properties shape the cross-correlation between spike trains. Journal of Neuroscience, 29, 10234–10253. Rodieck, R. W. (1965). Quantitative analysis of cat retinal ganglion cell response to visual stimuli. Vision Research, 5, 583–601. Sakuranaga, M., & Naka, K. (1985). Signal transmission in the catfish retina, III. Transmission to type-C cell. Journal of Neurophysiology, 53, 411–428. Shanmugam, K. S., Dickey, F. M., & Green, J. A. (1979). An optimal frequency domain filter for edge detection in digital pictures. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1, 37–49. Skodras, A. N., Christopoulos, C., & Ebrahimi, T. (2001). JPEG 2000 still image compression standard. IEEE Signal Processing Magazine, 18, 36–58. Srinivasan, M. V., Laughlin, S. B., & Dubs, A. (1982). Predictive coding: a fresh view of inhibition in the retina. Proceedings of the Royal Society of London Series B, 216, 427–459. Troy, J. B. (1993). Modeling the receptive fields of mammalian retinal ganglion cells. In D. M.-K. Lam, & R. M. Shapley (Eds.), Proceedings of the retina research foundation series: vol. 3. Contrast sensitivity: from receptors to clinic (pp. 85–102). Cambridge, MA: MIT Press. Troy, J. B., & Shou, T. (2002). The receptive fields of cat retinal ganglion cells in physiological and pathological states: where we are after half a century of research. Progress in Retinal and Eye Research, 21, 263–302. Tsukamoto, Y., Smith, R. G., & Sterling, P. (1990). Collective coding of correlated cone signals in the retinal ganglion cell. Proceedings of the National Academy of Sciences of the United States of America, 87, 1860–1864. Victor, J. D. (1987). The dynamics of the cat retinal X cell centre. Journal of Physiology, 386, 219–246. von del Malsburg, C. (1981). The correlation theory of brain functions. Internal rep. 81–2. Göttingen, FRG: Max–Planck-Institute for Biophysical Chemistry. Wallace, G. K. (1991). The JPEG still picture compression standard. Communications of the ACM, 34, 30–44.