Periodic population codes: From a single circular variable to higher dimensions, multiple nested scales, and conceptual spaces


Available online at www.sciencedirect.com

Andreas VM Herz (1), Alexander Mathis (2,3) and Martin Stemmler (1)

Across the nervous system, neurons often encode circular stimuli using tuning curves that are not sine or cosine functions, but that belong to the richer class of von Mises functions, which are periodic variants of Gaussians. For a population of neurons encoding a single circular variable with such canonical tuning curves, computing a simple population vector is the optimal read-out of the most likely stimulus. We argue that the advantages of population vector read-outs are so compelling that even the neural representation of the outside world's flat Euclidean geometry is curled up into a torus (a circle times a circle), creating the hexagonal activity patterns of mammalian grid cells. Here, the circular scale is not set a priori, so the nervous system can use multiple scales and gain fields to overcome the ambiguity inherent in periodic representations of linear variables. We review the experimental evidence for this framework and discuss its testable predictions and generalizations to more abstract grid-like neural representations.

Addresses
1 Bernstein Center for Computational Neuroscience Munich and Faculty of Biology, Ludwig-Maximilians-Universität München, Grosshadernerstrasse 2, 82152 Planegg-Martinsried, Germany
2 Department of Molecular and Cellular Biology and Center for Brain Science, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
3 Werner Reichardt Centre for Integrative Neuroscience and Institute for Theoretical Physics, University of Tübingen, 72076 Tübingen, Germany

Corresponding author: Herz, Andreas VM ([email protected])

Current Opinion in Neurobiology 2017, 46:99–108

This review comes from a themed issue on Computational neuroscience
Edited by Adrienne Fairhall and Christian Machens

http://dx.doi.org/10.1016/j.conb.2017.07.005
0959-4388/© 2017 Elsevier Ltd. All rights reserved.

Introduction

Angular variables are of key importance for sensory and motor systems: they describe the rotation of a rigid body part around a joint, the orientation of a visual stimulus projected onto the retina, or one's own movement direction relative to some external landmark. Reflecting the periodic nature of angular variables, their neural representation is periodic, too. However, periodic tuning curves may also result from the way neural responses are measured. For instance, the oriented receptive field of a neuron in visual cortex is reduced to a periodic tuning curve when the cell's response is measured as a function of angle. More surprisingly, though, the nervous system uses periodic representations for spatial navigation [1] and conceptual categorization [2], two tasks involving variables that are not periodic in nature. The observed grid-like codes are fascinating, but what are their representational and computational merits? The ambiguity inherent in any periodic representation of non-periodic variables, such as spatial position, confounds decoding at the single-neuron level: if multiple positions are mapped onto the same value of the internal coding variable, there is simply no way to recover the one true position. The same holds for 'mixed' neural representations of cognitive task variables [3,4]. Here, several stimulus attributes or even multimodal inputs drive the same measure of the neuronal response, such as the firing rate. Mixed selectivity stands in contrast to the concept of multiplexing, in which a neuron might represent the intensity of a visual stimulus in the firing rate, and another stimulus attribute, such as the stimulus orientation, in the latency of the response. At the population level, representations of neurons with mixed selectivity can be read out linearly and efficiently: although every neuron carries ambiguous information, different neurons encode different stimulus combinations so that each individual stimulus triggers a unique population-level response [3–5]. Similarly, the non-uniqueness of grid codes at the single-grid scale might be resolved by pooling information across multiple scales.

As we will argue in this review, simple read-outs of grid codes based on canonical tuning curves are indeed possible. Key ingredients are two mechanisms long known from motor and sensory neuroscience: population-vector decoding [6] and gain modulation [7].

Decoding circular variables in one dimension

Because of noise intrinsic to the nervous system, neurons never respond the same way twice [8–10]. Accordingly, the population response n = (n_1, n_2, ..., n_N) of an ensemble of N neurons is statistical in nature. It occurs with a


likelihood P(n|x), the conditional probability of the response n given the input x, which might describe motor actions or sensory signals. Optimal neural inference consists of computing the most likely input from the noisy response, i.e., the 'maximum likelihood estimate' x_ML. As P(n|x) depends on the neural firing statistics and tuning curves, determining x_ML requires elaborate calculation. The challenge for downstream neurons (and external observers) is even more demanding: to infer the input x solely from the noisy population response n. Here, the observer or downstream neural system needs to maximize the posterior probability P(x|n) over all possible x-values for given n so as to obtain the 'maximum a posteriori estimate' x_MAP. It is a matter of debate how neural systems solve this fundamental challenge [11–14]. For continuous circular variables, on the other hand, there exists an intuitive stimulus estimate, the population vector (PV), which weighs the response of each neuron by its preferred stimulus direction x (Box 1 and Figure 2e). First proposed as a coding mechanism for motor cortex, the PV is linear, robust and computable by a linear network [11,15–17]. Yet the PV remains controversial, even for motor cortex, as it is ill-suited to describe the time-varying kinematics of motor actions [18] and is quite sensitive to the uniformity of the distribution of preferred directions [19] and to the nature of noise correlations in the population [20]. The consensus view holds that PV decoding might be 'good enough', but rarely perfect [12,13,21,22]. In fact, for a population with cosine or various other unimodal tuning curves and Poisson spike statistics, the PV is strictly suboptimal. Surprisingly, though, there are canonical tuning curves and conditions under which the PV is the optimal decoder for Poisson statistics [23]. These canonical tuning curves are von Mises functions, which are exponentials of a sinusoid (see Box 1).
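To make the population-vector read-out concrete, here is a minimal numerical sketch: a population of independent Poisson neurons with von Mises tuning encodes an angle, and the PV recovers it. All parameter values and function names are illustrative choices, not taken from the studies discussed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative population: N independent Poisson neurons with von Mises
# tuning (Box 1, Eq. 3), zero background (a = 0), and evenly spaced
# preferred directions c_j. All parameter values are made up for this demo.
N, kappa, n_max = 64, 2.5, 20
c = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)

def tuning(x):
    """Expected spike count of each neuron for stimulus angle x."""
    return n_max * np.exp(kappa * (np.cos(x - c) - 1.0))

def pv_decode(n):
    """Population-vector read-out: angle of the complex sum (Box 1, Eq. 1)."""
    return np.angle(np.sum(n * np.exp(1j * c))) % (2.0 * np.pi)

x_true = 1.3                     # stimulus angle in [0, 2*pi)
n = rng.poisson(tuning(x_true))  # one noisy population response
x_pv = pv_decode(n)

# circular error between estimate and truth
err = np.angle(np.exp(1j * (x_pv - x_true)))
print(f"true = {x_true:.3f} rad, PV estimate = {x_pv:.3f} rad, error = {err:+.3f} rad")
```

With these arbitrary parameters the likelihood's concentration κ̂ is on the order of several hundred, so the trial-to-trial PV error is only a few hundredths of a radian.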
Compared to sine or cosine functions, von Mises functions have an additional parameter that controls the tuning width. Figure 1 shows typical von Mises fits for orientation tuning in V1 [24], head-direction tuning in the anterior thalamic nucleus [25,26], reaching direction in motor cortex [27], and one-dimensional slices through 2D spatial firing fields of grid cells in medial entorhinal cortex [28]. These data and quantitative analyses, e.g., [27], suggest that, apart from their theoretical appeal, von Mises functions readily capture the essence of circular tuning in many cases. In addition, von Mises functions are highly appealing from a theoretical point of view: for statistically independent Poisson neurons with von Mises tuning (see Box 1 for detailed mathematical definitions), one finds [23] that (1) the most likely stimulus can be directly read out from the PV, x_ML = x_PV,

(2) the uncertainty in x_ML is given by the inverse PV length, (3) the likelihood P(n|x) is von Mises, too, and uniquely fixed by PV length and direction, and (4) the expected PV length equals the average Fisher information, up to a fixed constant. If the prior stimulus distribution P(x) is flat, then Bayes' rule states that the posterior probability P(x|n) ∝ P(n|x). In this case, the prediction for the maximum a posteriori stimulus x_MAP is identical to the maximum likelihood stimulus x_ML, so that the first two results apply to x_MAP, too; similarly, the third result also holds for P(x|n). Von Mises functions are thus not only advantageous when it comes to fitting experimentally measured circular tuning characteristics but also improve the interpretation of these data within a sound theoretical framework. Just as each neuron's response is a random variable, so is the PV. As such, the PV fluctuates from trial to trial, so that the uncertainty in x_ML varies, too. A reliable population response, though, could simplify downstream processing. The coefficient of variation of the Fisher information (PV length) is smallest when the tuning curve's concentration parameter κ (see Box 1) is around 2.5 (Figure 1e). The minimum is broad, so that values 2 < κ < 5 are close to optimal (Figure 1a,c,d); these values of κ correspond to an orientation tuning width of 30° to 50°. In contrast, maximizing the information predicts narrower tuning widths [29,30].

Decoding linear variables in one dimension by coiling up stimulus space

Many stimulus variables, such as the pitch of a pure sound, the wavelength of a light source or the position of an animal, are not circular but linear variables. Do such stimuli require an entirely different neural representation, or could periodic population codes still be used? Figure 2 demonstrates that this is indeed possible. However, there is a price to be paid: the many-to-one mapping (Figure 2a) from a straight line to a circle does not have a unique inverse mapping. The same is true in higher dimensions (Figure 2b). To overcome this fundamental problem, multiple periodic representations with different spatial scales λ need to be combined (Figure 2c). For PV-decoding, multiple neurons are required at each scale, which thereby predicts the existence of distinct modules. Within one module, neuronal tuning curves must share the same spatial period, but different curves will be phase-shifted relative to each other. If no common factor exists that would divide the different modular scales evenly, the coding range is potentially very large [31]. Coiling up stimulus space for the sake of a periodic neural representation may seem costly in terms of neural hardware. Information-theoretic analyses show, however,

Population vector readout of nested grid codes Herz, Mathis and Stemmler 101

Box 1. Population vector decoding and von Mises tuning

Relation between Bayesian and population-vector based stimulus estimates

How would one decode a stimulus x from the population response n = (n_1, n_2, ..., n_j, ..., n_N) of N noisy neurons? If these neurons encode a circular variable x ∈ [0, 2π) with tuning curves V_j(x) that each peak in one preferred direction c_j, a heuristic solution [15] is given by the direction x_PV of the population vector (PV). The PV is constructed as PV = \sum_{j=1}^{N} n_j V_j, where the V_j are unit vectors in c_j-direction. Mathematically, every neuron j is assigned a phasor exp(i c_j) = cos(c_j) + i sin(c_j), which is weighted by the response n_j of that neuron; summing over all neurons yields the PV, and x_PV is its angle in the complex plane,

x_{PV} = \arg\Big( \sum_{j=1}^{N} n_j \exp(i c_j) \Big).   (1)

The direction of the PV therefore offers a natural readout, and the average length of the PV is related to this estimate's uncertainty (see also the main text). According to Bayes' rule, P(x|n), the a posteriori probability of the stimulus x for a given response n, and the likelihood P(n|x), i.e., the conditional probability of n for given x, are related by P(x|n)P(n) = P(n|x)P(x). For flat priors (P(x) = const), the stimulus x_MAP maximizing the a posteriori probability P(x|n) also maximizes the likelihood P(n|x), so x_MAP = x_ML. But what is the relation between x_ML and x_PV? In general, x_ML and x_PV differ [12], but under four key assumptions they are identical [23]. First, assume that the neurons fire independently, so that P(n|x) = \prod_{j=1}^{N} P(n_j|x), and, second, that they obey Poisson statistics, P(n_j|x) = (n_j!)^{-1} (V_j(x))^{n_j} e^{-V_j(x)}. We treat the population response n as a vector of spike counts (summed over some fixed time window T, for example the period of one LFP oscillation). Even for relatively small N, the sum \sum_{j=1}^{N} V_j(x) is approximately constant, so that

P(n|x) \propto \prod_{j=1}^{N} \exp[\, n_j \ln(V_j(x)) \,].   (2)

Third, assume that (2) holds exactly and, finally, that the tuning curves are von Mises functions (Figure 1),

V_j(x) = a + b \exp(\kappa [\cos(x - c_j) - 1]),   (3)

with zero background activity a; in this case, b equals the maximum spike count n_max in a single time bin T. Unlike cosine tuning V_j(x) = A + B cos(x − c_j), which has a fixed tuning width, the 'concentration' parameter κ of the von Mises function allows one to vary the tuning width. For small x − c_j, the von Mises function mimics Gaussian tuning, so it is also called a 'periodic Gaussian' or 'circular normal' with variance σ² ≈ 1/κ for κ ≫ 1. Inserting (3) into (2) for a = 0 leads to a likelihood P(n|x) that is von Mises, too, albeit with a different concentration κ̂, whose expected value is E(κ̂) = n_max N κ exp(−κ) I_1(κ), where I_1 denotes the modified Bessel function of the first kind. The likelihood's maximum is given by Eq. (1); hence, x_ML = x_PV. For flat P(x), x_ML and x_MAP coincide, so that all three stimulus estimates are the same.

Properties of the population vector for von Mises tuning curves

For von Mises tuning, the variance of the PV estimate of x is inversely proportional to κ̂. As κ̂ scales with n_max N, a sufficiently large population size and/or activity level makes x_PV highly reliable. The Fisher information J_j(x) measures how well one can discriminate small changes in x, based on the responses of cell j. For Poisson statistics, J_j(x) = [dV_j(x)/dx]² / V_j(x). Calculated for von Mises functions (3) with a = 0, the average Fisher information, \bar{J} = \int_0^{2\pi} J_j(x)\,dx, satisfies \bar{J} = (2\pi)^2 E(\hat{\kappa}) [23]. Thus the expected PV length provides a straightforward, linear measure for positional discriminability. Of all tuning curves for which ln(V_j(x)) belongs to the class L² of square-integrable functions, the von Mises functions are the only ones for which the population code yields x_ML = x_PV and for which the Fisher information is given by the expected length of the PV.

Rotations of the PV by some angle α correspond in the complex plane to a shift of x_PV by α. This can be written as

x_{PV} \to x_{PV} + \alpha = \arg\Big( \sum_{j=1}^{N} n_j \exp[i(c_j + \alpha)] \Big) = \arg\Big( \exp(i\alpha) \sum_{j=1}^{N} n_j \exp(i c_j) \Big),   (4)

in which the term exp(iα) plays the role of a multiplicative gain field [7], as discussed in the main text and Figure 2d. Finally, an animal's position x in physical space is not a circular variable. But if the neural representation tiles space, each tile effectively coils up space onto a set of circles. This results in the repeating firing fields of grid cells, which are well fit by von Mises functions (Figure 1d), so that in 1D

V_j(x) = n_{max} \exp(\kappa [\cos(2\pi (x - c_j)/\lambda_j) - 1]),   (5)

where λ_j denotes the cell's grid-field spacing. Beyond 1D, the argument of the exponential function in (5) is replaced by a superposition of cosines, with wave vectors that span the higher-dimensional space.
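The rotation in Eq. (4) is simple enough to verify numerically. The sketch below (arbitrary example values) shows that applying the phase factor exp(iα) neuron by neuron is the same as rotating the summed PV.

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary example population (all names and values are illustrative only).
N = 32
c = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)          # preferred directions
n = rng.poisson(5.0 * np.exp(2.0 * (np.cos(c - 0.8) - 1.0)))  # spike counts for x = 0.8

alpha = 0.6                                           # desired coordinate shift
pv       = np.sum(n * np.exp(1j * c))                 # plain population vector
pv_shift = np.sum(n * np.exp(1j * (c + alpha)))       # per-neuron gain field exp(i*alpha)

# The shift factors out as a global phase, exactly as in Eq. (4):
assert np.isclose(pv_shift, np.exp(1j * alpha) * pv)
print((np.angle(pv_shift) - np.angle(pv)) % (2.0 * np.pi))  # alpha, up to float error
```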


Figure 1

[Figure panels: von Mises tuning fits with (a) motor cortex, κ = 2.5; (b) thalamus (head direction), κ = 12.9; (c) visual cortex, κ = 3.9; (d) entorhinal cortex, κ = 2.1; (e) coefficient of variation of κ̂ as a function of κ; (f) polar plots for κ = 1, 2.1, and 8.]
Canonical tuning curves. (a) A cell from macaque primary motor area (M1) (adapted from [27]). Cosine fit in green, von Mises fit (see Box 1) in dark blue. (b) Rat head-direction cell from the antero-dorsal thalamic nucleus (adapted from [25]). (c) Cat visual cortex neuron (adapted from [24]). (d) Grid cell from rat medial entorhinal cortex (adapted from [28]). Slice through the average firing field, normalized to the average distance λ between neighboring firing fields. Inset shows the spatial firing rate map of this cell. (e) The posterior probability P(x|n) will also be von Mises, with a parameter κ̂ that corresponds to the length of the population vector (PV). κ̂ is a random variable that depends on the tuning curve's concentration parameter. The trial-to-trial variability is minimal when the underlying tuning curves have κ ≈ 2.5. Shown is the coefficient of variation of κ̂ for n_max N = 512. (f) Polar representation of the von Mises function for different κ, together with the data from (d). The von Mises functions have been scaled to match the amplitude in (d).

that periodic multi-scale codes with constant scale ratio s = λ_m/λ_{m+1} between successive modules vastly outperform codes generated by selective unimodal neurons, such as place cells: the resolution of such self-similar 'nested codes' improves exponentially with the total number of neurons N, whereas the resolution of place codes improves only linearly with N [32]; for a complementary argument based on coding range, see [33]. In addition, for von Mises tuning, PV-decoding can be extended to the multi-scale situation through recursive refinement (Figure 3b). Finally, when considering two consecutive modules (Figure 3c), PV-decoding predicts an optimal scale ratio s = 3/2 [34]. Periodic multi-scale codes are not just a mathematical curiosity. They are the defining feature of grid-cell activity in the medial entorhinal cortex (MEC) and other brain

regions [1]. As predicted by information theory [35], the scale ratio s between successive modules is constant (Figure 3d), with a value between 1.4 [28] and 1.7 [36], in agreement with the predictions of multi-scale PV-decoding [34]. Similar optimal scales have been found by alternative methods [37]. Nested grid codes are also robust to many types of parameter variations [35,38,39]; the influence of non-Poisson spike statistics and field-to-field firing-rate variability of single neurons [40,41,42] remains to be investigated. In computer science, the advantage of related ternary codes has long been recognized (see [43] for a historical account); recent chip designs are starting to implement such designs efficiently [44]. When searching for food pellets, rodents evenly cover open environments [1,28,36,45,46], so that the prior P(x)


Figure 2

From linear to circular variables and back. (a) Mapping a 1D stimulus space to a circle or (b) a 2D space to a torus is many-to-one. (c) A unique inverse mapping can be obtained by combining different spatial scales. Two incommensurate scales suffice for deterministic mappings, but neurons are noisy, making multi-scale representations highly beneficial (see also Figure 3). (d) To calculate the relative position of a goal, the animal has to subtract the goal position from its own position. This transformation from allocentric to egocentric coordinates can be achieved by shifting the origin of the coordinate system (0) to the goal position (0′). In circular coordinates, this operation corresponds to a rotation. (e) Broad tuning and stochastic firing are typical for grid-cell activity. In 1D, mapping space to a circle as in (a), and arranging the spike counts along this circle, yields the neural population vector (PV), which represents the most likely animal position at this scale (see Box 1). See main text for D > 1. (f) Multiplicative gain fields. A gain field G(y) multiplies the tuning curve V(x) to yield the average neuronal response. Upper left: in parietal cortex, gaze direction modulates the tuning to retinal position [7], here sketched for a one-dimensional tuning curve for retinal position. Such gain fields are often linear functions of gaze direction. Bottom right: for grid codes, the gain field should be interpreted as a modulation of the synaptic readout matrix, which sinusoidally sums up the response of grid cells with different spatial phases. A change in the origin of the coordinate system corresponds to a rotation of the population vector (see Box 1). The rotation computation relies on the trigonometric sum rules cos(x + y) = cos(x) cos(y) − sin(x) sin(y) and sin(x + y) = sin(x) cos(y) + sin(y) cos(x), one of which is illustrated. Such nonlinear gain fields would allow the read-out neurons of a grid code to set an arbitrary goal location.
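The many-to-one mapping of panels (a)–(c) can be illustrated with a short numerical sketch (hypothetical periods, in cm): one period alone confuses distant positions, a second scale breaks the tie, and the combined code only repeats at the least common multiple of the two periods.

```python
import numpy as np

# Illustrative 1D example of panels (a)-(c): two hypothetical grid periods in cm.
lambdas = [60.0, 40.0]                       # scale ratio s = 3/2

def phases(x):
    """Map a linear position onto one circular phase per scale."""
    return np.array([(2.0 * np.pi * x / lam) % (2.0 * np.pi) for lam in lambdas])

# A single 60 cm period cannot tell x = 10 from x = 70 (many-to-one map):
print(np.isclose(phases(10.0)[0], phases(70.0)[0]))   # True
# The second scale breaks the tie ...
print(np.allclose(phases(10.0), phases(70.0)))        # False
# ... but the phase pair itself repeats after lcm(60, 40) = 120 cm,
# which is why scales without common factors extend the coding range:
print(np.allclose(phases(10.0), phases(130.0)))       # True
```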

is nearly flat except close to the boundaries of the available space. The prior's approximate constancy allows animals to use PV-decoding for reliable maximum a posteriori position estimates in the environment's interior, perhaps complemented by whisker information near the boundaries, where P(x) varies more strongly so that PV-based estimates are less reliable.

Population vectors for linear variables in higher dimensions

Grid cells and neurons with mixed selectivity have a stimulus space that is more than one-dimensional. Even if the stimulus dimensions are not orthogonal, the neuronal representation could orthogonalize the stimuli. In other cases, a non-orthogonal representation can be


Figure 3

[Figure panels: four nested modules (M0–M3) with tuning curves, spike count vectors n, and posteriors P(x|n); grid spacing versus grid orientation; measured scale ratios near 1.40–1.43 for successive module pairs M1/M0, M2/M1, M3/M2 (Rat 15708, n = 116).]
Decoding grid cells. (a) Spikes from four example cells overlaid on the rat trajectory (grey). The lattice scales λ_k form a discrete set, ranging from coarse to fine. (b) Tuning curves of four nested modules, each with 20 cells and evenly spaced phases. The animal's position yields a spike count vector n in each module. The posterior P(x|n) at that scale describes the probability of being at location x, given the respective spike count vector (one example shown in Figure 2e). Modules with smaller λ have more localized P(x|n), but their multiple peaks cause ambiguous position estimates. The joint posterior given the responses of all modules, shown in grey, is highly localized and nonperiodic. (c) The posterior of a single module peaks at some location, here x_MAP = 0. Adding a second module improves the estimate's reliability (central red shadings). For s < 3/2, incorrect estimates are less likely (blue shading); for s > 3/2, however, the probability of catastrophic errors x = x_MAP ± λ_0/2 increases (red side bands). For s ≈ 3/2, silencing intermediate modules has the same effect, whereas spatial precision decreases only marginally if the smallest-scale module is taken out of the circuit. (d) Overall, experimentally observed scale ratios are consistent with the theoretical predictions. (e) Grid axes are aligned within and across modules (circles: individual cells). (f) Wedge-like firing-rate ramps are expected from PV-decoding for read-out neurons. Panels a,d,e: adapted from [28]; panels b,c,f: adapted from [34].
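The coarse-to-fine refinement sketched in panel (b) can be written down as a toy decoder. In this sketch (all parameters hypothetical), the coarsest module spans the full range, so its PV estimate is unambiguous, and each finer module then snaps the running estimate to the nearest position compatible with its own phase.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy nested code (all values hypothetical): constant scale ratio s = 3/2,
# coarsest period equal to the full range L, so module 0 alone is unambiguous.
L, s, M = 200.0, 1.5, 4
N, kappa, n_max = 48, 2.5, 30
lambdas = [L / s**m for m in range(M)]
c = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)   # phase offsets within a module

def module_pv(x, lam):
    """Noisy PV estimate of x modulo lam from one module (cf. Figure 2e)."""
    rates = n_max * np.exp(kappa * (np.cos(2.0 * np.pi * x / lam - c) - 1.0))
    n = rng.poisson(rates)
    ang = np.angle(np.sum(n * np.exp(1j * c))) % (2.0 * np.pi)
    return ang * lam / (2.0 * np.pi)         # phase expressed as a position in [0, lam)

def decode(x):
    """Coarse-to-fine refinement: each finer module snaps the running
    estimate to the nearest position compatible with its phase."""
    est = module_pv(x, lambdas[0])
    for lam in lambdas[1:]:
        phi = module_pv(x, lam)
        k = np.round((est - phi) / lam)      # grid period closest to current estimate
        est = phi + k * lam
    return est

x_true = 137.2
x_hat = decode(x_true)
print(f"true = {x_true} cm, decoded = {x_hat:.2f} cm")
```

Because each module's PV error is far smaller than half the next module's period, the integer choice k is essentially never wrong here; catastrophic errors of the kind shown in panel (c) only appear for much noisier modules or larger scale ratios.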

decomposed into intercalated rectangular lattices. This holds for many Fisher-optimal packing codes in higher dimensions [47], such as hexagonal (2D) and hexagonal close packing (3D). Such decompositions reduce the decoding problem to computing a set of one-dimensional PVs that average out the transverse dimensions. In higher dimensions, additional constraints need to be satisfied for the vector sum of one-dimensional PVs to be the optimal x_ML, even for von Mises tuning curves [34]. For orthogonal lattices or variables on a (hyper-)sphere, the tuning curves need to be products of von Mises functions in each dimension. In contrast, the lattice vectors for a hexagonal grid are not orthogonal, and, hence, PV decoding of populations with 2D von Mises tuning is not, in general, optimal. Wei et al., however, show that a Gaussian approximation yields a satisfactory probabilistic decoder [37]. The disadvantage of nonorthogonality is balanced by the error correction afforded by three read-out directions for the hexagonal lattice compared to the two for the square lattice [34].
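As a sketch of the hexagonal geometry discussed here, one can build a 2D grid-cell tuning curve from three cosines with wave vectors 120° apart (the higher-dimensional analogue of Box 1, Eq. 5) and check numerically that the firing map repeats on a hexagonal lattice. The wave-vector magnitude and orientations below are arbitrary choices for illustration.

```python
import numpy as np

# Illustrative construction: three wave vectors, 120 degrees apart, summing
# to zero (k1 + k2 + k3 = 0); magnitude and angles are arbitrary choices.
K = 1.0
thetas = np.deg2rad([90.0, 210.0, 330.0])
ks = K * np.stack([np.cos(thetas), np.sin(thetas)], axis=1)   # shape (3, 2)

def grid_rate(x, kappa=2.0, n_max=10.0):
    """2D grid-cell tuning: exponential of a superposition of cosines."""
    arg = np.sum(np.cos(ks @ np.asarray(x)) - 1.0)
    return n_max * np.exp(kappa * arg / 3.0)

# The map repeats on a hexagonal (triangular) lattice whose primitive
# vectors a1, a2 satisfy k_i . a = 2*pi*integer for every wave vector:
a1 = np.array([4.0 * np.pi / (np.sqrt(3.0) * K), 0.0])
a2 = np.array([2.0 * np.pi / (np.sqrt(3.0) * K), 2.0 * np.pi / K])
x0 = np.array([0.7, -0.3])
print(np.isclose(grid_rate(x0), grid_rate(x0 + a1)))          # True
print(np.isclose(grid_rate(x0), grid_rate(x0 + a2)))          # True
```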

Neuronal populations with other unimodal tuning curves can be the basis for PVs, too; if the tuning curves are suboptimal in 1D, then the decoding error is potentially even worse in higher dimensions. In 1D, PV-decoding requires that grid periods within one module match (Figure 2e). By the same token, grid-field axes in higher dimensions have to be aligned within a module; alignment across modules is not required but improves the resolution [34]. Experimental data (Figure 3e) are in agreement with this theoretical prediction. The grid pattern is not completely regular in tapered environments, such as trapezoids [46], calling into question the grid code's metric nature. The true metric, however, arises only at the population level and persists as long as the local firing patterns of all neurons within a module are distorted equally [34], as observed experimentally [48]. At the mechanistic level, continuous attractor networks offer an alternative explanation for intra-module


alignment, as the network's collective activity patterns dictate identical grid-field axes for all neurons within a module [49–51], as reviewed in [52]; for a complementary grid-formation model based on adaptation, see [53]. However, continuous attractor network models do not explain the observed inter-module alignment or the measured grid-scale ratio across modules.

A number of decoding theories rely on single neurons to signal the encoded variable in a winner-take-all representation, instead of a PV [31,37,54]. Bush et al., for instance, propose several network architectures for multi-scale decoding, including spike-timing based mechanisms [54]. Yet the information loss inherent to winner-take-all schemes renders them less reliable than PVs [34].

Multi-scale grid codes provide a highly efficient means to represent one's spatial location in world-centered coordinates. For goal-directed navigation, however, an animal needs to know the direction and distance to some goal. This task can be solved by shifting the origin of the coordinate system to the goal location (Figure 2d). Projected onto the periodic representation in grid cells, such a linear transformation corresponds to a PV rotation. For von Mises tuning, this is readily accomplished through multiplicative gain fields (Box 1, [34]), illustrated in Figure 2f. Gain modulation is ubiquitous in the brain [7,55,56] and has been suggested as a mechanism for angular coordinate transformations and 'mental rotations' [16]. The resulting prediction for a read-out neuron of grid cells is that such a neuron's firing activity reflects the distance to the goal (Figure 3f) [34]. Such tuning has recently been observed in entorhinal 'object-vector cells' of mice approaching a spatial landmark [57] and, combined with information about the animal's movement, in bat hippocampus [58].

Outlook

Nested, self-similar representations enjoy unexpected computational advantages. Given von Mises tuning, maximum likelihood estimates can be readily obtained by population-vector decoding, and gain modulation can be used for linear coordinate transformations. Grid-like codes, which can be learned through self-organization [59], are also highly efficient from an information-theoretic viewpoint, irrespective of the dimension of the stimulus space [32,47] and the existence of noise correlations [23]. The PV coarse-grains neural activities over some fixed time window, ignoring the role of spike timing in both sensory processing [60,61] and motor control [62]. This time window might be set by the time scale of network oscillations [63]. For dynamically changing inputs, Burak and Fiete derive an ideal time window from an information-diffusion inequality ([64], see also [65]). With this approach, Mosheiff et al. [66] show that the number of grid cells per module should decrease with grid scale, in qualitative agreement with experimental data [28]. Intriguingly, the firing rate of cells in anterior thalamus anticipates the future movement direction, so that the PV, despite being a time average, reflects the present state and not the recent past [67].

In frontal cortices, mixed representations are prevalent at the single-cell level; these can be read out efficiently at the population level by linear classifiers [3]. Likewise, object recognition both in vision and olfaction can be achieved by linear readouts at the appropriate intermediate representation [68,69]. Using simple linear–nonlinear cascade models to describe single-neuron responses, Hardcastle et al. [45] show that MEC neurons carry heterogeneous combinations of spatial, directional, and speed information. This finding underscores that in MEC, too, unimodal responses are the exception whereas mixed neural representations are the rule.

Results from Dunn et al. [40], Diehl et al. [41], and Ismakov et al. [42] point to yet another variant of this overarching neural coding principle: all three groups present experimental evidence that firing rates differ between the different fields of the same grid cell. This field-to-field variability is preserved in time and instantiates a distributed firing-rate code that can be regarded as a combinatorial extension of hippocampal place-cell coding. At the same time, PV decoding across multiple grid scales can still be applied: as PVs automatically sum over individual firing-rate variations, they remain ideally suited for decoding from heterogeneous firing fields, as long as the firing-rate variations average out within each module.

Recent findings suggest that grid codes are not only crucial for spatial localization and goal-directed navigation but play a key role in other cognitive processes, too. Aronov et al. [70] reported that grid cells can represent non-spatial, continuous variables by multiple firing fields. Rats were trained to deflect a joystick during a sequence of tones with increasing frequency until a particular sound frequency was heard. Remarkably, some entorhinal neurons that were identified as grid cells in a navigation paradigm also responded with discrete firing fields to specific sound frequencies. In free-viewing visual memory tasks, certain neurons in primate entorhinal cortex displayed grid-like firing fields as a function of gaze fixation, which could help to encode spatial relations during visual exploration even without locomotion [71]. Extending the primate studies to human subjects, Julian et al. [72] tracked the subjects' gaze during visual search. Previous work by Doeller et al. had revealed that fMRI signals from the MEC of human subjects navigating in virtual environments


exhibit a 60°-modulation when subjects changed their movement direction [73]. This phenomenon has been interpreted as a signature of 'conjunctive' grid cells, which encode both location and heading direction. Julian et al. observed that, as a function of the gaze's movement direction, modulations of the fMRI signal from the EC matched the 6-fold rotational symmetry of grid-cell firing. This suggests that the same mechanism underlying the cognitive map of navigational space may also create a map of visual space.

At a more abstract level, Constantinescu et al. [2] showed that grid codes play a role in higher-dimensional cognitive tasks that go far beyond sensory processing and spatial navigation. Rather than having subjects navigate in virtual, acoustic, or visual space, Constantinescu et al. presented visual stimuli whose features varied along two dimensions (the neck and leg length of cartoon birds), which subjects had to associate with unrelated holiday symbols. In subjects performing this cognitive task, a 6-fold modulation with respect to these feature dimensions was observed in the fMRI signal [2,74]. Many conceptual spaces can be parameterized (for an example involving faces see [75]), and human beings are able to reason within such spaces [76]. Future fMRI experiments will help to evaluate how pervasive nested grid codes are for human cognition. In parallel, methods with cellular resolution, as employed by Aronov et al. [70], will help us to decipher the underlying neural computations.
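The 60°-periodic (6-fold) modulation reported in these fMRI studies is commonly quantified by regressing the signal onto a quadrature pair at the sixth harmonic of movement (or gaze) direction. A minimal synthetic sketch of this analysis follows; the orientation of 10°, the modulation amplitude, and the noise level are assumed values for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
phi = np.deg2rad(10.0)  # hypothetical grid orientation (assumed for this demo)
theta = rng.uniform(0.0, 2.0 * np.pi, 500)  # sampled movement or gaze directions
signal = 0.8 * np.cos(6.0 * (theta - phi)) + rng.normal(0.0, 0.5, theta.size)

# Regress the signal onto the 6-fold quadrature pair cos(6*theta), sin(6*theta).
X = np.column_stack([np.cos(6.0 * theta), np.sin(6.0 * theta)])
beta, *_ = np.linalg.lstsq(X, signal, rcond=None)

amplitude = np.hypot(beta[0], beta[1])         # strength of hexagonal modulation
phi_hat = np.arctan2(beta[1], beta[0]) / 6.0   # recovered orientation (mod 60 deg)
```

Since cos(6(θ − φ)) = cos(6φ)cos(6θ) + sin(6φ)sin(6θ), the two regression weights jointly recover the modulation amplitude and the grid orientation modulo 60°.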

Conflict of interest statement
Nothing declared.

Acknowledgements
This project was funded through the Bernstein Center for Computational Neuroscience Munich (A.V.M.H. and M.S., grant BMBF 01GQ1004A) and a Marie Curie International Fellowship within the 7th European Community Framework Program under grant agreement No. 622943 MA 6176/1-1 (A.M.).

References and recommended reading
Papers of particular interest, published within the period of review, have been highlighted as:
• of special interest
•• of outstanding interest

1. Hafting T, Fyhn M, Molden S, Moser MB, Moser EI: Microstructure of a spatial map in the entorhinal cortex. Nature 2005, 436:801-806.
2. Constantinescu AO, O'Reilly JX, Behrens TE: Organizing conceptual knowledge in humans with a gridlike code. Science 2016, 352:1464-1468.
The authors report a 6-fold modulation of fMRI activity within entorhinal and prefrontal cortices when human subjects navigated in two-dimensional conceptual spaces. This finding suggests that grid-like codes may be used to organize both spatial and non-spatial knowledge.
3. Rigotti M, Barak O, Warden MR, Wang X-J, Daw ND, Miller EK, Fusi S: The importance of mixed selectivity in complex cognitive tasks. Nature 2013, 497:585-590.
4. Fusi S, Miller EK, Rigotti M: Why neurons mix: high dimensionality for higher cognition. Curr Opin Neurobiol 2016, 37:66-74.

Current Opinion in Neurobiology 2017, 46:99–108

5. Mante V, Sussillo D, Shenoy KV, Newsome WT: Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature 2013, 503:78-84.
6. Georgopoulos AP, Caminiti R, Kalaska JF, Massey JT: Spatial coding of movement: a hypothesis concerning the coding of movement direction by motor cortical populations. Exp Brain Res Suppl 1983, 7:336.
7. Andersen RA, Essick GK, Siegel RM: Encoding of spatial location by posterior parietal neurons. Science 1985, 230:456-458.
8. Faisal AA, Selen LP, Wolpert DM: Noise in the nervous system. Nat Rev Neurosci 2008, 9:292-303.
9. Wohrer A, Humphries MD, Machens CK: Population-wide distributions of neural activity during perceptual decision-making. Prog Neurobiol 2013, 103:156-193.
10. Yuste R: From the neuron doctrine to neural networks. Nat Rev Neurosci 2015, 16:487-497.
11. Pouget A, Zhang K, Deneve S, Latham PE: Statistically efficient estimation using population coding. Neural Comput 1998, 10:373-401.
12. Salinas E, Abbott L: Vector reconstruction from firing rates. J Comput Neurosci 1994, 1:89-107.
13. Pouget A, Dayan P, Zemel R: Information processing with population codes. Nat Rev Neurosci 2000, 1:125-132.
14. Aitchison L, Lengyel M: The Hamiltonian brain: efficient probabilistic inference with excitatory-inhibitory neural circuit dynamics. PLoS Comput Biol 2016, 12:e1005186.
15. Georgopoulos AP, Schwartz AB, Kettner RE: Neuronal population coding of movement direction. Science 1986, 233:1416-1419.
16. Georgopoulos AP, Taira M, Lukashin A: Cognitive neurophysiology of the motor cortex. Science 1993, 260:47.
17. van Hemmen JL, Schwartz AB: Population vector code: a geometric universal as actuator. Biol Cybern 2008, 98:509-518.
18. Churchland MM, Cunningham JP, Kaufman MT, Foster JD, Nuyujukian P, Ryu SI, Shenoy KV: Neural population dynamics during reaching. Nature 2012, 487:51-56.
19. Scott SH, Gribble PL, Graham KM, Cabel DW: Dissociation between hand motion and population vectors from neural activity in motor cortex. Nature 2001, 413:161-165.
20. Kohn A, Coen-Cagli R, Kanitscheider I, Pouget A: Correlations and neuronal population information. Annu Rev Neurosci 2016, 39:237-256.
21. Deneve S, Latham PE, Pouget A: Reading population codes: a neural implementation of ideal observers. Nat Neurosci 1999, 2:740-745.
22. Schwartz AB: Movement: how the brain communicates with the world. Cell 2016, 164:1122-1135.
Excellent review of the current motor control and learning field, with a discussion of the usefulness of population vectors for decoding movement and for contemporary brain-machine interfaces.
23. Mathis A, Herz AV, Stemmler MB: Multiscale codes in the nervous system: the problem of noise correlations and the ambiguity of periodic scales. Phys Rev E 2013, 88:022713.
24. Swindale NV: Orientation tuning curves: empirical description and estimation of parameters. Biol Cybern 1998, 78:45-56.
25. Taube JS: Head direction cells recorded in the anterior thalamic nuclei of freely moving rats. J Neurosci 1995, 15:70-86.
26. Peyrache A, Lacroix MM, Petersen PC, Buzsáki G: Internally organized mechanisms of the head direction sense. Nat Neurosci 2015, 18:569-575.
27. Amirikian B, Georgopoulos AP: Directional tuning profiles of motor cortical cells. Neurosci Res 2000, 36:73-79.

Population vector readout of nested grid codes Herz, Mathis and Stemmler 107

28. Stensola H, Stensola T, Solstad T, Frøland K, Moser M-B, Moser EI: The entorhinal grid map is discretized. Nature 2012, 492:72-78.
29. Zhang K, Ginzburg I, McNaughton BL, Sejnowski TJ: Interpreting neuronal population activity by reconstruction: unified framework with application to hippocampal place cells. J Neurophysiol 1998, 79:1017-1044.
30. Montemurro MA, Panzeri S: Optimal tuning widths in population coding of periodic variables. Neural Comput 2006, 18:1555-1576.
31. Fiete IR, Burak Y, Brookings T: What grid cells convey about rat location. J Neurosci 2008, 28:6858-6871.
32. Mathis A, Herz AV, Stemmler MB: Resolution of nested neuronal representations can be exponential in the number of neurons. Phys Rev Lett 2012, 109:018103.
33. Sreenivasan S, Fiete I: Grid cells generate an analog error-correcting code for singularly precise neural computation. Nat Neurosci 2011, 14:1330-1337.
34. Stemmler M, Mathis A, Herz AV: Connecting multiple spatial scales to decode the population activity of grid cells. Sci Adv 2015, 1:e1500816.
The authors show PVs can compute a goal vector from grid cells organized in nested modules spanning multiple length scales. A worst-case decoding analysis yields an optimally robust scale ratio of 3/2. The theory predicts nonlinear gain fields and specific impairment of spatial navigation if intermediate-scale modules are silenced.
35. Mathis A, Herz AV, Stemmler M: Optimal population codes for space: grid cells outperform place cells. Neural Comput 2012, 24:2280-2317.
36. Barry C, Hayman R, Burgess N, Jeffery KJ: Experience-dependent rescaling of entorhinal grids. Nat Neurosci 2007, 10:682-684.
37. Wei X-X, Prentice J, Balasubramanian V: A principle of economy predicts the functional architecture of grid cells. eLife 2015, 4:e08362.
38. Towse BW, Barry C, Bush D, Burgess N: Optimal configurations of spatial scale for grid cell firing under noise and uncertainty. Phil Trans R Soc Lond B Biol Sci 2014, 369:20130290.
39. Vago L, Ujfalussy BB: Robust and efficient coding with grid cells. bioRxiv 2017:107060.
40. Dunn B, Wennberg D, Huang Z, Roudi Y: Grid cells show field-to-field variability and this explains the aperiodic response of inhibitory interneurons. arXiv 2017:1701.04893.
The authors show that there are significant field-to-field differences in the firing rates of a grid cell and explore how interactions between inhibitory and excitatory neurons in a continuous attractor network might explain the phenomenon.
41. Diehl GW, Hon OJ, Leutgeb S, Leutgeb JK: Grid and nongrid cells in medial entorhinal cortex represent spatial location and environmental features with complementary coding schemes. Neuron 2017, 94:83-92.e86.
The authors study how entorhinal neurons represent environmental features. They observe that grid cells retain stable spatial representations but show rate differences across fields, whereas nongrid spatial cells alter their spatial firing much more strongly.
42. Ismakov R, Barak O, Jeffery K, Derdikman D: Grid cells encode local positional information. Curr Biol 2017.
The authors show that there are significant field-to-field differences in the firing rates of a grid cell. The firing profile is stable during rescaling of the arena but collapses under grid realignment.
43. Hayes B: Computing science: third base. Am Sci 2001, 89:490-494.
44. Kim W, Chattopadhyay A, Siemon A, Linn E, Waser R, Rana V: Multistate memristive tantalum oxide devices for ternary arithmetic. Sci Rep 2016:6.
45. Hardcastle K, Maheswaranathan N, Ganguli S, Giocomo LM: A multiplexed, heterogeneous, and adaptive code for navigation in medial entorhinal cortex. Neuron 2017, 94:375-387.e377.
A systems-theoretic analysis shows that most neural responses in MEC combine spatial, directional, and speed information. This complexity is not captured by ad-hoc classifications and terms such as 'grid' or 'border' cell, suggesting that MEC operates with mixed neural representations.
46. Krupic J, Bauza M, Burton S, Barry C, O'Keefe J: Grid cell symmetry is shaped by environmental geometry. Nature 2015, 518:232-235.
47. Mathis A, Stemmler MB, Herz AV: Probable nature of higher-dimensional symmetries underlying mammalian grid-cell activity patterns. eLife 2015, 4:e05979.
Mathematical derivation of which neuronal lattice codes maximize spatial resolution. For grid cells representing planar space, the theory singles out hexagonal activity patterns as optimal. In three dimensions, a face-centered cubic lattice tuning is best. This prediction could be tested by recording during spatial navigation in flying bats, arboreal monkeys, or marine mammals. More generally, the theory suggests that, for efficiency, higher-dimensional sensory or cognitive variables should be encoded by populations of grid-cell-like neurons whose activity patterns exhibit lattice structures at multiple, nested scales.
48. Stensola T, Stensola H, Moser M-B, Moser EI: Shearing-induced asymmetry in entorhinal grid cells. Nature 2015, 518:207-212.
Grid-cell firing-rate maps show systematic deviations from periodicity, such as an increasing elliptic distortion towards the boundaries of an enclosure, but these properties are shared within modules, so that PV decoding would still be possible.
49. Fuhs MC, Touretzky DS: A spin glass model of path integration in rat medial entorhinal cortex. J Neurosci 2006, 26:4266-4276.
50. Burak Y, Fiete IR: Accurate path integration in continuous attractor network models of grid cells. PLoS Comput Biol 2009, 5:e1000291.
51. Yoon K, Buice MA, Barry C, Hayman R, Burgess N, Fiete IR: Specific evidence of low-dimensional continuous attractor dynamics in grid cells. Nat Neurosci 2013, 16:1077-1084.
52. Burak Y: Spatial coding and attractor dynamics of grid cells in the entorhinal cortex. Curr Opin Neurobiol 2014, 25:169-175.
Brilliant review of collective dynamics and population coding in the grid-cell system. An ideal and compact 'warm-up' for any novice in the field, and a thought-provoking read for experts.
53. Si B, Kropff E, Treves A: Grid alignment in entorhinal cortex. Biol Cybern 2012:1-24.
54. Bush D, Barry C, Manson D, Burgess N: Using grid cells for navigation. Neuron 2015, 87:507-520.
Inspired by the Fourier shift theorem, the authors present an algorithm for how grid cells can be used for vector navigation between arbitrary locations. They present various network models that implement this algorithm.
55. Herzfeld DJ, Kojima Y, Soetedjo R, Shadmehr R: Encoding of action by the Purkinje cells of the cerebellum. Nature 2015, 526:439-442.
When simple-spike responses of Purkinje cells are grouped according to the cells' complex-spike tuning, their sum accurately predicts the direction and speed profile of saccades. The response is given by a gain field of direction and speed.
56. Brayanov JB, Press DZ, Smith MA: Motor memory is encoded as a gain-field combination of intrinsic and extrinsic action representations. J Neurosci 2012, 32:14951-14965.
57. Hoydal OA, Skytoen ER, Moser M-B, Moser EI: Object-vector cells in the medial entorhinal cortex. Society for Neuroscience Meeting 2017.
58. Sarel A, Finkelstein A, Las L, Ulanovsky N: Vectorial representation of spatial goals in the hippocampus of bats. Science 2017, 355:176-180.
Successful navigation requires a neural representation of one's own position and goal location. Recording from the hippocampal CA1 region of flying bats, the authors discovered neurons tuned to goal distance and/or goal direction, suggesting a hippocampus-based vectorial representation of spatial goals.
59. Stella F, Treves A: The self-organization of grid cells in 3D. eLife 2015, 4:e05913.
Self-organizing grid-cell networks provide a conceptual alternative to continuous attractor models. Addressing the nature of grid fields of animals freely navigating in 3D, the authors show that face-centered-cubic (FCC) and hexagonal-close-packed (HCP) arrangements are optimal solutions. Numerical simulations reveal an initially rapid convergence towards these solutions, but no sign of convergence towards either a pure FCC or HCP ordering.
60. Van Rullen R, Thorpe SJ: Rate coding versus temporal order coding: what the retinal ganglion cells tell the visual cortex. Neural Comput 2001, 13:1255-1283.
61. Elhilali M, Fritz JB, Klein DJ, Simon JZ, Shamma SA: Dynamics of precise spike timing in primary auditory cortex. J Neurosci 2004, 24:1159-1172.
62. Srivastava KH, Holmes CM, Vellema M, Pack AR, Elemans CPH, Nemenman I, Sober SJ: Motor control by precisely timed spike patterns. Proc Natl Acad Sci U S A 2017, 114:1171-1176.
63. Panzeri S, Macke JH, Gross J, Kayser C: Neural population coding: combining insights from microscopic and mass signals. Trends Cogn Sci 2015, 19:162-172.
64. Burak Y, Fiete IR: Fundamental limits on persistent activity in networks of noisy neurons. Proc Natl Acad Sci U S A 2012, 109:17645-17650.
65. Levakova M, Tamborrino M, Kostal L, Lansky P: Accuracy of rate coding: when shorter time window and higher spontaneous activity help. Phys Rev E 2017, 95:022310.
66. Mosheiff N, Agmon H, Moriel A, Burak Y: An efficient coding theory for a dynamic trajectory predicts non-uniform allocation of entorhinal grid cells to modules. PLoS Comput Biol 2017, 13:e1005597.
First theoretical grid-cell study to take (rodent) movement statistics into account. Unlike previous snap-shot approaches to spatial coding [36,38], this framework predicts that the number of grid cells per module decreases with grid scale, in qualitative agreement with experimental data.
67. Blair HT, Lipscomb BW, Sharp PE: Anticipatory time intervals of head-direction cells in the anterior thalamus of the rat: implications for path integration in the head-direction circuit. J Neurophysiol 1997, 78:145-159.


68. Majaj NJ, Hong H, Solomon EA, DiCarlo JJ: Simple learned weighted sums of inferior temporal neuronal firing rates accurately predict human core object recognition performance. J Neurosci 2015, 35:13402-13418.
Weighted sums of the population activity in monkey inferotemporal cortex match the performance of human subjects on a challenging object-recognition task comprising thousands of images.
69. Mathis A, Rokni D, Kapoor V, Bethge M, Murthy VN: Reading out olfactory receptors: feedforward circuits detect odors in mixtures without demixing. Neuron 2016, 91:1110-1123.
70. Aronov D, Nevers R, Tank DW: Mapping of a non-spatial dimension by the hippocampal–entorhinal circuit. Nature 2017, 543:719-722.
Rats were trained to deflect a joystick to increase the frequency of a presented sound and to release the joystick when a particular sound frequency was reached. During this task, hippocampal and entorhinal neurons were recorded. Cells that were identified as grid cells in a navigation paradigm also exhibited discrete firing fields to particular sound frequencies.
71. Killian NJ, Jutras MJ, Buffalo EA: A map of visual space in the primate entorhinal cortex. Nature 2012, 491:761-764.
72. Julian JB, Keinath AT, Frazzetta G, Epstein RA: Evidence for a grid-like representation of visual space in humans. Vision Sciences Society Meeting 2017.
73. Doeller CF, Barry C, Burgess N: Evidence for grid cells in a human memory network. Nature 2010, 463:657-661.
74. Kriegeskorte N, Storrs KR: Grid cells for conceptual spaces? Neuron 2016, 92:280-284.
75. Freiwald WA, Tsao DY, Livingstone MS: A face feature space in the macaque temporal lobe. Nat Neurosci 2009, 12:1187-1196.
76. Tenenbaum JB, Kemp C, Griffiths TL, Goodman ND: How to grow a mind: statistics, structure, and abstraction. Science 2011, 331:1279-1285.
