Neural population codes
Terence D Sanger

In many regions of the brain, information is represented by patterns of activity occurring over populations of neurons. Understanding the encoding of information in neural population activity is important both for grasping the fundamental computations underlying brain function, and for interpreting signals that may be useful for the control of prosthetic devices. We concentrate on the representation of information in neurons with Poisson spike statistics, in which information is contained in the average spike firing rate. We analyze the properties of population codes in terms of the tuning functions that describe individual neuron behavior. The discussion centers on three computational questions: first, what information is encoded in a population; second, how does the brain compute using populations; and third, when is a population optimal? To answer these questions, we discuss several methods for decoding population activity in an experimental setting. We also discuss how computation can be performed within the brain in networks of interconnected populations. Finally, we examine questions of optimal design of population codes that may help to explain their particular form and the set of variables that are best represented. We show that for population codes based on neurons that have a Poisson distribution of spike probabilities, the behavior and computational properties of the code can be understood in terms of the tuning properties of individual cells.

Addresses
Department of Neurology and Neurological Sciences, Pediatric Movement Disorders Clinic, Stanford University Medical Center, 300 Pasteur Drive, A345, Stanford, CA 94305-5235, USA
e-mail: [email protected]
Current Opinion in Neurobiology 2003, 13:238–249

This review comes from a themed issue on Cognitive neuroscience
Edited by Brian Wandell and Anthony Movshon

0959-4388/03/$ – see front matter © 2003 Elsevier Science Ltd. All rights reserved.
DOI 10.1016/S0959-4388(03)00034-5
Abbreviations
MAP maximum a posteriori
ML maximum likelihood
Introduction

In vertebrates, information is often encoded as patterns of activity within populations of neurons subserving a similar function. This may partly explain the brain’s resilience to injury, precision of action, and ability to learn. Interest in understanding these patterns of activity is spurred by two experimental goals [1]. The first goal is decoding:
understanding how the ‘neural code’ is read in order to control external electrical interfaces and prosthetic devices. The second goal is computation: understanding how the brain processes information to accomplish behavior. Approaching these two goals requires different strategies. In the first case, we attempt to extract the best possible estimate of an underlying variable. In the second case, we need to know not only what information is encoded but also what mechanisms exist for computation, as a limitation in processing within the brain could mean that not all of the information in a population is actually used.

Here, we discuss the analysis of population coding in several stages. In the first stage, we look at the firing patterns of the individual neurons that make up the population. Information coding in individual neurons is often described as being either a ‘rate code’ or a ‘temporal code’. In a rate code, all the information is contained in the average firing rate of the cell. In a temporal code, information may also be contained in the precise time at which a spike occurs, or in the precise interval between different spikes [2–8]. We will confine this discussion to neurons for which a rate code contains all the behaviorally relevant information [9,10,11,12].

For a neuron that uses a rate code, the probability of firing is often modeled using Poisson statistics. In this case, the behavior of each cell in the population is described by tuning curves that relate the probability of firing to external variables that depend on sensation or motor performance. Therefore, this review begins with a description of cells with Poisson spike statistics, followed by an analysis of the formation of tuning curves and the types of tuning curves that are observed.
Given a neural population with a known set of tuning curves, the problem of reading the ‘neural code’ during a physiology experiment becomes one of estimation; we must attempt to estimate the true value of an external variable on the basis of recordings of spike data from the population. We first discuss linear estimation methods, which were the techniques initially applied to this problem. We then discuss optimal Bayesian methods, which are based on maximum likelihood estimators.

In order to perform a calculation on values that are represented by population codes, the brain must be able to transform one or more ‘input’ population codes into a new population code that represents the result of the desired computation. Therefore, computation in the brain can be described as a mapping between different population codes, in which values in one code are translated into values in a new code with different properties.
As the population codes in our formulation will be described completely by the tuning curves for each individual cell, this problem is one of approximating the desired tuning curve for the cell in the output population as a function of the activities of the cells in the input population. We describe the algorithms that have been developed to accomplish this task.
The mathematical basis for understanding population codes is still in its infancy, and many of the techniques described here will no doubt be significantly modified in the future. At this time, however, we believe that it is possible to describe and analyze population codes using techniques from probability and estimation theory, as we will show below.
We often would like to be able to answer the question, ‘what is coded in a particular neural population?’ This question is difficult to answer, as there may be more than one variable that is correlated with neural activity in the population. In the final section of this review, we examine techniques for determining whether there is a particular variable that is best represented by a population. As with algorithms for reading the neural code, there are both linear methods (which are based on correlation) and probability theory methods (which are based on calculating information measures).
Single-cell representations

To model the behavior of individual cells in a population, we consider an external variable x that determines the average firing of a cell. As the firing rate depends on x, we can think of this as a tuning curve s(x). The cell generates a sequence of spikes n(t), in which the average firing rate is given by s(x). At each time t, n(t) is either 1 (meaning a spike occurred) or 0 (meaning no spike occurred). Figure 1 shows the basic structure of this model, with an example of a tuning curve and simulated spike raster data. For example, the external variable x could be the
Figure 1
Model of a single cell within a population. The external variable x determines the average firing rate s according to a tuning curve s(x). The spike generator creates a series of spikes n(t) on the basis of the average firing rate s(x) using a statistical probability of firing P(n|s). (a) The cell model. (b) Example of a tuning curve. (c) Spike rasters generated from the tuning curve in (b). Each row shows 100 sec of spike data for a particular value of the input x between −2 and 2.
direction of hand movement, s(x) could be the average firing rate of a cell that is tuned for movement direction and whose firing rate varies depending on the movement direction, and n(t) could be the actual sequence of spikes generated by that cell during a hand movement.

In a rate code, the statistics of spike generation can be modeled using Poisson statistics. A Poisson process satisfies the ‘independent increment’ property, which states that the numbers of spikes occurring in non-overlapping time intervals are statistically independent. A Poisson process is described by the following probability of spike firing:

$$P(n \mid s) = \frac{(s\Delta t)^n e^{-s\Delta t}}{n!} \qquad (1)$$

where n is the number of spikes within a time interval of length Δt seconds, s is the average spike rate (in spikes/sec), and P(n|s) is the conditional probability of observing n spikes in an interval of length Δt if the average spike rate is s. This model can be extended to take into account refractory periods, but we confine our discussion to this simple case.

The spike rate s can be time-varying, written as s(t), or it can depend upon other variables x, written as s(x) or s(t,x). Strictly speaking, the spike rate obeys Poisson statistics only if it is not time-varying. For example, if the spike rate is usually zero but becomes very large at exactly one-second intervals, then the spike train will have bursts of spikes every second, and this pattern will not satisfy the ‘independent increment’ property of Poisson spike trains.

We usually assume that the spike generators for different cells are independent, even if the average spike rates are correlated. This means that for two different cells, P(n1|s1) is independent of P(n2|s2), but P(n1) is not necessarily independent of P(n2) if s1(t) and s2(t) are not independent. Under the strictest assumption of Poisson firing, all the information is contained in the spike rate s, or equivalently in the probability of firing P(n). As s and P(n) cannot be measured directly, the goal of signal extraction or computation is to use the spike data itself to estimate the parameters of interest.
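The cell model of equation (1) and Figure 1 is straightforward to simulate. The sketch below (all parameter values are illustrative, not taken from the text) uses a hypothetical Gaussian tuning curve s(x) and draws spike counts n directly from the Poisson distribution P(n|s):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Gaussian tuning curve s(x): average rate in spikes/sec,
# peaking at the cell's preferred value x = 0.
def tuning_curve(x, preferred=0.0, width=0.5, peak_rate=40.0):
    return peak_rate * np.exp(-(x - preferred) ** 2 / (2 * width ** 2))

# Spike generator: P(n|s) = (s*dt)^n * exp(-s*dt) / n!
# numpy's Poisson sampler draws the count n from exactly this distribution.
def spike_count(x, dt=0.1):
    return rng.poisson(tuning_curve(x) * dt)

# The empirical mean count converges to s(x)*dt, as equation (1) predicts.
counts = [spike_count(0.0) for _ in range(1000)]
print(np.mean(counts))  # ≈ s(0)*dt = 4.0
```

Repeating the draw over a grid of x values and plotting spike times row by row would reproduce the raster format of Figure 1c.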
Figure 2
Model of a population of cells representing a single external variable x. Each cell has its own tuning curve that describes its average spike rate in response to different values of x. The spike activity for each cell in the population is generated from each cell’s average firing rate.
Population representations and tuning curves

As a population is composed of individual cells, all of the information in a population of Poisson firing cells is determined by the set of firing rates si(x), where i indexes each of the different cells in the population. Figure 2 shows how a population might encode a single variable x. In order to model a cell or population that responds to more than one variable, we can consider the variable x to be a vector (x1,x2) encoding more than one feature of movement or the environment [13]. If the variable x changes with time, then the population activity will represent the changing value of x [14,15–18]. In some models, x may be used to represent an entire function or probability distribution [2,19].

The function si(x) is defined as the ‘tuning curve’ of the i’th cell of the population. The tuning curve represents the response of the cell to each input or input pattern. Note that this is different from the term ‘receptive field’, which we take to mean a linear filter whose output determines the cell response to input functions. A receptive field is thus a particular (linear) model for how the rate function s(x) is generated from an input pattern x. The concept of a receptive field applies only when the input x is actually a pattern. Such a pattern can be described as a function x(w), where x(w) gives the intensity of the pattern at each position element w. For example, if x(w) is the pattern of light on the retina, then a linear cell with receptive field r(w) will have a tuning curve described by:

$$s(x(w)) = \int r(w)\,x(w)\,dw$$

In this case, the tuning curve s(x) tabulates the response of the average cell firing rate to each pattern x(w), whereas the receptive field r(w) is a model for the mechanism that generates the response s(x). This model is shown in Figure 3.
The structure of the tuning curves si(x) may determine the ability of the population to represent different values of x. Tuning curves that respond to many different values of x are considered to be ‘broadly tuned’, whereas tuning curves that respond to only a small range of values of x are considered to be ‘narrowly tuned’. Tuning to a range of values is a feature of many different neural representations [20]. One of the fascinating aspects of the mathematics of population codes is the fact that very precise estimates of x can be made from the relatively imprecise
Figure 3
Model of receptive field tuning. A receptive field is a linear model that describes the dependence of the tuning curve s(x) on x, when x is a function of position w (on the retina, for example).
information contained in a population of broadly tuned cells (e.g. see [21]). In fact, broad tuning is likely to have certain theoretical advantages, one of which is that at any given time a large number of cells will be firing. The population will therefore be relatively insensitive to the loss of cells or to noise in a small number of cells [22].

Cells may be simultaneously tuned to more than one variable [13,16,23–26]. For example, for two different variables this is represented by a tuning curve si(x1,x2). If x2 is held constant, then the tuning curve will appear to be a function of only x1, but this curve will change as x2 changes (see Figure 4). This is one model for interactions between multiple variables, and it has been verified by several authors [24,27,28]. A multi-dimensional tuning curve may be ‘separable’, which means that it can be decomposed into a product of functions of each of its variables: si(x1,x2) = ui(x1)vi(x2). Then, as x2 changes, the magnitude (but not the shape) of the tuning curve will appear to be modulated by vi(x2). In some cases, it may be possible to find a change of coordinates for x1 and x2 that makes a non-separable tuning curve separable. Application of a similar principle to coding of either arm position or mechanical load has suggested that tuning curves in motor cortex may be separable if position is described by joint angles [13,29]. Population representations may be more complicated if different cells have different tuning curves [30], are tuned to different variables [31], or adapt their responses to changes in the environment [19].

Broad or narrow tuning is essentially a property of ‘place codes’, in which the firing of individual cells in the population is used to indicate a range of values of the represented variable. Other types of tuning are certainly possible. For instance, cells could increase their firing rate in response to increases in x, so that s(x) increases as x increases. This would be called a ‘value code’ (Figure 5; [32]). In this case, the multiple cells in the population essentially give redundant information about the value of x, and this redundancy can be used when extracting information to make the best possible estimate of x. Place coding and value coding have been found in combination [33], and it is probably best to think of these as two particular examples of types of tuning functions. The general issue of estimating the value of x or of performing
Figure 4
Example of a tuning curve s(x1,x2) for a single cell that is a function of two variables. If x2 is held constant, the curve appears to be a broadly tuned function of x1. As x2 changes, the tuning curve for x1 seems to shift to the right.
Figure 5
Illustration of place coding (a and b) versus value coding (c and d). (a) and (c) show the tuning curves si(x) for four hypothetical cells. (b) and (d) show the relative responses of each of the four cells to increasing values of x. In a place code, the population activity (b) is localized to the cells that respond most to each value of x. In a value code, the population activity (d) becomes progressively greater as x increases.
computations is the same no matter what the specific structure of the receptive fields.
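The place-versus-value distinction of Figure 5 can be made concrete with a small sketch: four hypothetical place cells with Gaussian ‘bumps’ versus four hypothetical value cells whose rates rise monotonically with x (all tuning parameters are illustrative):

```python
import numpy as np

x = np.linspace(0, 4, 100)

# Place code: each cell responds to a localized range around a preferred value.
def place_cell(x, preferred, width=0.5, peak=1.0):
    return peak * np.exp(-(x - preferred) ** 2 / (2 * width ** 2))

# Value code: every cell's rate increases with x (different gains, so the
# cells carry redundant but not identical information about x).
def value_cell(x, gain):
    return gain * x

place_rates = np.array([place_cell(x, p) for p in [1, 2, 3, 4]])   # (4, 100)
value_rates = np.array([value_cell(x, g) for g in [0.2, 0.4, 0.6, 0.8]])

# In a place code, *which* cell fires most identifies x;
# in a value code, the *overall level* of activity grows with x.
print(int(np.argmax(place_rates[:, 80])))  # → 2, the cell preferring x = 3
```

Here `x[80]` is about 3.2, so the third place cell (preferred value 3) dominates, while total value-code activity at that point simply scales with x: two different readout rules for the same estimation problem.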
Interpreting neural activity

To interpret neural activity, we must estimate the value of the variable x on the basis of measurements of the spike activity of a population of cells (see Figure 6). For example, we might attempt to measure the direction of arm movement on the basis of measurements of firing in primary motor cortex [34,35]. There are several experimental paradigms for estimating the value of a variable x(t) given observations of neural firing rates ni(t) and knowledge of receptive fields si(x). Most of these methods assume that empirical estimates of average neuronal firing rates s(x) are available (usually obtained by counting spikes over a fixed interval or by measuring interspike intervals). Fewer methods apply to ‘instantaneous’ spike data that are obtained over very small time intervals, for which each cell fires at most once and the population spike data is therefore a binary vector [12,36,37].

Figure 6

Interpreting neural activity. The goal of interpretation is to find an algorithm (represented by the ‘?’) that calculates an estimate xest of the true value of x, on the basis of spike firing of cells in the population.

Perhaps the most widely discussed linear method for estimation from average firing rates is the population vector method pioneered by Georgopoulos and co-workers, which has been applied to the estimation of the direction of hand movement during reaching. In this case, x is a column vector of the three components of the direction of movement, and the authors choose an approximate model for si(x) in which the average firing rate is a linear function of x:

$$s_i(x) = r_i x = r_{ix} x_x + r_{iy} x_y + r_{iz} x_z$$

in which ri is the row vector of the ‘preferred direction’ for cell i (i.e. the value of x for which si(x) is maximized). The value of si(x) is estimated by counting the number of spikes ni(t) for each cell within short intervals. Under these assumptions the linear model can be written

$$n/\Delta t \approx s = Rx$$

where R is an N×3 matrix whose rows are given by ri for i from 1 to N, n is a column vector of observed spike counts for all the cells in the population, and s is a column vector of the actual measured spike rates. The best linear estimate of x given the spike counts (under the assumption of no additional noise in the system) is then [38]:

$$x_{lin} = R^{+} n/\Delta t$$

where R+ is the pseudo-inverse of R, given by R+ = (R′R)⁻¹R′, and R′ is the transpose of R [36]. Georgopoulos and co-workers [39–42] approximate this equation by using the ‘population vector’ method. For cells with cosine-shaped receptive fields and an even distribution of preferred directions, computing the population vector is equivalent to using the transpose instead of the pseudo-inverse:

$$x_{pv} \approx R' n/\Delta t$$

which will be an unbiased estimator of x so long as R′R is approximately the identity. If this is not the case (and in some experiments it clearly is not [13,43]), then using the best linear estimator xest = R+n/Δt will produce a more accurate result [1,44,45]. Self-organizing artificial neural network methods also exist that can be used to estimate the result [46]. Nevertheless, in many cases, the population vector approximation will produce good estimates of the direction of arm movement from activity of cells in the motor cortex [39,40], and this approximation can also be used in other neural systems [9,21,47,48].

Although linear estimates are computationally attractive, they can sometimes produce implausible results, as they do not take into account the fact that certain values of x may be impossible [1]. Thus, in a laboratory setting, it is reasonable to ask what the best possible estimator of x would be given the population spike data. The best unbiased estimator (linear or nonlinear) is given by the maximum a posteriori (MAP) estimate of x, and such an estimator will be able to extract more information than any linear estimator [49–51]. The MAP estimate is defined as the peak of the posterior distribution P(x|{ni}) of x given the firing pattern over all cells during the time interval Δt. The MAP estimate is usually found empirically by observing the complete pattern of population activity {ni} for many different values of x, and forming an estimate of the conditional probability P({ni}|x). Estimates are also needed for both the prior probability P({ni}) of each spike pattern and the prior probability P(x) of each value of x. Bayes’ rule is then used to calculate P(x|{ni}) = P(x)P({ni}|x)/P({ni}), and the MAP estimate of x is the value that maximizes the posterior distribution P(x|{ni}) for the observed pattern of population activity. When the prior distributions are not known, the maximum likelihood (ML) estimate is often used instead [51–53]. The ML estimate is the value of x that maximizes P({ni}|x).
The difficulty with either the MAP or the ML approach is that the number of possible patterns of activity {ni} is very large and the probability of observing any particular pattern is very small. Thus, estimation of P({ni}|x) requires a tremendous number of observations. Several authors have introduced additional assumptions that allow easier estimation of P({ni}|x). The most common assumption is that the firing of each cell, when conditioned on the input, is independent of the firing of other cells. Thus P(ni|x) and P(nj|x) are assumed to be independent (given x) when i ≠ j. This assumption is justified by the Poisson model detailed above, as it is equivalent to stating that the spike generators of the two cells are independent once the average firing rate is specified. In this case, considerable simplification is possible by noting that P({ni}|x) = ∏ᵢ P(ni|x). In the particular case given by equation (1) above, for each cell P(n|s) = (sΔt)ⁿe^(−sΔt)/n!, and for the population:

$$P(\{n_i\} \mid x) = \prod_i P(n_i \mid x) = \prod_i \frac{(s_i(x)\Delta t)^{n_i} e^{-s_i(x)\Delta t}}{n_i!} \qquad (2)$$

where si(x) is the firing probability for cell i as a function of x, and therefore gives the receptive field or response function for that cell. From the last term of this equation, we see that the conditional probability is formed from products of the receptive fields of the cells that fired, in which each receptive field is raised to the power of the number of times the cell fired in the measured time interval. The ML estimate of x is the value of x that maximizes this probability for the particular observed spike counts ni (see Figure 7).
As the goal is to find the maximum likelihood value xML that maximizes this function, we can remove terms that are independent of x and take the logarithm to obtain:

$$x_{ML} = \arg\max_x \left[ \sum_i \left( n_i \log(s_i(x)) - s_i(x)\,\Delta t \right) \right] \qquad (3)$$

If an estimate of the prior density P(x) is available, then we can compute the MAP estimate as:

$$x_{MAP} = \arg\max_x \left[ \log(P(x)) + \sum_i \left( n_i \log(s_i(x)) - s_i(x)\,\Delta t \right) \right] \qquad (4)$$

This Bayesian estimate is the best possible estimate of x on the basis of measurement of the spike counts ni within a short time interval. Improved estimates can be made for slowly-changing values of x by making multiple measurements over time and incorporating additional known constraints on the system, such as noise properties or smoothness [1,54]. Detailed comparisons between Bayesian and linear methods have been made on data from the primary visual cortex [9] and from the hippocampus [1].
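The ML decoder can be exercised numerically. The sketch below simulates a hypothetical population of 20 Gaussian-tuned Poisson cells (all rates, widths, and the baseline are illustrative), then recovers x by maximizing the Poisson log-likelihood of equation (3) over a grid of candidate values; a naive spike-count-weighted average of preferred values is shown for comparison:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population: 20 cells with Gaussian tuning curves tiling [0, 10],
# plus a baseline rate of 5 spikes/sec.
preferred = np.linspace(0, 10, 20)

def s(x):
    """Vector of average firing rates s_i(x), in spikes/sec."""
    return 5.0 + 40.0 * np.exp(-(x - preferred) ** 2 / 2.0)

dt = 0.2
x_true = 6.3
n = rng.poisson(s(x_true) * dt)  # one observed vector of spike counts

# ML estimate: maximize sum_i (n_i log s_i(x) - s_i(x) dt) over a grid.
grid = np.linspace(0, 10, 1001)
loglik = [np.sum(n * np.log(s(g) * dt)) - np.sum(s(g)) * dt for g in grid]
x_ml = grid[int(np.argmax(loglik))]

# Naive linear readout (count-weighted average of preferred values);
# the baseline firing biases it toward the middle of the range.
x_lin = np.sum(n * preferred) / np.sum(n)
print(x_ml, x_lin)
```

Even from a single 200 ms observation window the ML estimate lands close to the true value, while the weighted-average readout is pulled toward the center of the range by the baseline activity, illustrating why the optimal estimator can outperform simple linear readouts.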
Figure 7
Illustration of calculating the maximum likelihood (ML) estimate from a population of cells. The best estimate is the value xest that maximizes the probability of observing the particular pattern of spikes p(n1,n2|x). As described in the text, under certain circumstances this probability is related to the product of the tuning curves s1(x)s2(x) for the set of cells that fired.
The discussion so far has focused on the representation of single scalar or vector quantities by populations of neurons. It should be noted, however, that populations may also be able to represent functions or probability densities, and that the representation of probability densities can have important uses for neural computation or estimation [19]. In particular, estimation of a probability density allows a population to represent not only the value of an encoded variable but also the confidence in that value [19,36,55]. Estimation of the probability density also allows a population to represent the possibility of multiple alternative values or targets [14]. If the neural activity is described by a dynamic system, it is possible to estimate the probability density at each point in time [54,55].

Linear reconstruction models, including the population vector method, have been proposed as ‘biologically plausible’ methods for the brain to accomplish this type of estimation. There are also several plausible networks that can reconstruct the optimal ML estimate of an encoded variable. These networks operate by finding the peak of the posterior probability distribution [1,32,36,37], the peak of a derived function obtained by an iterative nonlinear smoothing operation [51,52,56], or the nearest-neighbor match to an expected pattern of population response [12]. A smaller number of such networks are able to extract the ML estimate over very short time windows [12,32], which eliminates the time-consuming and biologically implausible need to count spikes and estimate average spike rates.
Computation in population codes

Computation in the brain involves translations from one internal representation into another. For example, a sensory representation that indicates a target for movement must be translated into the motor representation of the
pattern of muscle activity that will achieve that target [17,57,58]. There are two basic goals of computation: first, the transformation of one function of the input into another, and second, the combination of information from multiple sources [59]. In both cases, we seek to understand how such computations can be performed when data is represented in population codes. For example, if the input population si(x) represents data from a muscle-stretch receptor or the retina, then transformation might be needed to convert this information to a new neural tuning function sj(x) that is appropriate for driving a motor neuron [60]. Additional information might be needed to calculate the correct drive to the motor neuron, so the desired function might also need a combination of proprioceptive inputs from multiple muscles to compute a function sj(x1,x2) that can compensate for interaction forces.

Population codes are well-suited to both transformation and combination of inputs [53]. Because the behavior of a population is determined by the behavior of its individual cells, any transformation or combination between population codes can be described by the tuning functions sj(x) of the target population. In fact, if an input population is the sole determinant of activity in an output (target) population, then the tuning curve of every cell in the target population can be described as a function of the spike pattern on the input population. The ability to perform particular operations is then determined by the set of functions sj(x) that can be computed from the spike data of the input population (see Figure 8). The average behavior of the target population cells will be determined by the average input firing probability, which is given by the tuning functions si(x) of the input population. This means that the tuning of cells in the target population can be described as functions of the input population tuning functions. For example, linear
combinations of a group of cells with differing receptive fields si(x) can form a neural network that calculates new tuning functions:

$$s_j(x) = \sum_i a_{ij}\, s_i(x)$$

[32]. Networks of this type include radial-basis function networks that use tuning curves in the shape of local ‘bumps’ to compute a large class of nonlinear functions with high precision, given a sufficient number of neurons [37,52,61–64]. Such networks can also be considered to perform both linear and nonlinear operations on the variables that are represented [32,65]. Similarly, functions of multiple variables can be computed as

$$s_j(x_1, x_2, \ldots) = \sum_i a_{ij}\, s_i(x_1, x_2, \ldots)$$

In this way, the tuning curves of the input population can be recombined using linear or non-linear operations to yield the desired tuning curves of the output population.

When there are more than a few different input variables, such networks suffer from the ‘curse of dimensionality’: the number of neurons must grow exponentially as the number of input dimensions increases in order to compute arbitrary functions with fixed accuracy. There are several solutions to the curse of dimensionality in such cases. One solution is to choose cells si(x1,x2,...) that respond only to values of x1,x2,... that occur commonly. Another is to use a class of receptive field functions that are separable in the input dimensions, si(x1,x2,...) = ui(x1)vi(x2)..., so that high-dimensional linear combinations can be approximated by polynomials over the set of one-dimensional input functions ui(x1),... [66,67].

Although distributed population coding can perform such computations in theory [58], in practice we do not know which computations are performed nor how they are accomplished. Populations of spiking neurons can certainly be arranged so that all the summation, polynomial, or other nonlinear operations needed are implemented [1,37,56,67,68]. But whether or not such arrangements exist is a matter of speculation.

Figure 8

Computation in population codes. The tuning curves of the target population are linear combinations of the spikes in the input population.
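The linear-recombination scheme above can be sketched with least squares: a hypothetical input population of radial-basis ‘bumps’ is recombined to approximate a new nonlinear target tuning curve (the population size, bump width, and target function are all arbitrary choices for illustration):

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 400)

# Input population: 30 'bump' (radial-basis) tuning curves tiling x.
centers = np.linspace(-np.pi, np.pi, 30)
S_in = np.exp(-(x[None, :] - centers[:, None]) ** 2 / (2 * 0.3 ** 2))  # (30, 400)

# Desired tuning curve of one target cell: a smooth nonlinear function of x.
target = np.sin(2 * x) ** 2

# Solve for weights a_i in s_j(x) = sum_i a_i s_i(x) by least squares.
a, *_ = np.linalg.lstsq(S_in.T, target, rcond=None)
approx = a @ S_in

# The bumps tile x densely enough that the fit is close everywhere.
print(np.max(np.abs(approx - target)))
```

Note that the fitted weights are unconstrained in sign; a biologically plausible network would additionally have to realize negative weights through inhibition, a constraint this sketch ignores.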
An important issue for computation using population codes that has rarely been explored is the choice of the cell-tuning functions or receptive fields. In many studies of angular variables (such as arm movement direction), the tuning functions are assumed to be cosines. However, cosine tuning means that the cell responses are linear in the direction of movement [45], so linear combinations of these tuning curves can form only linear functions. In theoretical studies involving Bayesian estimation, tuning functions are often taken to be Gaussian. But when Gaussian functions are used in a Bayesian framework they are usually multiplied together, and products (but not sums) of Gaussians remain Gaussian [19] and are thus unable to represent multimodal or other complex functions. The choice of tuning curve set for theoretical analyses or neural computation can therefore have a tremendous impact on the class of functions that can be approximated or variables that can be extracted. In particular, the use of cosine tuning for linear computation or Gaussian tuning for Bayesian estimation may severely limit the computational power of the network.
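The limitation of cosine tuning can be checked directly: if each rate is si(x) = ri·x, then any weighted sum of such cells collapses to a single dot product, i.e. just another cosine-tuned ‘cell’. A small numeric confirmation with a hypothetical random population (sizes and weights are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

# 10 cells with cosine (linear) tuning: s_i(x) = r_i . x,
# where r_i is the cell's preferred-direction vector.
R = rng.standard_normal((10, 2))

def population(x):
    return R @ x  # rates of all 10 cells for direction vector x

# Any linear combination of cosine-tuned cells is itself linear in x:
# sum_i a_i (r_i . x) = (sum_i a_i r_i) . x = c . x
a = rng.standard_normal(10)
c = a @ R  # effective preferred direction of the combined "cell"

x = rng.standard_normal(2)
print(np.allclose(a @ population(x), c @ x))  # → True
```

However many cells or layers of weights are added, the representable functions stay within this linear family, which is the computational limitation described above.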
Optimal representations

If we assume that a population code is optimal, then we can ask ‘optimal for what?’ [29,71]. Understanding optimality criteria within populations may be helpful for determining both their purpose and limitations to their function. For example, some authors have attempted to determine whether motor cortical cells are more closely related to intrinsic coordinates (such as joint angles) or extrinsic coordinates (such as Cartesian components of hand position) [23,24,72], and whether they are more related to motor planning or execution [73]. If the population code is a better representation of one set of coordinates than another, this could be helpful information in answering such questions. If we believe that we know the purpose of a population yet it does not seem to be optimized, then we can seek additional constraints that have prevented optimality. Certainly, different widths or shapes of tuning curves can have significant effects on the accuracy of representations [21], and in some formulations this effect depends crucially on the dimension of the data [69] or features of the noise [70]. It is often pertinent to determine whether or not a particular variable is coded within a population at all. However, the fact that a variable can be extracted from a population does not mean that the variable is relevant for biological computation. In particular, given the large number of cells that usually participate in population estimates, it is important to realize that the ability to extract a particular variable x from a population does not necessarily indicate the importance of that variable in determining the population response [16,25,36,45]. For
example, the fact that we can extract motion from the responses of a population of photoreceptors does not imply that the purpose of the photoreceptors is (only) to represent motion. Most research into this issue has examined whether or not there is a correlation between changes in cell firing and changes in a set of external variables x [26,72,74]. An alternative approach has been to determine the coordinate system that makes the cell-tuning curves separable in terms of the measured variables. For instance, tuning for movement direction can be made relatively independent of tuning for initial arm position if the arm position is specified in joint coordinates rather than in the Cartesian coordinates of the location of the hand [29]. (Note that as joint angles uniquely specify the position of the hand [but not vice versa], the assumption of Cartesian coordinate tuning for hand position would have reduced the maximum information extractable from the population.)
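To make concrete what it means to ‘extract’ a variable x from a population, here is a minimal sketch of a maximum-likelihood decoder for independent Poisson cells with Gaussian tuning curves; all tuning parameters and rates are invented for illustration, not values from any experiment:

```python
import numpy as np

# Hypothetical maximum-likelihood decoder for independent Poisson cells.
# Cell i fires with mean count s_i(x) * dt, so the log-likelihood of the
# observed counts {n_i} at candidate value x is
#   sum_i [ n_i * log(s_i(x) * dt) - s_i(x) * dt ]   (dropping log n_i! terms).
rng = np.random.default_rng(0)
centers = np.linspace(-1.0, 1.0, 30)      # preferred values of x
width, peak_rate, dt = 0.2, 50.0, 0.5     # tuning width, peak rate (Hz), window (s)

def rates(x):
    return peak_rate * np.exp(-((x - centers) ** 2) / (2.0 * width ** 2)) + 1e-9

x_true = 0.3
counts = rng.poisson(rates(x_true) * dt)  # one simulated population response

grid = np.linspace(-1.0, 1.0, 1001)
loglik = np.array([np.sum(counts * np.log(rates(g) * dt) - rates(g) * dt)
                   for g in grid])
x_hat = grid[np.argmax(loglik)]
print(x_hat)                              # close to x_true
```

The decoder recovers x accurately here, but — as argued above — this success by itself says nothing about whether x is the variable the population is ‘for’.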
Methods for determining coded variables can be generalized by choosing variables that maximize the mutual information I(z,{ni}) between the variable of interest z and the spike pattern {ni} in the population. Note that the variable of interest may be a function of the input data, in which case it is z(x). The Shannon mutual information is helpful when considering the spike pattern as a binary vector, and it is related to the Fisher information, which is helpful when evaluating the quality of estimates of continuous variables extracted from the population [12,50,75,76,77]. Careful choice of the representation of z may lead to a significant improvement in mutual information. For a given population with spike probabilities P({ni}|x) and for a given noise model, it may be possible to determine the variable z(x) that maximizes I(z,{ni}). Similarly, for a given variable z(x) and a given noise model, it may be possible to determine the receptive fields si(x) such that the population representation described by Poisson cells satisfying equation 1 maximizes I(z,{ni}) [32]. In general, optimal representations will increase the number of cells and decrease the width of the tuning curves si(x) in regions where z(x) has high variability, and they will decrease the number of cells and increase the width of the tuning curves in regions where z(x) is relatively constant [22,32]. (An example of the choice of optimal tuning curves for simulated random data is shown in Figure 9.) The converse of this is that variables are well represented for ranges of values that excite many cells in the population, whereas they are more poorly represented for ranges of values that excite only a few cells [21]. Best use of the population representation dictates that the overall entropy (which is related to the variability of the spike patterns) is high,
Figure 9
Optimal tuning curves. The short vertical lines represent simulated data from a random distribution. The optimal tuning curves are shown above. The optimal curves are located in regions of high data density, and they must be narrower as the data density increases. Horizontal axis is the value of the input variable x; vertical axis is the cell response as a fraction of the maximal firing rate.
so that many different possible patterns of activity are used. This generally means that the average spike rate is similar across all cells, and that different cells fire with very different tuning curves [32,49,76,77,78]. These properties are related to the competing requirements of ‘coverage’ and ‘continuity’ of neural representations [71,79]. The choice of tuning curve can also affect the tolerance of the system to different types of noise, and in some cases this effect may determine the optimal tuning curves [80].
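The density-matching principle illustrated in Figure 9 can be sketched with a simple heuristic (an illustrative construction, not an algorithm from the references): place tuning-curve centers at quantiles of the observed data, so each cell covers equal probability mass, and take each width from the local spacing of neighboring centers:

```python
import numpy as np

# Illustrative heuristic for density-matched tuning curves: centers at data
# quantiles (crowded where data are dense), widths from local center spacing
# (narrow curves in dense regions, broad curves in the sparse tails).
rng = np.random.default_rng(1)
data = rng.normal(0.0, 1.0, 5000)         # simulated inputs, densest near 0

n_cells = 15
q = (np.arange(n_cells) + 0.5) / n_cells  # equally spaced probability levels
centers = np.quantile(data, q)            # equal probability mass per cell
widths = np.gradient(centers)             # local inter-center distance

mid, edge = n_cells // 2, 0
print(widths[mid], widths[edge])          # central curves are narrower
```

For the simulated Gaussian data, the cells near the mode end up with much narrower tuning curves than those in the tails, matching the qualitative behavior shown in Figure 9.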
Conclusions

Although population representations appear to be ubiquitous in neural systems, different brain regions have evolved to perform different specific tasks, and each region of the brain may perform its computations in very different ways. Nevertheless, an understanding of the fundamental properties of populations of spiking neurons may allow us to interpret population activity and hence to control prosthetic devices [34], and it may allow us to gain insight into the limitations and abilities of ongoing computation in the brain. Future research is needed to understand population codes that represent more than one variable, and the results of such research will be helpful in understanding how the brain can efficiently represent many different variables simultaneously. In order to control realistic prosthetic devices, more research will be needed to discover reliable and accurate algorithms for extremely rapid extraction of data from population codes. As we learn more about the normal function of population codes, we may also be able to use this type of model to understand the behavior of the injured brain. If portions of population codes are missing or injured due to stroke, degenerative disease, or congenital malformations, we may be able to relate failure of the population to particular clinical symptoms, and perhaps this knowledge will allow us to discover new treatments for human neurological disease.
Acknowledgements

The author was supported during this research by the Stanford University Department of Neurology, and by grant K23-NS41243 from the National Institute of Neurological Disorders and Stroke.
References and recommended reading

Papers of particular interest, published within the annual period of review, have been highlighted as:

• of special interest
•• of outstanding interest
1. Zhang KC, Ginzburg I, McNaughton BL, Sejnowski TJ: Interpreting neuronal population activity by reconstruction: unified framework with application to hippocampal place cells. J Neurophysiol 1998, 79:1017-1044.

2. Eliasmith C, Anderson CH: Neural Engineering. Cambridge, MA: MIT Press; 2003.

3. Dayan P, Abbott LF: Theoretical Neuroscience. Cambridge, MA: MIT Press; 2001.

4. Rieke F, Warland D, de Ruyter van Steveninck R, Bialek W: Spikes: Exploring the Neural Code. Cambridge, MA: MIT Press; 1997.

5. Maynard EM, Hatsopoulos NG, Ojakangas CL, Acuna BD, Sanes JN, Normann RA, Donoghue JP: Neuronal interactions improve cortical population coding of movement direction. J Neurosci 1999, 19:8083-8093.

6. Fetz EE: Neuroscience: temporal coding in neural populations? Science 1997, 278:1901-1902.

7. deCharms RC: Information coding in the cortex by independent or coordinated populations. Proc Natl Acad Sci USA 1998, 95:15166-15168.

8. Buonomano DV, Merzenich MM: Temporal information transformed into a spatial code by a neural network with realistic properties. Science 1995, 267:1028-1030.

9. Oram MW, Foldiak P, Perrett DI, Sengpiel F: The ‘Ideal Homunculus’: decoding neural population signals. Trends Neurosci 1998, 21:259-265.

10. Wu S, Nakahara H, Amari S: Population coding with correlation and an unfaithful model. Neural Comput 2001, 13:775-797.
The authors show that the assumption of a lack of correlation between spike generators from different cells does not lead to a significant loss of decoding accuracy under reasonable assumptions on the correlations.

11. Chawla D, Lumer ED, Friston KJ: The relationship between synchronization among neuronal populations and their mean activity levels. Neural Comput 1999, 11:1389-1411.

12. Panzeri S, Treves A, Schultz S, Rolls ET: On decoding the responses of a population of neurons from short time windows. Neural Comput 1999, 11:1553-1577.

13. Cabel DW, Cisek P, Scott SH: Neural activity in primary motor cortex related to mechanical loads applied to the shoulder and elbow during a postural task. J Neurophysiol 2001, 86:2102-2108.
This study provides another example of tuning curves that respond to multiple variables, and the complexity of the resulting interactions. In this case, the response to the two variables is a linear superposition of the response to each variable independently.

14. Cisek P, Kalaska JF: Simultaneous encoding of multiple potential reach directions in dorsal premotor cortex. J Neurophysiol 2002, 87:1149-1154.
Probabilistic and Bayesian models predict that population codes can represent the simultaneous possibility of more than one value of a variable in the form of a probability distribution over x. This study provides a demonstration that cells in the dorsal premotor cortex can simultaneously represent the possibility of two different directions of movement, with subsequent selection of a single direction being made only immediately prior to movement onset. Similar findings occur in the oculomotor system.

15. Hoshi E, Shima K, Tanji J: Task-dependent selectivity of movement-related neuronal activity in the primate prefrontal cortex. J Neurophysiol 1998, 80:3392-3397.

16. Sergio LE, Kalaska JF: Changes in the temporal pattern of primary motor cortex activity in a directional isometric force versus limb movement task. J Neurophysiol 1998, 80:1577-1583.

17. Zhang J, Riehle A, Requin J, Kornblum S: Dynamics of single neuron activity in monkey primary motor cortex related to sensorimotor transformation. J Neurosci 1997, 17:2227-2246.

18. Gandolfo F, Li C, Benda BJ, Schioppa CP, Bizzi E: Cortical correlates of learning in monkeys adapting to a new dynamical environment. Proc Natl Acad Sci USA 2000, 97:2259-2263.

19. Zemel RS, Dayan P, Pouget A: Probabilistic interpretation of population codes. Neural Comput 1998, 10:403-430.

20. Turner RS, Anderson ME: Pallidal discharge related to the kinematics of reaching movements in two dimensions. J Neurophysiol 1997, 77:1051-1074.

21. Lewis JE, Kristan WB: Representation of touch location by a population of leech sensory neurons. J Neurophysiol 1998, 80:2584-2592.

22. Eurich CW, Wilke SD: Multidimensional encoding strategy of spiking neurons. Neural Comput 2000, 12:1519-1529.

23. Scott SH, Sergio LE, Kalaska JF: Reaching movements with similar hand paths but different arm orientations. 2. Activity of individual cells in dorsal premotor cortex and parietal area 5. J Neurophysiol 1997, 78:2413-2426.

24. Scott SH, Kalaska JF: Reaching movements with similar hand paths but different arm orientations. 1. Activity of individual cells in motor cortex. J Neurophysiol 1997, 77:826-852.

25. Scott SH: Population vectors and motor cortex: neural coding or epiphenomenon? Nat Neurosci 2000, 3:307-308.

26. Steinberg O, Donchin O, Gribova A, Cardosa de Oliveira S, Bergman H, Vaadia E: Neuronal populations in primary motor cortex encode bimanual arm movements. Eur J Neurosci 2002, 15:1371-1380.

27. Caminiti R, Johnson PB, Urbano A: Making arm movements within different parts of space: dynamic aspects in the primate motor cortex. J Neurosci 1990, 10:2039-2058.

28. Ashe J: Force and the motor cortex. Behav Brain Res 1997, 87:255-269.

29. Ajemian R, Bullock D, Grossberg S: Kinematic coordinates in which motor cortical cells encode movement direction. J Neurophysiol 2000, 84:2191-2203.

30. Amirikian B, Georgopoulos AP: Directional tuning profiles of motor cortical cells. Neurosci Res 2000, 36:73-79.

31. Kakei S, Hoffman DS, Strick PL: Muscle and movement representations in the primary motor cortex. Science 1999, 285:2136-2139.

32. Sanger TD: A probability interpretation of neural population coding for movement. In Self-organization, Computational Maps, and Motor Control. Edited by Morasso P, Sanguineti V. North Holland: Elsevier Science; 1997:75-116.

33. Andersen RA, Essick GK, Siegel RM: Encoding of spatial location by posterior parietal neurons. Science 1985, 230:456-458.

34. Schwartz AB, Taylor DM, Tillery SIH: Extraction algorithms for cortical control of arm prosthetics. Curr Opin Neurobiol 2001, 11:701-707.

35. Wessberg J, Stambaugh CR, Kralik JD, Beck PD, Laubach M, Chapin JK, Kim J, Biggs SJ, Srinivasan MA, Nicolelis MA: Real-time prediction of hand trajectory by ensembles of cortical neurons in primates. Nature 2000, 408:361-365.

36. Sanger TD: Probability density estimation for the interpretation of neural population codes. J Neurophysiol 1996, 76:2790-2793.

37. Sanger TD: Probability density methods for smooth function approximation and learning in populations of tuned spiking neurons. Neural Comput 1998, 10:1567-1586.

38. Salinas E, Abbott LF: Vector reconstruction from firing rate. J Comput Neurosci 1994, 1:89-108.

39. Moran DW, Schwartz AB: Motor cortical activity during drawing movements: population representation during spiral tracing. J Neurophysiol 1999, 82:2693-2704.

40. Moran DW, Schwartz AB: Motor cortical representation of speed and direction during reaching. J Neurophysiol 1999, 82:2676-2692.

41. Georgopoulos AP, Kettner RE, Schwartz AB: Primate motor cortex and free arm movements to visual targets in three-dimensional space II. Coding of the direction of arm movement by a neural population. J Neurosci 1988, 8:2928-2937.

42. Schwartz AB, Kettner RE, Georgopoulos AP: Primate motor cortex and free arm movements to visual targets in three-dimensional space I. Relation between single cell discharge and direction of movement. J Neurosci 1988, 8:2913-2927.

43. Scott SH, Gribble PL, Graham KM, Cabel DW: Dissociation between hand motion and population vectors from neural activity in motor cortex. Nature 2001, 413:161-165.
The population vector for cells in primary motor cortex is shown to be a poor predictor of hand movement direction due to lack of uniformity in the distribution of preferred tuning directions, as predicted by theoretical analyses [44,45].

44. Mussa-Ivaldi S: Do neurons in the motor cortex encode movement direction? An alternative hypothesis. Neurosci Lett 1988, 91:106-111.

45. Sanger TD: Theoretical considerations for the analysis of population coding in motor cortex. Neural Comput 1994, 6:12-21.

46. Lin SM, Si J, Schwartz AB: Self-organization of firing activities in monkey’s motor cortex: trajectory computation from spike signals. Neural Comput 1997, 9:607-621.

47. Bergenheim M, Ribot-Ciscar E, Roll JP: Proprioceptive population coding of two-dimensional limb movements in humans: I. Muscle spindle feedback during spatially oriented movements. Exp Brain Res 2000, 134:301-310.

48. Roll JP, Bergenheim M, Ribot-Ciscar E: Proprioceptive population coding of two-dimensional limb movements in humans: II. Muscle-spindle feedback during ‘drawing-like’ movements. Exp Brain Res 2000, 134:311-321.

49. Rolls ET, Treves A, Tovee MJ: The representational capacity of the distributed encoding of information provided by populations of neurons in primate temporal visual cortex. Exp Brain Res 1997, 114:149-162.

50. Brunel N, Nadal JP: Mutual information, Fisher information, and population coding. Neural Comput 1998, 10:1731-1757.

51. Deneve S, Latham PE, Pouget A: Reading population codes: a neural implementation of ideal observers. Nat Neurosci 1999, 2:740-745.

52. Pouget A, Zhang K, Deneve S, Latham PE: Statistically efficient estimation using population coding. Neural Comput 1998, 10:373-401.

53. Snippe HP: Parameter extraction from population codes: a critical assessment. Neural Comput 1996, 8:511-529.

54. Brown EN, Frank LM, Tang D, Quirk MC, Wilson MA: A statistical paradigm for neural spike train decoding applied to position prediction from ensemble firing patterns of rat hippocampal place cells. J Neurosci 1998, 18:7411-7425.

55. Twum-Danso N, Brockett R: Trajectory estimation from place cell data. Neural Networks 2001, 14:835-844.
This elegant mathematical analysis extends Bayesian estimation methods to time-varying systems using the theory of stochastic differential equations for diffusion processes. The solid mathematical basis provides the ability to relate estimation methods to linear techniques such as the Kalman filter, and it will provide the background for extending optimal population decoding to trajectories in time-varying systems.

56. Deneve S, Latham PE, Pouget A: Efficient computation and cue integration with noisy population codes. Nat Neurosci 2001, 4:826-831.

57. Shen LM, Alexander GE: Neural correlates of a spatial sensory-to-motor transformation in primary motor cortex. J Neurophysiol 1997, 77:1171-1194.

58. Baraduc P, Guigon E, Burnod Y: Recoding arm position to learn visuomotor transformations. Cereb Cortex 2001, 11:906-917.
A computational neural network model is presented that combines information on current position and target of a movement by multiplying the two values together. An adaptive network then learns to take the combined population information and create a new population that represents the appropriate motor command. This is a concrete example of combining multiple input populations to produce an output population with desired behavior.

59. Burnod Y, Baraduc P, Battaglia-Mayer A, Guigon E, Koechlin E, Ferraina S, Lacquaniti F, Caminiti R: Parieto-frontal coding of reaching: an integrated framework. Exp Brain Res 1999, 129:325-346.

60. Fukushima K, Yamanobe T, Shinmei Y, Fukushima J, Kurkin S, Peterson BW: Coding of smooth eye movements in three-dimensional space by frontal cortex. Nature 2002, 419:157-162.
The authors show that neurons in the frontal eye field respond to both lateral motion and motion in depth, thereby coding 3D movement. This is an example of coding of multiple co-ordinates by the same population, and the authors propose that it may be helpful for controlling eye movements that are related to three-dimensional movements of the hand in space.

61. Sanger TD: Optimal hidden units for two-layer nonlinear feedforward neural networks. Int J Pattern Recogn 1991, 5:545-561.

62. Powell MJD: Radial basis functions for multivariable interpolation: a review. In Algorithms for Approximation. Edited by Mason JC, Cox MG. Oxford: Clarendon Press; 1987:143-167.

63. Broomhead DS, Lowe D: Multivariable functional interpolation and adaptive networks. Complex Systems 1988, 2:321-355.

64. Klopfenstein RW, Sverdlove R: Approximation by uniformly spaced Gaussian functions. In Approximation Theory IV. Edited by Chui CK, Schumaker LL, Ward JD. New York: Academic Press; 1983:575-580.

65. Baraduc P, Guigon E: Population computation of vectorial transformations. Neural Comput 2002, 14:845-871.
This investigation of the representational capacity of populations concludes that cosine tuning is appropriate if preferred directions are nonuniformly distributed, but that other forms of tuning may be appropriate if preferred directions are uniformly distributed.

66. Sanger TD: A tree-structured algorithm for reducing computation in networks with separable basis functions. Neural Comput 1991, 3:67-78.

67. Sanger TD: A tree-structured adaptive network for function approximation in high dimensional spaces. IEEE Trans Neural Networks 1991, 2:285-293.

68. Sanger TD, Sutton R, Matheus C: Iterative construction of sparse polynomial approximations. In Advances in Neural Information Processing Systems. Edited by Moody JE, Hanson SJ, Lippmann RP. San Mateo: Morgan Kaufmann; 1992.

69. Pouget A, Deneve S, Ducom J-C: Narrow versus wide tuning curves: what’s best for a population code? Neural Comput 1999, 11:85-90.

70. Zhang K, Sejnowski T: Neuronal tuning: to sharpen or broaden? Neural Comput 1999, 11:75-84.

71. Carreira-Perpinan MA, Goodhill GJ: Are visual cortex maps optimized for coverage? Neural Comput 2002, 14:1545-1560.
It has been proposed that a population can be considered optimal for coding a particular representation if any small change in the population would worsen the quality of the coding. The authors argue that such methods do not provide adequate evidence for optimality of a population.

72. Reina GA, Moran DW, Schwartz AB: On the relationship between joint angular velocity and motor cortical discharge during reaching. J Neurophysiol 2001, 85:2576-2589.
This paper follows in the long tradition of using linear correlation based methods in an attempt to determine which of several co-ordinate systems is most closely related to a neural representation. The authors compare whether joint angles or hand positions in 3D space are better linear predictors of cell firing in motor cortex, and whether population vectors based on joint angles or hand positions are better predictors of the actual motion trajectories. They found that both coordinate systems were well correlated, but the joint angles that were more closely linked to hand position were better predictors of cell firing than those that were less linked to hand position. Although this might seem to imply that the motor cortical representation has been optimized for representing hand position, it is difficult to draw such conclusions from a purely linear analysis unless we assume that the brain itself is limited to linear analytic techniques. A Bayesian or information-theoretic analysis might well lead to very different conclusions.

73. Crammond DJ, Kalaska JF: Prior information in motor and premotor cortex: activity during the delay period and effect on pre-movement activity. J Neurophysiol 2000, 84:986-1005.

74. Gomez JE, Fu Q, Flament D, Ebner TJ: Representation of accuracy in the dorsal premotor cortex. Eur J Neurosci 2000, 12:3748-3760.

75. Wu S, Amari S, Nakahara H: Population coding and decoding in a neural field: a computational study. Neural Comput 2002, 14:999-1026.
This study investigates the effect on Fisher information of introducing local correlations into the spike generators of a population of cells. The information increases both for very low and very high correlations. The authors point out some pitfalls of using maximum-likelihood estimators in this type of population.

76. Abbott LF, Dayan P: The effect of correlated variability on the accuracy of a population code. Neural Comput 1999, 11:91-101.

77. Wilke SD, Eurich CW: Representational accuracy of stochastic neural populations. Neural Comput 2002, 14:155-189.
The authors investigate the effect of the dimensionality of the input data x on the Fisher information for populations that have Poisson-type noise, additive noise, or local correlations in spike generators. Additive noise has lower reconstruction error for high input dimensionality, but it leads to worse reconstruction for large populations when there are local correlations. They derive the important result that populations with highly varied tuning curves may have better representational properties than populations in which all tuning curves share a similar shape.

78. Poliakov AV, Schieber MH: Limited functional grouping of neurons in the motor cortex hand area during individuated finger movements: a cluster analysis. J Neurophysiol 1999, 82:3488-3505.

79. Hubel DH, Wiesel TN: Functional architecture of the macaque monkey visual cortex. Proc R Soc Lond Ser B 1977, 198:1-59.

80. Todorov E: Cosine tuning minimizes motor errors. Neural Comput 2002, 14:1233-1260.
This is one of a series of papers in which the authors examine the role of signal-dependent noise and predict that the properties of tuning curves may be determined by an attempt to minimize the effects of noise.