The Value of Persistent Value

Neuron

Previews

Frederic M. Stoll1,* and Peter H. Rudebeck1,*
1Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
*Correspondence: [email protected] (F.M.S.), [email protected] (P.H.R.)
https://doi.org/10.1016/j.neuron.2019.08.018

In this issue of Neuron, Bari et al. (2019) show that neurons in medial frontal cortex, but not in a nearby premotor area, encode the relative value of available options with long-lasting persistent activity states during naturalistic foraging. These long-lasting activity states serve to preferentially guide choices toward the options most likely to be rewarded.

From arctic whales remembering where they will likely find a hole in the ice to breathe to drowsy students figuring out the best place to get free coffee, foraging takes many guises. The question of how humans and other animals adaptively solve this type of problem to meet their vital needs or superfluous wants has occupied researchers in many fields, from economics to neuroscience. Across disciplines, there is agreement that in order to maximize their chances of success, all animals must remember and track the relative difference in value of the available alternatives. Using this information, they can then select the best course of action.

Prior work on naturalistic foraging in humans and other animals has identified medial frontal cortex (MFC) as a key brain structure. Lesions of MFC diminish the ability of animals to adaptively select the most advantageous option during foraging (Kennerley et al., 2006). In humans and monkeys, parts of MFC track the average value of available options during foraging (Amiez et al., 2006; Kolling et al., 2012). However, the specific patterns of neural activity that MFC engages to maintain the current relative value of options are unclear. A recent report by Bari and colleagues has started to shed light on this issue (Bari et al., 2019).

Bari and colleagues trained mice in a classic dynamic matching task that has been extensively studied across multiple species (Gallistel, 1990). On each trial, mice chose which of two spouts to lick based on the probability with which each might deliver a fluid reward.
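To make the notions of per-spout value, relative value, and total value concrete, the task and the kind of reinforcement learning model used to estimate spout values can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the model fitted by Bari et al. (2019): the reward probabilities in the hypothetical `PROB_PAIRS`, the learning rate `alpha`, and the softmax inverse temperature `beta` are illustrative choices, block lengths are drawn uniformly from 40–100 trials, and relative and total value are assumed here to be the difference and the sum of the two spout values, read out before each choice.

```python
import math
import random

# Hypothetical (left, right) reward probabilities; not the values used by Bari et al.
PROB_PAIRS = [(0.4, 0.05), (0.05, 0.4), (0.3, 0.1), (0.1, 0.3)]


def simulate_session(n_trials=500, alpha=0.3, beta=5.0, seed=0):
    """Simulate one two-spout foraging session with a simple Q-learning agent.

    Returns per-trial relative value (Q_right - Q_left) and total value
    (Q_right + Q_left), both read out before the choice, plus the choices
    (0 = left, 1 = right). alpha and beta are hypothetical parameters.
    """
    rng = random.Random(seed)
    q = [0.0, 0.0]  # value estimates for [left, right]
    probs = None
    trials_left_in_block = 0
    relative, total, choices = [], [], []

    for _ in range(n_trials):
        # Start a new block: switch to a different probability pair
        # and draw a block length of 40-100 trials (inclusive).
        if trials_left_in_block == 0:
            probs = rng.choice([p for p in PROB_PAIRS if p != probs])
            trials_left_in_block = rng.randint(40, 100)
        trials_left_in_block -= 1

        # Decision variables available just before the choice.
        relative.append(q[1] - q[0])
        total.append(q[1] + q[0])

        # Softmax (logistic) choice rule driven by relative value only.
        p_right = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        choice = 1 if rng.random() < p_right else 0
        reward = 1.0 if rng.random() < probs[choice] else 0.0

        # Update only the chosen spout by its reward prediction error.
        q[choice] += alpha * (reward - q[choice])
        choices.append(choice)

    return relative, total, choices


relative, total, choices = simulate_session()
```

Because only the chosen spout's value is updated, the estimate for an unchosen spout stays fixed until that spout is sampled again; each value remains between 0 and 1, so relative value lies in [-1, 1] and total value in [0, 2].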
Over the course of testing sessions lasting hundreds of trials, the reward probability on each spout changed every 40–100 trials. To choose adaptively, subjects therefore had to continually update their estimate of the probability of receiving a reward on each spout. Analyses of the subjects' behavior showed that choices and response times were indeed strongly related to reward history. By fitting a reinforcement learning model to the animals' behavior, the authors then estimated the value of each spout (the probability of receiving a reward) on every trial. This allowed them to compute the relative and total values of the two options. Relative value is known to bias choices, whereas total value correlates with the speed at which choices are made, a proxy of the animal's motivational state (Wang et al., 2013).

When MFC was inactivated using a potent GABA agonist, subjects had difficulty tracking which option would likely deliver a reward, and their response times increased. In control tasks that did not require mice to remember option values across trials, MFC inactivation had no effect, confirming that MFC is specifically needed to remember and track value over time.

But how are aspects of value encoded in MFC? To answer this, Bari et al. (2019) recorded the activity of more than 3,000 neurons in the MFC of their mice. To look for correlates of relative and total value, they analyzed neural responses in the period just before a new trial began, reasoning that by then the animals would have updated their valuations of the two options based on the outcome of the previous trial. Interestingly, they found two partially overlapping populations of neurons exhibiting long-lasting firing rate modulations specifically related to the trial-by-trial estimates of either relative or total value (Figure 1). The activity of neurons preferentially encoding relative value predicted choice probability, but not response times. The opposite was true for neurons preferentially encoding total value. Importantly, the firing rate modulations of both types of neurons were independent of motor responses.

Finding correlates of value in MFC is not so surprising given prior work. What is surprising, however, is how these two variables were maintained over time. To investigate this further, Bari et al. (2019) leveraged the fact that trials were separated by a random-length inter-trial interval (ITI). Analyzing how the ITI affected encoding revealed that neural activity related to relative value and choice probabilities was highly stable over time, at least on the timescales measured; ITI length made little difference to encoding or to subsequent choices. By contrast, activity related to total value and response times was influenced by ITI length. Longer ITIs were associated with decaying total value representations and a diminished relationship to subsequent response times. Thus, these two populations maintain task-relevant information over markedly different timescales, and this has a direct relationship to behavior. Ultimately, this difference in timescales could be related to differences in the nature of the information encoded: choice probabilities need to be remembered until the next choice, whereas motivation should decline as long ITIs lower the reward rate.

To test the specificity of their findings, the authors also recorded in anterior lateral motor cortex (ALM), a premotor region adjacent to MFC. Unlike in MFC, representations of relative and total value in ALM were neither as robust nor as sustained over time. This suggests that MFC activity might not modulate ALM responses directly to influence action selection. If not directly, it is possible that such
information might be relayed by MFC projections to the dorsal striatum, another area involved in action selection. Indeed, pathway-specific inactivation of MFC projections to dorsal striatum produced changes in choice behavior and reaction times similar to inactivating the whole MFC. Using phototagging and collision test techniques, Bari et al. (2019) then showed that neurons projecting from MFC to dorsal striatum were highly likely to encode either relative or total value with persistent activity. Thus, this might be the critical pathway through which relative and total value signals in MFC come to bias action selection and invigorate responding, respectively.

Beyond projections to striatum and ALM, where else in the brain are relative value signals persistently encoded? Using two-photon imaging, Hattori and colleagues recently monitored the activity of thousands of neurons across motor, sensory, and association cortex in mice engaged in a matching task similar to that used here (Hattori et al., 2019). While neural activity related to relative value decayed quickly in motor and sensory areas, retrosplenial cortex, a posterior medial cortical area important for spatial and contextual memories, exhibited activity highly similar to that seen in MFC. Specifically, neurons in retrosplenial cortex persistently represented the relative and total values of the different options, as well as chosen value. One possibility open to empirical investigation is that connections from MFC drive this pattern of activity in retrosplenial cortex. If true, this would complement work in other species that places MFC at the top of a hierarchy when it comes to deciding how to choose (Kolling et al., 2012; Stoll et al., 2016).

The finding of robust and long-lasting encoding of relative and total values in MFC is especially exciting, as the idea that different areas signal and maintain information through persistent activity with different timescales has been of intense interest recently (Murray et al., 2014). Due to their intrinsic physiological properties, some neurons have short timescales, in that the temporal correlation of their activity decays quickly. This is the case in sensory cortex, and it is a feature critical for precisely representing dynamic sensory stimuli for accurate perception. By contrast, neurons with longer timescales could be critical for integrating information over longer periods of time. In primate cortex, neurons with the longest timescales have been localized to frontal regions, especially MFC. The present data suggest that this property might be conserved across species, as the persistent activity in MFC reported here is indicative of long timescales.

The difference in timescales between neurons encoding relative and total values also highlights that, within a single brain area, neurons have a diverse array of timescales, and this may dictate their functional properties. Similar to the differences between relative and total value signals, a within-area functional dissociation has also been reported in macaque dorsolateral prefrontal cortex neurons during working memory (Wasmuht et al., 2018). This suggests that distinct functional timescales are a common mechanism that supports multiple cognitive domains. Determining how populations of neurons generate these persistent but flexible activity patterns in order to maintain information over time will likely be invaluable for understanding the mechanism of cognition, not just foraging.

Figure 1. In Foraging Mice, MFC Neurons Encode Relative and Total Values of Available Options with Different Timescales

REFERENCES

Amiez, C., Joseph, J.P., and Procyk, E. (2006). Reward encoding in the monkey anterior cingulate cortex. Cereb. Cortex 16, 1040–1055.

Bari, B.A., Grossman, C.D., Lubin, E.E., Rajagopalan, A.E., Cressy, J.I., and Cohen, J.Y. (2019). Stable representations of decision variables for flexible behavior. Neuron 103, this issue, 922–933.

Gallistel, C.R. (1990). The Organization of Learning (The MIT Press).

Hattori, R., Danskin, B., Babic, Z., Mlynaryk, N., and Komiyama, T. (2019). Area-specificity and plasticity of history-dependent value coding during learning. Cell 177, 1858–1872.e15.

Kennerley, S.W., Walton, M.E., Behrens, T.E., Buckley, M.J., and Rushworth, M.F. (2006). Optimal decision making and the anterior cingulate cortex. Nat. Neurosci. 9, 940–947.

Kolling, N., Behrens, T.E.J., Mars, R.B., and Rushworth, M.F.S. (2012). Neural mechanisms of foraging. Science 336, 95–98.

Murray, J.D., Bernacchia, A., Freedman, D.J., Romo, R., Wallis, J.D., Cai, X., Padoa-Schioppa, C., Pasternak, T., Seo, H., Lee, D., and Wang, X.J. (2014). A hierarchy of intrinsic timescales across primate cortex. Nat. Neurosci. 17, 1661–1663.

Stoll, F.M., Fontanier, V., and Procyk, E. (2016). Specific frontal neural dynamics contribute to decisions to check. Nat. Commun. 7, 11990.

Wang, A.Y., Miura, K., and Uchida, N. (2013). The dorsomedial striatum encodes net expected return, critical for energizing performance vigor. Nat. Neurosci. 16, 639–647.

Wasmuht, D.F., Spaak, E., Buschman, T.J., Miller, E.K., and Stokes, M.G. (2018). Intrinsic neuronal dynamics predict distinct functional roles during working memory. Nat. Commun. 9, 3499.

Neuron 103, September 4, 2019 © 2019 Elsevier Inc.
