Neuron
Previews
still remains unclear. In heterologous expression systems, TMCs generally fail to translocate to the cell surface, leaving open whether TMCs can conduct ionic currents across the cell membrane (Kawashima et al., 2015). Only C. elegans TMC-1 has been reported to give rise to ionic currents in heterologous expression systems (Chatzigeorgiou et al., 2013). Whether TMC-1 was indeed integrated into the membrane and caused these currents, however, remained questionable, and the experiments have not yet been reproduced by other labs. Wang et al. (2016) now find that heterologously expressed TMC-1 also remains stuck in intracellular compartments, consistent with the behavior reported for other TMC proteins
(Kawashima et al., 2015; Guo et al., 2016). Reconstituting TMCs in liposomes might show whether these proteins are ion channels, and given that the tmc genes of most animals still await their characterization, it seems tempting to speculate that some of them, analogous to C. elegans tmc-1, encode alkali sensor proteins.

REFERENCES
Chatzigeorgiou, M., Bang, S., Hwang, S.W., and Schafer, W.R. (2013). Nature 494, 95–99.
Dhaka, A., Uzzell, V., Dubin, A.E., Mathur, J., Petrus, M., Bandell, M., and Patapoutian, A. (2009). J. Neurosci. 29, 153–158.
Guo, Y., Wang, Y., Zhang, W., Meltzer, S., Zanini, D., Yu, Y., Li, J., Cheng, T., Guo, Z., Wang, Q., et al. (2016). Proc. Natl. Acad. Sci. USA, 201606537, http://dx.doi.org/10.1073/pnas.1606537113.
Kawashima, Y., Kurima, K., Pan, B., Griffith, A.J., and Holt, J.R. (2015). Pflugers Arch. 467, 85–94.
Levin, L.R., and Buck, J. (2015). Annu. Rev. Physiol. 77, 347–362.
Murayama, T., and Maruyama, I.N. (2015). J. Neurosci. Res. 93, 1623–1630.
Ortiz, C.O., Etchberger, J.F., Posy, S.L., Frøkjaer-Jensen, C., Lockery, S., Honig, B., and Hobert, O. (2006). Genetics 173, 131–149.
Roayaie, K., Crump, J.G., Sagasti, A., and Bargmann, C.I. (1998). Neuron 20, 55–67.
Sassa, T., Murayama, T., and Maruyama, I.N. (2013). Neurosci. Lett. 555, 248–252.
Wang, X., Li, G., Liu, J., Liu, J., and Xu, X.Z.S. (2016). Neuron 91, this issue, 146–154.
Time, Not Size, Matters for Striatal Reward Predictions to Dopamine
Christopher J. Burke1 and Philippe N. Tobler1,*
1Laboratory for Social and Neural Systems Research, Department of Economics, University of Zurich, Blümlisalpstrasse 10, 8006 Zurich, Switzerland
*Correspondence: [email protected]
http://dx.doi.org/10.1016/j.neuron.2016.06.029
Neuron 91, July 6, 2016 © 2016 Elsevier Inc.

Midbrain dopamine neurons encode reward prediction errors. In this issue of Neuron, Takahashi et al. (2016) show that the ventral striatum provides dopamine neurons with prediction information specific to the timing, but not the quantity, of reward, suggesting a surprisingly nuanced neural implementation of reward prediction errors.

Learning about the potential reward associated with different cues, actions, or environments is critical for adaptive behavior and fitness enhancement. Reinforcement learning theories have postulated that reward learning proceeds by comparing predictions of future reward with actual reward received and signaling the difference between the two. This difference corresponds to a reward “prediction error,” which, according to suggestions from both machine learning and psychological learning theories, can be used to update previous predictions and guide actions (Figure 1A). In this view, learning occurs whenever the size of experienced reward differs from the predicted size. A particularly influential model, temporal difference learning (Sutton, 1988), introduces a timing component to these predictions, so that prediction errors are generated not only when rewards occur with unexpected size but also at unexpected times. The elegance of combining predictions about the timing and size of reward into a single unitary feature (a scalar signal) has made the temporal difference reinforcement learning model a staple of learning and decision making research over the last decades.

The well-replicated discovery that midbrain dopamine neurons in the substantia nigra (SN) and ventral tegmental area (VTA) of primates encode signals that closely resemble reward prediction errors (Schultz et al., 1997) constitutes a biological substrate for the temporal difference learning model. Dopamine neurons show burst firing for larger or earlier than predicted reward and pause for smaller or later than predicted reward, providing a unitary error signal that could serve to update predictions about the size and time of future reward. In line with the notion that dopamine neurons facilitate adaptive approach behavior, the temporal specificity of dopamine prediction errors coincides with the timing of licking behavior
after monkeys have learned the time points at which juice rewards are delivered (Fiorillo et al., 2008).

There are several possibilities for how dopamine neurons may come to process prediction errors. For example, dopamine neurons could receive unitary prediction error signals computed elsewhere. In line with this possibility, regions directly or indirectly projecting to dopamine regions process reward prediction errors, including the striatum (Oyama et al., 2015) and the lateral habenula (Matsumoto and Hikosaka, 2007). Alternatively, dopamine neurons may themselves compute prediction errors based on predictions, received from elsewhere, about the timing and size of upcoming rewards. The ventral striatum (VS) is a major source of input to the VTA, and ventral striatal neurons have traditionally been proposed to provide reward predictions to dopamine neurons (Joel et al., 2002). In this view, dopamine-striatal networks are thought to be organized along the lines of an actor-critic model (Figure 1A; Barto, 1995). In the “critic” part of this layout, dopaminergic neurons in the VTA receive inhibitory input from the VS providing predictive information regarding the timing and size of reward, and excitatory input (e.g., from the lateral hypothalamus) when reward is received. If rewards received match expectations, the two inputs cancel each other out and dopaminergic neurons do not fire. If there is a mismatch, the difference is transmitted as a reward prediction error, which is used to update behavioral policies and thereby guide action selection in the “actor” part, in addition to modifying future predictions to drive learning in the “critic.”

Figure 1. Neural Implementation of Reward-Predictive Architecture and Schematic Findings
(A) A simplified schematic of the temporal difference model implementation in the basal ganglia, adapted from Barto (1995). In the “critic,” the VS (or, alternatively, the striosome; Houk et al., 1995) provides unitary predictions about the timing and quantity of expected reward to VTA. If rewards received do not match these predictions, a reward prediction error is emitted, modifying “actor units” in the dorsal striatum (or the matrix). It is also used to update the predictions used by the critic. When Takahashi et al. (2016) lesioned the VS, recordings from VTA indicated that only reward timing information was affected, suggesting reward quantity predictions are received by dopamine neurons (also) from elsewhere, possibly regions outside the basal ganglia.
(B) In control rats, dopamine neurons showed increased firing to rewards that were bigger or occurred earlier than expected, and decreased firing when rewards were smaller or later than expected. For VS-lesioned rats, dopamine neurons in the VTA only showed increased firing for larger, but not for earlier than expected, reward, suggesting VS critically provides reward timing information to the VTA.

In this issue of Neuron, Takahashi and colleagues (2016) explore the origin of prediction information in putative dopamine neurons by lesioning the VS and recording from VTA neurons while behaving rats performed a task where reward quantity (number of drops) and delay of reward changed unexpectedly. Specifically, rats learned to associate an odor with a reward on the left or right side of a behavioral chamber. The quantity and timing of reward associated with each side unexpectedly reversed at the start of each block (Figure 1B, left), resulting in positive prediction errors when previously delayed reward became immediate (or previous reward quantity increased) and in negative prediction errors when previously immediate rewards were delayed (or previous reward quantity decreased).

After training, one group of rats received ipsilateral neurotoxic lesions to the VS while another group received sham lesions in the same area. After recovery, the rats again performed the task, and prediction error signals were recorded from dopamine neurons in the VTA. As expected, dopamine neurons in the sham-lesioned rats continued to signal reward prediction errors when the timing or quantity of previously expected reward was changed. By contrast, in VS-lesioned rats, dopamine neurons continued to encode only quantity prediction errors, but no longer encoded timing prediction errors, irrespective of whether reward occurred earlier or later than predicted (Figure 1B, right). Thus, dopamine neurons fired as if the rats expected a reward to occur at any time during the current trial. These results support the notion that dopamine neurons are not simply passing on full-fledged reward prediction error signals received from elsewhere but combine predicted time and size information to compute their own prediction error signal. Moreover, the findings suggest that the VS provides dopamine neurons with information about the predicted time of reward during learning, running counter to the view that predictive input from the striatum to the VTA forms a unitary basis for the computation of prediction errors (Houk et al., 1995).

To account for these neurophysiological findings, Takahashi et al. (2016)
develop a more nuanced reinforcement learning model that separates the representation of reward size from that of reward time (see also Contreras-Vidal and Schultz, 1999 for a model separating reward time and size information). The model treats the task as a sequence of states, with state transitions occurring due to external events (for example, the start of a trial) or through the passage of time represented internally (see also Daw et al., 2006). A fully functioning “intact” model showed prediction error signals consistent with dopamine signals recorded in sham-lesioned rats. When the model's ability to learn the durations of states was degraded by limiting the resolution of temporal representations, it no longer produced temporal prediction errors to unusually early or late reward, just like the dopamine neurons in the VS-lesioned rats.

In other words, the model suggests that the VS allows the gating of prediction information to dopamine neurons. When a state change is expected (for example, when a reward is expected to be delivered), the VS allows temporally precise predictions to be passed on to dopamine neurons, which enables them to respond with a reward prediction error. By contrast, when the VS is lesioned, this gating no longer occurs, corresponding to constant elevated reward prediction. Note, though, that the precise nature of this gating in control rats remains speculative. One possible interpretation is that when a state transition is inferred due to the passage of time, the VS signals that the gate for predictions from elsewhere, e.g., cortex, is open. A second possibility is that the VS itself signals value expectations when the gate is open. The latter case would suggest that the temporal specificity of the predictions is lost and cannot be used by the dopamine neurons to compute prediction errors.

These results are striking for two reasons.
First, although the VS has been proposed as a major source of the prediction error signals typically recorded in midbrain dopamine neurons (Houk et al., 1995; Joel et al., 2002), the exact role it plays has not been fully understood. This study highlights the specific role of the VS in facilitating the predictions that dopamine neurons must have access to in order to compute prediction errors. Work on the circuit structure of the basal ganglia and its role in reward prediction has suggested that striatal inputs to dopamine neurons are partly mediated by the globus pallidus and lateral habenula, with stimulus values (i.e., predictions) computed and transmitted to dopamine neurons from several stages upstream. The lateral habenula has received particular attention in the transmission of reward signals to dopamine neurons (Matsumoto and Hikosaka, 2007), as it exerts an inhibitory input to the dopaminergic cells in SN/VTA via the rostromedial tegmental nucleus. However, in these experiments, reward magnitude was not manipulated separately from reward timing, and the exact nature of information transmitted to midbrain dopamine neurons via the internal globus pallidus (GPi) and lateral habenula (LHb) pathway remains to be elucidated.

Second, the data indicate that the VS does not supply unitary information regarding the timing and quantity of upcoming reward, as widely believed, but only one source of information that relates to temporal specificity. This is notable because the data are inconsistent with the elegant and parsimonious way in which the temporal difference learning model combines reward size and timing predictions into a single feature. Specifically, temporal predictions about reward were not removed in VS-lesioned rats (which would cause positive prediction errors to be encoded whenever reward was delivered), but rather dopamine neurons fired as if the rats expected a reward at all times during a trial. Thus, when reward timing was changed, the dopamine neurons responded neither with the typical brief pause to reward omission nor with a burst to its appearance a short time later (Figure 1B).
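The contrast between a pause-then-burst response and no response at all can be made concrete with a toy calculation. The following Python sketch is an illustrative assumption and not the model published by Takahashi et al. (2016): it computes temporal difference errors, delta[t] = reward[t] + gamma * V[t+1] - V[t], for a single trial in which reward arrives later than learned. The function names (`td_errors`, `value_profile`), the six-step trial, and the two hand-set value profiles are hypothetical choices made only for illustration. In the "intact" profile, predicted value collapses once the learned reward time has passed; in the "degraded" profile, value stays elevated until reward is actually delivered, mimicking an expectation of reward at all times.

```python
from typing import List


def td_errors(values: List[float], rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Temporal difference error per step: r[t] + gamma * V[t+1] - V[t]."""
    if len(values) != len(rewards) + 1:
        raise ValueError("values needs exactly one more entry than rewards")
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]


def value_profile(n_steps: int, last_predictive_step: int, magnitude: float) -> List[float]:
    """Hand-set state values (an assumption, not learned): `magnitude` up to
    and including `last_predictive_step`, zero afterwards."""
    return [magnitude if t <= last_predictive_step else 0.0 for t in range(n_steps + 1)]


N_STEPS = 6         # hypothetical trial length in time steps
EXPECTED_STEP = 2   # step at which reward was learned to arrive
ACTUAL_STEP = 4     # step at which reward now arrives (delayed)
MAGNITUDE = 1.0     # learned reward size

rewards = [0.0] * N_STEPS
rewards[ACTUAL_STEP] = MAGNITUDE

# Intact timing: value drops as soon as the learned reward time has passed.
intact = td_errors(value_profile(N_STEPS, EXPECTED_STEP, MAGNITUDE), rewards)

# Degraded timing: value stays high until reward is actually delivered.
degraded = td_errors(value_profile(N_STEPS, ACTUAL_STEP, MAGNITUDE), rewards)

# Degraded timing, but the delayed reward is twice as large as predicted.
bigger_rewards = [0.0] * N_STEPS
bigger_rewards[ACTUAL_STEP] = 2.0 * MAGNITUDE
degraded_bigger = td_errors(value_profile(N_STEPS, ACTUAL_STEP, MAGNITUDE), bigger_rewards)

print(intact)           # [0.0, 0.0, -1.0, 0.0, 1.0, 0.0]
print(degraded)         # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(degraded_bigger)  # [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
```

Under these assumed profiles, the intact case yields a negative error at the learned reward time and a positive error at the delayed delivery, whereas the degraded case yields no error for the timing shift yet still signals a larger than predicted reward, in line with the preserved quantity errors described in the text.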
Interestingly, and in line with a general role of the VS in timing, deficits in the specificity of reward timing predictions in VS-lesioned rats have been previously demonstrated behaviorally in Pavlovian conditioning, with rats behaving similarly throughout long-duration stimuli that predict reward (Singh et al., 2011). In contrast, control rats showed increased responding as the reward period approached, consistent with the idea that predictions about reward timing remained intact.
There were no major behavioral differences between the unilateral VS-lesioned and control rats during the task. One explanation for this finding is that because only ipsilateral VS lesions were made in the animals, the contralateral VS provided the timing information necessary for normal behavior (evidence for this is provided by the fact that rats with bilateral lesions show significant behavioral deficits for delayed reward blocks; Burton et al., 2014). However, another possibility is that the temporal precision of dopamine neurons is not very high (Fiorillo et al., 2008) and the delays used in the task were insufficient to produce behavioral effects.

Another related point is that VS neurons have consistently been found to encode information relating to the size and number of expected rewards (Roesch et al., 2009), yet the lesions in the present study only affected timing information. The selectivity of the lesions in affecting temporal predictions suggests that dopamine neurons have access to an alternative source of information regarding reward size, with one possibility being prefrontal regions (Jo and Mizumori, 2015). In any case, to further substantiate the finding that the input to dopamine neurons is not a unitary temporal/size prediction, a double dissociation of timing and size reward prediction errors will help. Finally, temporally precise interference, specifically with striato-dopamine projections and presumed striatal state representations, together with concurrent recording from dopamine neurons, will further unravel the exact nature of temporal representations passed on from the VS to dopamine neurons. Overall, by showing that reward time matters more than reward size for striatum-dopamine projections, the paper of Takahashi and colleagues has opened a new perspective for the field and paved the way for exciting further studies.

ACKNOWLEDGMENTS
This work was supported in part by Swiss NSF grants PP00P1_150739 and 00014_165884.
REFERENCES
Barto, A.G. (1995). In Models of Information Processing in the Basal Ganglia, J.C. Houk, J.L. Davis, and D.G. Beiser, eds. (MIT Press), pp. 215–232.
Burton, A.C., Bissonette, G.B., Lichtenberg, N.T., Kashtelyan, V., and Roesch, M.R. (2014). Biol. Psychiatry 75, 132–139.
Contreras-Vidal, J.L., and Schultz, W. (1999). J. Comput. Neurosci. 6, 191–214.
Daw, N.D., Courville, A.C., and Touretzky, D.S. (2006). Neural Comput. 18, 1637–1677.
Fiorillo, C.D., Newsome, W.T., and Schultz, W. (2008). Nat. Neurosci. 11, 966–973.
Houk, J.C., Adams, J.L., and Barto, A.G. (1995). In Models of Information Processing in the Basal Ganglia, J.C. Houk, J.L. Davis, and D.G. Beiser, eds. (MIT Press), pp. 249–270.
Jo, Y.S., and Mizumori, S.J. (2015). Cereb. Cortex. Published online September 22, 2015. http://dx.doi.org/10.1093/cercor/bhv215.
Joel, D., Niv, Y., and Ruppin, E. (2002). Neural Netw. 15, 535–547.
Matsumoto, M., and Hikosaka, O. (2007). Nature 447, 1111–1115.
Oyama, K., Tateyama, Y., Hernádi, I., Tobler, P.N., Iijima, T., and Tsutsui, K. (2015). J. Neurophysiol. 114, 2600–2615.
Roesch, M.R., Singh, T., Brown, P.L., Mullins, S.E., and Schoenbaum, G. (2009). J. Neurosci. 29, 13365–13376.
Schultz, W., Dayan, P., and Montague, P.R. (1997). Science 275, 1593–1599.
Singh, T., McDannald, M.A., Takahashi, Y.K., Haney, R.Z., Cooch, N.K., Lucantonio, F., and Schoenbaum, G. (2011). Learn. Mem. 18, 85–87.
Sutton, R.S. (1988). Mach. Learn. 3, 9–44.
Takahashi, Y.K., Langdon, A.J., Niv, Y., and Schoenbaum, G. (2016). Neuron 91, this issue, 182–193.