Associate Professor, Department of Neuroscience, Johns Hopkins University
Abstract: Decisions take place in dynamic environments. The nervous system must continually learn the best actions to obtain rewards. In the theoretical framework of optimal control and reinforcement learning, behavioral policies are updated by feedback arising from errors in predicted reward. These reward prediction errors have been mapped to dopamine neurons in the midbrain, but it is unclear how the decision variables that generate policies are themselves represented and modulated. We trained mice on a dynamic foraging task in which they freely chose between two alternatives that delivered reward with changing probabilities. We found that corticostriatal neurons in the medial prefrontal cortex (mPFC) maintained persistent changes in firing rate that represented relative and total action values over long timescales. These persistent changes are consistent with control signals used to drive flexible behavior. We next recorded from serotonin neurons in the dorsal raphe to test the hypothesis that their signals could be used to modulate dynamic learning. We found that serotonin neurons represented a quantity related to reward uncertainty over long timescales (tens of seconds), consistent with a modulatory signal used to adjust learning of ongoing decision variables. Our results provide a quantitative link between serotonin neuron activity and behavior.
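
For readers unfamiliar with the reinforcement learning terms used in the abstract, the sketch below illustrates what a reward prediction error, a relative action value, and a total action value could look like in a two-option task with changing reward probabilities. It is a generic delta-rule learner, not the model used in the work described here; the function name `simulate_foraging`, the reward probabilities, the block length, the learning rate `alpha`, and the choice sensitivity `beta` are all hypothetical values chosen for illustration.

```python
import math
import random


def simulate_foraging(n_trials=400, block_len=100, alpha=0.2, beta=5.0, seed=0):
    """Generic delta-rule learner on a two-option task (illustrative only)."""
    rng = random.Random(seed)
    # Hypothetical reward probabilities; they swap every `block_len` trials.
    block_probs = [(0.4, 0.1), (0.1, 0.4)]
    q = [0.0, 0.0]  # learned action values for the two options
    history = []
    for t in range(n_trials):
        p_reward = block_probs[(t // block_len) % len(block_probs)]
        relative = q[0] - q[1]  # relative action value
        total = q[0] + q[1]     # total action value
        # Logistic choice rule: higher relative value favors option 0.
        p_choose_0 = 1.0 / (1.0 + math.exp(-beta * relative))
        choice = 0 if rng.random() < p_choose_0 else 1
        reward = 1.0 if rng.random() < p_reward[choice] else 0.0
        rpe = reward - q[choice]   # reward prediction error
        q[choice] += alpha * rpe   # update only the chosen option's value
        history.append({"trial": t, "choice": choice, "reward": reward,
                        "rpe": rpe, "relative": relative, "total": total})
    return history


if __name__ == "__main__":
    trials = simulate_foraging()
    for row in trials[98:103]:  # trials around the first block switch
        print(row)
```

In this sketch the relative value drives which option is chosen, while the total value tracks how rewarding the environment has been overall. How these quantities relate to the recorded neural signals is a question for the talk itself, not something the sketch establishes.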