Subsecond dopamine fluctuations in human striatum encode superposed error signals about actual and counterfactual reward

Edited by Marcus E. Raichle, Washington University in St. Louis, St. Louis, MO, and approved October 23, 2015 (received for review July 13, 2015)
November 23, 2015
113 (1) 200-205
Commentary
Dopamine: Context and counterfactuals
Michael L. Platt, John M. Pearson

Significance

There is an abundance of circumstantial evidence (primarily work in nonhuman animal models) suggesting that dopamine transients serve as experience-dependent learning signals. This report establishes, to our knowledge, the first direct demonstration that subsecond fluctuations in dopamine concentration in the human striatum combine two distinct prediction error signals: (i) an experience-dependent reward prediction error term and (ii) a counterfactual prediction error term. These data are surprising because there is no prior evidence that fluctuations in dopamine should superpose actual and counterfactual information in humans. The observed compositional encoding of “actual” and “possible” is consistent with how one should “feel” and may be one example of how the human brain translates computations over experience to embodied states of subjective feeling.

Abstract

In the mammalian brain, dopamine is a critical neuromodulator whose actions underlie learning, decision-making, and behavioral control. Degeneration of dopamine neurons causes Parkinson’s disease, whereas dysregulation of dopamine signaling is believed to contribute to psychiatric conditions such as schizophrenia, addiction, and depression. Experiments in animal models suggest the hypothesis that dopamine release in human striatum encodes reward prediction errors (RPEs) (the difference between actual and expected outcomes) during ongoing decision-making. Blood oxygen level-dependent (BOLD) imaging experiments in humans support the idea that RPEs are tracked in the striatum; however, BOLD measurements cannot be used to infer the action of any one specific neurotransmitter. We monitored dopamine levels with subsecond temporal resolution in humans (n = 17) with Parkinson’s disease while they executed a sequential decision-making task. Participants placed bets and experienced monetary gains or losses. Dopamine fluctuations in the striatum fail to encode RPEs in the way anticipated by a large body of work in model organisms. Instead, subsecond dopamine fluctuations encode an integration of RPEs with counterfactual prediction errors, the latter defined by how much better or worse the experienced outcome could have been. How dopamine fluctuations combine the actual and counterfactual is unknown. One possibility is that this process is the normal behavior of reward processing dopamine neurons, which previously had not been tested by experiments in animal models. Alternatively, this superposition of error terms may result from an additional yet-to-be-identified subclass of dopamine neurons.
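The two error terms defined in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis code, and it rests on assumptions that the abstract does not state: the bet is taken to be a fraction of the stake in [0, 1], the outcome is taken to be the bet multiplied by a fractional market return, and the counterfactual reference is taken to be the outcome of a maximal bet. The names `trial_errors`, `bet`, `market_return`, and `expected` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrialErrors:
    outcome: float  # experienced gain (positive) or loss (negative)
    rpe: float      # reward prediction error: actual minus expected outcome
    cpe: float      # counterfactual prediction error (assumed form, see below)


def trial_errors(bet: float, market_return: float, expected: float) -> TrialErrors:
    """Compute both error terms for a single trial.

    bet            fraction of the stake wagered, in [0, 1] (assumed encoding)
    market_return  fractional price change revealed after the bet (assumed)
    expected       outcome the participant expected before the reveal
    """
    if not 0.0 <= bet <= 1.0:
        raise ValueError("bet must lie in [0, 1]")

    # Experienced outcome under the assumed task encoding.
    outcome = bet * market_return

    # RPE as defined in the abstract: actual outcome minus expected outcome.
    rpe = outcome - expected

    # Assumed CPE: what a maximal bet (bet = 1) would have yielded, minus what
    # was experienced. Positive when the outcome could have been better
    # (a gain was left on the table), negative when it could have been worse
    # (a larger loss was avoided).
    cpe = market_return - outcome

    return TrialErrors(outcome=outcome, rpe=rpe, cpe=cpe)


if __name__ == "__main__":
    # A half-stake bet followed by a 20% rise, against a modest expectation.
    print(trial_errors(bet=0.5, market_return=0.2, expected=0.04))
    # The same bet followed by a 20% fall.
    print(trial_errors(bet=0.5, market_return=-0.2, expected=0.04))
```

The sketch deliberately stops at computing the two terms separately. The abstract states that how dopamine fluctuations combine them is unknown, so no combination rule is assumed here.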

Acknowledgments

The authors thank the patient volunteers and the research and surgical nursing staff at Wake Forest University Health Sciences Center for invaluable support and cooperation. In particular, the authors thank Wendy Jenkins, Valerie Hughes, and Patti Pepper for coordinating the patients and clinical and research staff in support of the research efforts reported here. The authors thank Nathan Apple for help in digitizing artwork displayed in Fig. 1. The authors also thank Peter Dayan, Sam McClure, Rosalyn Moran, Cathy Price, and Alec Solway for reading and commenting on earlier drafts of this manuscript. During the course of this work, prior to publication, T.L.E. died. His contributions were critical and invaluable in the early stages of this project, including planning the execution of these experiments during surgery and evaluating the safety and applicability of the reported work. T.L.E. recognized the potential of the technology to be developed and the questions to be asked and dedicated significant time and effort leading his staff and collaborators to accomplish this work. This work was funded by the Wellcome Trust (P.R.M.), the Kane Family Foundation (P.R.M.), and Virginia Tech (P.R.M.).

Information & Authors

Information

Published in

Proceedings of the National Academy of Sciences
Vol. 113 | No. 1
January 5, 2016
PubMed: 26598677

Submission history

Published online: November 23, 2015
Published in issue: January 5, 2016

Keywords

  1. dopamine
  2. reward prediction error
  3. counterfactual prediction error
  4. decision-making
  5. human fast-scan cyclic voltammetry

Notes

This article is a PNAS Direct Submission.
See Commentary on page 22.

Authors

Affiliations

Kenneth T. Kishida1
Virginia Tech Carilion Research Institute, Virginia Tech, Roanoke, VA 24016;
Ignacio Saez
Virginia Tech Carilion Research Institute, Virginia Tech, Roanoke, VA 24016;
Present address: Helen Wills Neuroscience Institute and Haas School of Business, University of California, Berkeley, CA 94720.
Terry Lohrenz
Virginia Tech Carilion Research Institute, Virginia Tech, Roanoke, VA 24016;
Mark R. Witcher
Department of Neurosurgery, Wake Forest Health Sciences, Winston-Salem, NC 27157;
Adrian W. Laxton
Department of Neurosurgery, Wake Forest Health Sciences, Winston-Salem, NC 27157;
Stephen B. Tatter
Department of Neurosurgery, Wake Forest Health Sciences, Winston-Salem, NC 27157;
Jason P. White
Virginia Tech Carilion Research Institute, Virginia Tech, Roanoke, VA 24016;
Thomas L. Ellis
Department of Neurosurgery, Wake Forest Health Sciences, Winston-Salem, NC 27157;
Deceased June 30, 2012.
Paul E. M. Phillips
Department of Psychiatry & Behavioral Sciences, University of Washington, Seattle, WA 98195;
Department of Pharmacology, University of Washington, Seattle, WA 98195;
P. Read Montague1
Virginia Tech Carilion Research Institute, Virginia Tech, Roanoke, VA 24016;
Department of Physics, Virginia Tech, Blacksburg, VA 24060;
Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3BG, United Kingdom

Notes

1
To whom correspondence may be addressed. Email: [email protected] or [email protected].
Author contributions: K.T.K., T.L., T.L.E., P.E.M.P., and P.R.M. designed research; P.R.M. guided all aspects of this work, including conception of the adaptation of prior rodent microsensor technology for use in humans; T.L. and P.R.M. designed the sequential choice task; M.R.W., A.W.L., S.B.T., and T.L.E. conceived of surgical strategies for safe and effective placement of microsensors for human fast-scan cyclic voltammetry (FSCV) experiments; P.E.M.P. guided microsensor fabrication; I.S. assisted with optimization of microsensor design and engineering of mobile electrochemistry unit; K.T.K., I.S., M.R.W., A.W.L., and S.B.T. performed research; M.R.W., A.W.L., and S.B.T. performed surgical placement of probes; P.E.M.P. guided FSCV experiments; K.T.K. executed FSCV experiments (in vivo and in vitro); I.S. assisted with FSCV experiments (in vivo and in vitro); K.T.K., I.S., J.P.W., and P.E.M.P. contributed new reagents/analytic tools; K.T.K. built and optimized parameters for the extended carbon-fiber microsensors and engineered the integration of mobile electrochemistry unit with game play technology; P.R.M. guided and interpreted signal extraction development and optimization procedures; K.T.K. optimized the signal extraction algorithm using the elastic net; J.P.W. performed temporal alignment of signals collected on electrochemistry unit and integrated game play system (NEMO); K.T.K., I.S., T.L., J.P.W., and P.R.M. analyzed data; P.R.M. guided all analyses; P.R.M. guided and interpreted results from FSCV experiments; K.T.K., I.S., T.L., M.R.W., A.W.L., S.B.T., J.P.W., P.E.M.P., and P.R.M. interpreted results; and K.T.K., T.L., and P.R.M. wrote the paper.

Competing Interests

The authors declare no conflict of interest.
