Ventral striatal dopamine reflects behavioral and neural signatures of model-based control during sequential decision making

Edited by Ranulfo Romo, Universidad Nacional Autónoma de México, Mexico City, D.F., Mexico, and approved December 23, 2014 (received for review September 11, 2014)
January 20, 2015
112 (5) 1595-1600

Significance

Whether humans make choices based on a deliberative “model-based” or a reflexive “model-free” system of behavioral control remains an ongoing topic of research. Dopamine is implicated in motivational drive as well as in planning future actions. Here, we demonstrate that higher presynaptic dopamine in human ventral striatum is associated with more pronounced model-based behavioral control, as well as an enhanced coding of model-based signatures in lateral prefrontal cortex and diminished coding of model-free learning signals in ventral striatum. Our study links ventral striatal presynaptic dopamine to a balance between two distinct modes of behavioral control in humans. The findings have implications for neuropsychiatric diseases associated with alterations of dopamine neurotransmission and a disrupted balance of behavioral control.

Abstract

Dual system theories suggest that behavioral control is parsed between a deliberative “model-based” and a more reflexive “model-free” system. A balance of control exerted by these systems is thought to be related to dopamine neurotransmission. However, in the absence of direct measures of human dopamine, it remains unknown whether this reflects a quantitative relation with dopamine either in the striatum or other brain areas. Using a sequential decision task performed during functional magnetic resonance imaging, combined with striatal measures of dopamine using [18F]DOPA positron emission tomography, we show that higher presynaptic ventral striatal dopamine levels were associated with a behavioral bias toward more model-based control. Higher presynaptic dopamine in ventral striatum was associated with greater coding of model-based signatures in lateral prefrontal cortex and diminished coding of model-free prediction errors in ventral striatum. Thus, interindividual variability in ventral striatal presynaptic dopamine reflects a balance in the behavioral expression and the neural signatures of model-free and model-based control. Our data provide a novel perspective on how alterations in presynaptic dopamine levels might be accompanied by a disruption of behavioral control as observed in aging or neuropsychiatric diseases such as schizophrenia and addiction.
Human choice behavior is influenced by both habitual and goal-directed systems (1). For example, having enjoyed a delicious dinner makes a subsequent visit to the same restaurant more likely. Upon returning at a later point, another visit could happen reflexively when walking past the restaurant, or alternatively be planned and involve reflection, for instance, by checking recent customer reviews to guard against possible changes. These two decision modes differ fundamentally in terms of their control over actions and associated outcome consequences. Reflexive habitual preferences are retrospective and arise from a slow accumulation of rewards via iterative updating of expectations (2), for example by repeating dinner at the same place after having previously enjoyed tasty food there. In contrast, goal-directed behavior requires a prospective consideration of future outcomes associated with a set of actions (3). For example, knowledge that the chef has changed and that subsequent reviews have been less favorable should reduce one’s expectations. Thus, in the face of such change, a goal-directed system can adapt quickly, whereas a habitual system needs to experience an actual outcome before it can alter behavior in an adaptive manner (4). This dual-system theory has been formalized within computational models of learning that update expectations based on past rewards (“model-free”) or map possible actions to their potential outcomes (“model-based”) (5). There is evidence that model-based learning signals during the acquisition of task structure are encoded within prefrontal–parietal cortices, whereas model-free learning signals are encoded in ventral striatum (6). In the sequential decision task used here, a neural dissociation between the two systems has been harder to establish, with prefrontal cortex (PFC) and ventral striatum coding both model-free learning signals and additional model-based signatures (7).
An unresolved question centers on what factors relate to the degree to which an individual’s choices reflect the dominance of either model-free or model-based systems of control. Among neuromodulators, dopamine has repeatedly been linked to this balance (1, 8–12), although it is important to acknowledge that other neuromodulatory agents are likely to also play a role (13). Traditionally, dopamine is associated with model-free learning, representing a teaching signal used to update expectations, for example via a temporal difference reward prediction error (14, 15). Potential correlates of this dopamine learning signal have been reported in functional magnetic resonance imaging (fMRI) studies in humans (e.g., ref. 16). On the other hand, individual variation of striatal presynaptic dopamine, quantified using neurochemical imaging, is known to positively relate to variability in “prefrontal” cognitive capacities (17, 18), which might also limit the capacity for model-based learning (19). Indeed, depletion of presynaptic dopamine precursors and Parkinson’s disease both compromised goal-directed behavior in a devaluation experiment and a slips-of-action test, whereas habitual learning remained intact (20, 21). Furthermore, a pharmacological challenge with l-DOPA, a manipulation known to boost overall brain dopamine levels, has been shown to enhance model-based over model-free choices in a sequential decision-making task (12). These studies raise the possibility that a balance between model-free and model-based control is intimately related to variations in dopamine levels, but they are agnostic as to the likely locus of this influence.
A radiolabeled variant of l-DOPA, [18F]DOPA, allows quantification of individual levels of presynaptic dopamine in vivo by using positron emission tomography (PET) (22). Schlagenhauf et al. (23) used this methodology to show an inverse relationship between ventral striatal presynaptic dopamine levels and an fMRI signal that indexed ventral striatal model-free learning signals. Ventral striatal presynaptic dopamine levels are a candidate marker for a balance between model-free and model-based control in light of evidence that ventral striatal lesions impair model-based learning (24), whereas ventral striatal activation encodes a signature of both model-free and model-based learning (7). Furthermore, as mentioned above, presynaptic dopamine levels in ventral striatum were negatively correlated with ventral striatal model-free learning signals (23).
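To make the uptake constant Ki more concrete: [18F]DOPA Ki is often estimated with reference-region Patlak graphical analysis, in which, after an equilibration time t*, the ratio of target to reference activity becomes linear in "normalized time" (the running integral of the reference curve divided by its current value), and the slope of that line is Ki. This section does not state which quantification procedure was used, so the following is only a minimal sketch under that assumption; the function names, the reference curve, and the simulated time-activity curves are hypothetical.

```python
import numpy as np


def cumulative_trapezoid(y, t):
    """Running trapezoidal integral of y over t, starting at 0."""
    out = np.zeros(len(y))
    out[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(t))
    return out


def patlak_ki(c_target, c_ref, t, t_star):
    """Reference-region Patlak fit (assumed method): returns (Ki, intercept).

    Only frames at or after the equilibration time t_star enter the linear fit.
    """
    integral = cumulative_trapezoid(c_ref, t)
    late = t >= t_star
    x = integral[late] / c_ref[late]    # "normalized time"
    y = c_target[late] / c_ref[late]    # target-to-reference ratio
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept


# Hypothetical example: build a target curve with a known Ki (per minute).
t = np.linspace(0.0, 90.0, 91)                 # minutes
c_ref = t * np.exp(-t / 20.0)                  # made-up reference time-activity curve
true_ki, true_v = 0.012, 0.8
c_target = true_ki * cumulative_trapezoid(c_ref, t) + true_v * c_ref
ki, v = patlak_ki(c_target, c_ref, t, t_star=20.0)
print(round(ki, 4), round(v, 3))               # 0.012 0.8
```

Because the simulated target curve is built exactly from the Patlak model, the fit recovers the input values; real data add noise and depend on the choice of t* and reference region.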
Here, we combined a two-step sequential decision task performed during fMRI with [18F]DOPA PET, which we used to quantify interindividual differences in striatal presynaptic dopamine levels. Our hypothesis was that interindividual variation in presynaptic levels of striatal dopamine relates to behavioral and neural signatures of model-based and model-free control.

Results

Model-Free Versus Model-Based Control.

The balance between model-free and model-based choice behavior was assessed using a two-step decision task in 29 healthy participants (Fig. 1 A and B). In this task, participants made two sequential choices between stimulus pairs to obtain a monetary reward. At the first stage, each choice option led commonly (70% probability) to one of two pairs of stimuli and rarely (30% probability) to the other pair. At the second stage, a second choice was followed by either a monetary reward or no reward, with reward probabilities drifting according to slowly changing Gaussian random walks to encourage continuous updating of action values. A purely model-based learner exploits the transition probabilities from the first to the second stage, whereas a purely model-free learner neglects this task structure. Choice behavior in this task has been shown to reflect influences of both systems (7) (Fig. S1), and at the individual level the balance between model-free and model-based control can be quantified with a hybrid model. This hybrid model combines the decision values of two algorithms according to a weighting factor ω. One algorithm implements model-free temporal difference learning, whereas the other performs a model-based tree search, using the explicitly instructed transition probabilities to prospectively update first-stage decision values (SI Text). A higher weighting parameter ω indicates a bias toward model-based choices and is our primary measure of interest. The models were implemented as in the original paper (7). In line with previous studies (7, 12), Bayesian model selection showed that the hybrid model again best explained choice behavior (exceedance probability = 0.98; Table S1; ref. 25).
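The hybrid logic described above can be sketched as follows. This is a simplified illustration, not the fitted model from the SI: it assumes a single learning rate, a SARSA(λ)-style model-free update, and no perseveration term, and all function names are hypothetical. The model-based value of each first-stage action is the transition-weighted value of the best action in each second-stage state; the net first-stage value mixes this with the model-free value via ω.

```python
import numpy as np

# Assumed layout: first-stage action a leads commonly (p = 0.7) to second-stage state a.
TRANSITIONS = np.array([[0.7, 0.3],
                        [0.3, 0.7]])


def model_based_values(q_stage2, transitions=TRANSITIONS):
    """Tree search: transition-weighted value of the best action in each second-stage state."""
    return transitions @ q_stage2.max(axis=1)


def hybrid_values(q_mf_stage1, q_stage2, omega):
    """Mix first-stage values; omega = 1 is purely model-based, omega = 0 purely model-free."""
    return omega * model_based_values(q_stage2) + (1.0 - omega) * q_mf_stage1


def softmax(values, beta):
    """Choice probabilities with inverse temperature beta (numerically stabilized)."""
    z = beta * (values - np.max(values))
    p = np.exp(z)
    return p / p.sum()


def model_free_update(q_mf_stage1, q_stage2, a1, s2, a2, reward, alpha, lam):
    """SARSA(lambda)-style temporal-difference update (sketch, single learning rate)."""
    q1, q2 = q_mf_stage1.copy(), q_stage2.copy()
    delta1 = q2[s2, a2] - q1[a1]        # prediction error at second-stage onset
    q1[a1] += alpha * delta1
    delta2 = reward - q2[s2, a2]        # prediction error at reward delivery
    q2[s2, a2] += alpha * delta2
    q1[a1] += alpha * lam * delta2      # eligibility trace carries reward PE back to stage 1
    return q1, q2


q_stage2 = np.array([[0.8, 0.2],       # state 0: values of its two actions
                     [0.4, 0.6]])      # state 1
q_mf = np.array([0.3, 0.5])
print(hybrid_values(q_mf, q_stage2, omega=1.0))   # purely model-based: about [0.74 0.66]
print(hybrid_values(q_mf, q_stage2, omega=0.0))   # purely model-free: [0.3 0.5]
```

Note how the two extremes can favor different first-stage actions: the model-based values prefer action 0, the model-free values prefer action 1, and intermediate ω interpolates between them.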
Fig. 1.
Behavioral task and relation to presynaptic dopamine. (A) Example trial sequence of the two-step decision task and its timing. (B) Illustration of the state transition matrix. (C) Mean voxelwise Ki map of 29 participants and borders of striatal regions of interest. (D) Correlation between right ventral striatal Ki and the balance of model-free and model-based choices ω (r = 0.31; P = 0.04) and between right ventral striatal Ki and the second-stage reaction time difference between common and rare states (r = 0.38; P = 0.04).

Striatal Dopamine and a Balance of Behavioral Control.

To test whether striatal presynaptic dopamine levels relate to the balance between model-free and model-based choice behavior, we entered the weighting parameter ω derived from computational modeling (Table S2) as the dependent variable in a linear regression analysis, with a quantitative metric of F-DOPA uptake (Ki) from right and left ventral striatum and right and left remaining striatum as independent variables (Fig. 1C). This revealed a significant positive relation between Ki in right ventral striatum and ω (right ventral striatum: β = 0.43, t = 2.16, P = 0.04; left ventral striatum: β = 0.10, t = 0.40, P = 0.70; right remaining striatum: β = 0.10, t = 0.34, P = 0.73; left remaining striatum: β = −0.46, t = −1.48, P = 0.15; Fig. 1D). We repeated this analysis with presynaptic dopamine from ventral striatum, caudate, and putamen of each hemisphere. As in the initial analysis, only right ventral striatal presynaptic dopamine was related to ω (right ventral striatum: β = 0.46, t = 2.22, P = 0.04; left ventral striatum: β = 0.07, t = 0.33, P = 0.74; right caudate: β = −0.04, t = −0.14, P = 0.89; left caudate: β = −0.03, t = −0.10, P = 0.92; right putamen: β = 0.09, t = 0.33, P = 0.74; left putamen: β = −0.46, t = −1.68, P = 0.11). This positive relationship was also consistent with an analysis of first-stage stay–switch behavior as a function of right ventral striatal presynaptic dopamine (SI Text, Fig. S2). In line with our hypothesis, higher ventral striatal presynaptic dopamine levels were associated with a behavioral bias toward model-based choices.
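The standardized coefficients (β) and t values reported above come from an ordinary least-squares regression of ω on several Ki predictors. A minimal sketch of that computation follows, assuming z-scored outcome and predictors; the function name and the simulated data are hypothetical, not the study's data or code. A t value always carries the sign of its coefficient, because the standard error is positive.

```python
import numpy as np


def standardized_regression(y, X):
    """OLS of z-scored y on z-scored columns of X; returns (betas, t_values, dof)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    n, k = Xz.shape
    design = np.column_stack([np.ones(n), Xz])           # intercept + predictors
    coef, *_ = np.linalg.lstsq(design, yz, rcond=None)
    resid = yz - design @ coef
    dof = n - k - 1
    sigma2 = resid @ resid / dof                          # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    t_values = coef / se
    return coef[1:], t_values[1:], dof


# Hypothetical data: 29 participants, four Ki predictors
# (right/left ventral striatum, right/left remaining striatum).
rng = np.random.default_rng(1)
ki = rng.normal(0.012, 0.001, size=(29, 4))
omega = 0.5 + 150.0 * (ki[:, 0] - 0.012) + rng.normal(0.0, 0.1, size=29)
betas, t_values, dof = standardized_regression(omega, ki)
print(dof)   # 24
```

Two-sided P values would then follow from the t distribution with `dof` degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), dof)` if SciPy is available.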
Our finding of a positive relation between ventral striatal presynaptic dopamine and model-based control indicates that the model-based system is more engaged in individuals with higher ventral striatal presynaptic dopamine. This relationship can also be probed via an analysis of second-stage reaction times. In our task, a model-based learner uses knowledge about state transitions, so second-stage reaction time differences between common and rare states should reflect the degree of model-based control. Second-stage reaction times differed significantly between common and rare states (paired t test: mean difference, 218 ± 165 ms SD; t = 7.10; P < 0.001; Fig. S3). Model-free learning cannot account for this effect because it neglects the state transition matrix. Responses were significantly slower in rare than in common states, and individual variability in this reaction time difference (most likely reflecting slowing in rare states; Fig. S3) was positively related to ω (r = 0.59, P = 0.001; Fig. S4), where ω was inferred independently of reaction times using computational modeling. Crucially, the second-stage reaction time difference for rare versus common states was positively related to right ventral striatal presynaptic dopamine (linear regression; right ventral striatum: β = 0.47, t = 2.33, P = 0.03; left ventral striatum: β = 0.03, t = 0.14, P = 0.89; right remaining striatum: β = 0.07, t = 0.22, P = 0.83; left remaining striatum: β = −0.32, t = −1.02, P = 0.32; Fig. 1D). This relationship was specific to the second-stage reaction time difference between common and rare states; ventral striatal presynaptic dopamine levels were not related to overall second-stage reaction times (Fig. S4).
This analysis further supports the idea that higher levels of ventral striatal presynaptic dopamine relate to more pronounced model-based control in rare task states, where the computational cost of model-based inference is expected to result in slower reaction times.
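As a consistency check on the paired t test reported above, the statistic can be recomputed from the summary values alone: t equals the mean difference divided by its standard error, SD/√n. With 218 ms, 165 ms, and n = 29 this gives about 7.11, matching the reported t = 7.10 up to rounding of the summary statistics. The helper names below are hypothetical; `paired_t` assumes one mean reaction time per participant and condition.

```python
import math
import statistics


def paired_t(rt_rare, rt_common):
    """Paired t statistic for per-participant mean reaction times (rare minus common)."""
    diffs = [r - c for r, c in zip(rt_rare, rt_common)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


def paired_t_from_summary(mean_diff, sd_diff, n):
    """Same statistic from the mean and SD of the paired differences."""
    return mean_diff / (sd_diff / math.sqrt(n))


# Reported summary: mean difference 218 ms, SD 165 ms, n = 29 participants.
print(round(paired_t_from_summary(218, 165, 29), 2))   # 7.11
```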

Neural Signatures of Model-Free and Model-Based Choices.

We first sought to replicate the results reported by Daw et al. (7), who showed that ventral striatal blood oxygen level-dependent (BOLD) signals reflect model-free as well as model-based components. Following the same analytic strategy, we identified brain regions where BOLD responses covaried with model-free prediction errors and then asked whether these BOLD signals also incrementally reflected model-based components, by including the difference between model-based and model-free prediction errors as an additional regressor (for details, see Experimental Procedures). Positive correlations with model-free prediction errors were observed in a prefrontal-striatal network, including sectors of lateral and medial PFC bilaterally as well as bilateral ventral striatum [P < 0.05, familywise error (FWE)-corrected at the peak level for the whole brain; Fig. 2 and Table S3]. The effect of additional model-based components reached significance in the same regions, namely bilateral ventral striatum, right lateral PFC, and medial PFC (P < 0.05, FWE-corrected at the peak level for the respective bilateral regions of interest; Fig. 2 and Table S3). The conjunction of model-free and model-based effects reached significance in right lateral PFC and bilateral ventral striatum (P < 0.05, FWE-corrected at the peak level for the respective bilateral regions of interest; Fig. 2).
Fig. 2.
fMRI results. Model-free prediction errors (Left), additional model-based signals (Middle), and the conjunction of both (Right) in ventral striatum (VS, Upper) and lateral prefrontal cortex (lPFC, Lower). For display purposes, all statistical maps are thresholded at a minimum T value of 3.24 (corresponding to P < 0.001, uncorrected) with a cluster extent k = 20. For details, see Table S3.

Ventral Striatal Dopamine and Ventral Striatal Model-Free Learning Signals.

In previous work (23), we presented evidence for a negative relationship between right ventral striatal presynaptic dopamine levels and model-free prediction errors in right ventral striatum. To replicate this finding, we extracted parameter estimates of model-free prediction errors within an 8-mm sphere centered on the right ventral striatal peak of the conjunction contrast [x = 16, y = 8, z = −8]. In this analysis, restricted to right ventral striatum on the basis of previous work (23), we again found a negative relationship between ventral striatal coding of model-free prediction errors and ventral striatal presynaptic dopamine levels (r = −0.37; P < 0.05; Fig. 3A). This correlation also remained significant when controlling for presynaptic dopamine levels from other striatal regions (SI Text) and in a voxelwise analysis (SI Text, Fig. S5).
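Extracting parameter estimates "within an 8-mm sphere" around a peak coordinate amounts to averaging all voxels whose millimetre coordinates lie within 8 mm of that point. The sketch below assumes a 3D parameter-estimate array and its 4 × 4 voxel-to-mm affine (for example, `nibabel.load(path).get_fdata()` and `.affine`); the helper name and the toy grid are hypothetical, and the study itself used SPM8.

```python
import numpy as np


def sphere_mean(volume, affine, center_mm, radius_mm=8.0):
    """Mean of a 3D image within a sphere (in mm) around a world coordinate.

    `affine` maps voxel indices (i, j, k, 1) to millimetre coordinates.
    Returns (mean value, number of voxels in the sphere).
    """
    idx = np.indices(volume.shape).reshape(3, -1).T           # all voxel indices
    homog = np.column_stack([idx, np.ones(len(idx))])
    mm = homog @ affine.T                                     # voxel -> mm
    dist = np.linalg.norm(mm[:, :3] - np.asarray(center_mm, dtype=float), axis=1)
    mask = (dist <= radius_mm).reshape(volume.shape)
    return float(np.nanmean(volume[mask])), int(mask.sum())


# Hypothetical 2-mm grid whose voxel values equal their x coordinate in mm.
affine = np.array([[2.0, 0, 0, -20.0],
                   [0, 2.0, 0, -20.0],
                   [0, 0, 2.0, -20.0],
                   [0, 0, 0, 1.0]])
x_mm = 2.0 * np.arange(30) - 20.0
volume = np.broadcast_to(x_mm[:, None, None], (30, 30, 30)).copy()
mean_value, n_vox = sphere_mean(volume, affine, center_mm=[16, 8, -8])
print(mean_value)   # 16.0, because the sphere is symmetric about x = 16
```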
Fig. 3.
Presynaptic dopamine and neural learning signatures. Correlation between right ventral striatal presynaptic dopamine Ki and (A) model-free learning signals in right ventral striatum (r = −0.37; P = 0.02) and (B) model-based signatures in right lateral prefrontal cortex (r = 0.38; P = 0.03).

Ventral Striatal Dopamine and Neural Model-Based Signatures.

Here, we asked whether right ventral striatal presynaptic dopamine levels were related to the encoding of model-based information. We extracted parameter estimates of the model-based difference regressor within 8-mm spheres centered on the peak coordinates of the conjunction contrast in lateral PFC [x = 42, y = 24, z = −14] and ventral striatum [x = 16, y = 8, z = −8]. These estimates were entered into an ANOVA with the within-subject factor “region” and right ventral striatal Ki as a covariate. We found a significant region-by-Ki interaction (F = 5.10; P < 0.05), driven by a significant positive relation between ventral striatal Ki and model-based signatures in lateral PFC (r = 0.39; P < 0.05; Fig. 3B) but not in ventral striatum (r = −0.07; P > 0.7). This correlation also remained significant when controlling for presynaptic dopamine levels from other striatal regions (SI Text) and in a voxelwise analysis (SI Text, Fig. S5). Note that the sensitivity of the PET technique does not allow accurate measurement of cortical presynaptic dopamine levels.
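For a within-subject factor with only two levels, the region × covariate interaction in such an ANOVA is equivalent to testing whether the difference score (lateral PFC minus ventral striatum) depends linearly on the covariate: the interaction F equals the squared t of that slope, with 1 and n − 2 degrees of freedom. The sketch below illustrates this equivalence on simulated data; the function name and the data are hypothetical and do not reproduce the reported F = 5.10.

```python
import numpy as np


def region_by_covariate_F(lpfc, vs, covariate):
    """Region x covariate interaction for a two-level within-subject factor.

    Tests the slope of the (lPFC - VS) difference score on the covariate;
    returns (F, (df1, df2)) with F = t**2.
    """
    diff = np.asarray(lpfc, float) - np.asarray(vs, float)
    cov = np.asarray(covariate, float)
    n = diff.size
    X = np.column_stack([np.ones(n), cov])
    coef, *_ = np.linalg.lstsq(X, diff, rcond=None)
    resid = diff - X @ coef
    dof = n - 2
    sigma2 = resid @ resid / dof
    se_slope = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    t = coef[1] / se_slope
    return t ** 2, (1, dof)


# Hypothetical parameter estimates for 29 participants.
rng = np.random.default_rng(2)
ki = rng.normal(0.012, 0.001, 29)
lpfc = 200.0 * (ki - 0.012) + rng.normal(0, 0.3, 29)
vs = rng.normal(0, 0.3, 29)
F, df = region_by_covariate_F(lpfc, vs, ki)
print(df)   # (1, 27)
```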

Discussion

Here, we demonstrate that ventral striatal presynaptic dopamine reflects a balance in the behavioral and neural signatures of model-free and model-based control in a two-stage sequential decision-making task. Higher levels of presynaptic dopamine in right ventral striatum were positively related to a greater disposition to make model-based choices. Crucially, higher levels of presynaptic dopamine in right ventral striatum were also associated with stronger model-based coding in lateral PFC and diminished coding of model-free prediction errors in ventral striatum.

Ventral Striatal Dopamine and a Model-Based System.

It has previously been shown, using the same task as here, that administration of l-DOPA increases model-based over model-free choices (12). Using PET, we now demonstrate that interindividual differences in ventral striatal presynaptic dopamine levels are related to this bias toward model-based control. This accords with other studies that report enhanced cognitive capacities in subjects with higher levels of striatal F-DOPA uptake (17, 18). Cognitive capacity, particularly as it relates to working memory function, is also linked to the extent to which individuals exploit model-based control (19). Conceptually, this pattern of results can be explained in a framework of uncertainty-based competition between the two decision systems (5). In this framework, participants with higher levels of presynaptic dopamine can be thought of as encoding model-based estimates with higher certainty. At a neural level, we demonstrate that ventral striatal presynaptic dopamine levels relate positively to coding of model-based signatures in lateral PFC, accompanied by a bias toward more model-based choices. It is conceivable that higher levels of presynaptic dopamine enable lateral PFC to code cognitively demanding model-based information with greater precision, thereby increasing certainty in model-based estimates. As a consequence, a model-based system may exert a greater influence on behavioral control. In a similar vein, dopamine is implicated in modulating PFC maintenance processes via a gating of cortical gain, rendering the coding of relevant environmental information more robust against noise (11, 26, 27). Indeed, the importance of lateral PFC for model-based inference is supported by findings that theta-burst transcranial magnetic stimulation compromises model-based control in humans (28).
Our analysis of second-stage reaction times, which were affected by the state transition matrix, showed that the response time difference for rare versus common states was positively related to a bias toward more model-based choices. Intriguingly, this reaction time difference also correlated positively with ventral striatal presynaptic dopamine. These results are consistent with the engagement of a slower, computationally more costly model-based system (1, 3). Engagement of a model-based system is more likely after rare transitions, as these trials are associated with increased uncertainty in representing an anticipated sequence of actions and outcomes. Furthermore, ventral striatal tonic dopamine is implicated in signaling average reward rates (29), a theoretical proposal that has received recent empirical support (e.g., ref. 30). Nevertheless, in the context of the task used here, ventral striatal presynaptic dopamine levels were not related to invigoration per se, as indexed by overall reaction times. One possible explanation is that, in participants who used a more model-based strategy, faster reaction times in common versus rare states reflect a higher expected average reward rate, resulting in greater invigoration for a specific action–outcome sequence. However, clarifying the interplay of expected average reward rates, invigoration, and model-based learning will require experimental designs tailored to this question.

Ventral Striatal Dopamine and a Model-Free System.

High levels of ventral striatal presynaptic dopamine can also influence a model-free system, as suggested by the inverse correlation with ventral striatal model-free prediction errors, which replicates previous findings (23). This indicates that participants with high levels of ventral striatal presynaptic dopamine show a bias toward a more pronounced model-based form of control and are also characterized by diminished coding of ventral striatal model-free prediction errors. The hypothesis of uncertainty-based competition (5) might also account for this finding under the premise that higher presynaptic dopamine levels result in larger phasic prediction error dopamine transients. In reinforcement learning terms, this corresponds to an increase in the learning rate of the model-free system. With high model-free learning rates, model-free values change more quickly. Thus, over the course of learning, value changes are more pronounced for single events, and a value estimate at a given point in time represents an average across fewer experiences. This could in turn result in greater uncertainty of model-free estimates. Such uncertainty would reduce the weight attached to predictions by a model-free system.
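The argument that a higher model-free learning rate yields noisier value estimates can be made concrete with the delta rule Q ← Q + α(r − Q). For a reward delivered with fixed probability p, the stationary variance of Q is α/(2 − α) · p(1 − p), the same as the variance of a plain average over about (2 − α)/α outcomes: roughly 19 trials at α = 0.1 but only 3 at α = 0.5. The simulation below is an illustration under these assumptions (hypothetical learning rates, a stationary Bernoulli reward, Q's spread used as a proxy for uncertainty); it does not model dopamine itself.

```python
import numpy as np


def value_spread(alpha, p_reward=0.5, n_trials=20000, burn_in=1000, seed=0):
    """SD of a delta-rule value estimate tracking a fixed Bernoulli reward."""
    rng = np.random.default_rng(seed)
    rewards = (rng.random(n_trials) < p_reward).astype(float)
    q, trace = 0.0, np.empty(n_trials)
    for i, r in enumerate(rewards):
        q += alpha * (r - q)            # model-free update with learning rate alpha
        trace[i] = q
    return trace[burn_in:].std()


def analytic_spread(alpha, p_reward=0.5):
    """Stationary SD: sqrt(alpha / (2 - alpha) * p * (1 - p))."""
    return np.sqrt(alpha / (2.0 - alpha) * p_reward * (1.0 - p_reward))


for alpha in (0.1, 0.5):
    print(alpha, round(value_spread(alpha), 3), round(analytic_spread(alpha), 3))
```

Analytically, the spread rises from about 0.11 at α = 0.1 to about 0.29 at α = 0.5 for p = 0.5, and the simulated values track these closely.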
There is substantial evidence that high levels of presynaptic dopamine impair NoGo-learning from negative prediction errors and promote Go-learning from positive prediction errors (31). Interestingly, in a previous study (12) as well as in our data, an alternative model with separate learning rates for positive and negative updating provided an inferior fit to the observed choices during the sequential decision task (SI Text) and, in the previous study, failed to account for the observed enhancing effect of l-DOPA on model-based behavior (12). However, our task included only Go trials, and future studies with paradigms designed to disentangle the potential roles of Go- and NoGo-learning, and of learning from positive and negative prediction errors, in model-free and model-based control are required.

Ventral Striatal Dopamine and a Balance of the Two Systems.

Ventral striatal presynaptic dopamine may exert its influence on the balance between the two systems by directly affecting an arbitrator that chooses between them. Here, it is important to note that the model-based signals modulated by ventral striatal presynaptic dopamine levels were localized to the inferior part of the lateral PFC. Activation at nearby coordinates has recently been reported to covary with the reliability of estimates arising from the two decision systems, as inferred from a hierarchical computational model (32). The latter finding links the inferior section of lateral PFC to an arbitration process. We note that the study by Lee et al. (32) extends the idea of uncertainty-based competition by identifying two PFC regions, the inferior lateral PFC and the frontopolar cortex, that are involved in arbitrating between the two systems by weighting the reliability of each system’s predictions. With respect to the present study, this underlines the importance of the association between model-based signatures in inferior lateral PFC and ventral striatal presynaptic dopamine levels, hinting at the possibility that these dopamine levels may be directly involved in the arbitration process. State prediction errors for implicit transition learning have been reported in parietal cortex and dorsolateral PFC (6, 32). Future studies should examine locally distinct learning signals in lateral PFC (32) and their hierarchical organization, as suggested by models of lateral PFC function (33, 34).

Mechanistic Considerations.

With regard to mechanisms, it is important to take into account the intricacies of dopamine neurotransmission. In animal research, learning new reward contingencies is causally linked to time-locked, phasic activation of dopamine neurons (35). We acknowledge that neither fMRI learning signals nor F-DOPA uptake kinetics can match the dynamic properties of these directly recorded signals. However, phasic dopamine release in ventral striatum selectively facilitates context-dependent inputs to ventral striatal neurons via activation of D1 receptors (36). This ventral striatal activation removes inhibition of midbrain dopamine neurons, increasing their firing and leading to an enhanced tonic dopamine influence on ventral striatum (36), potentially indexed by the activity of dopa decarboxylase. Thus, larger phasic dopamine transients, which occur in response to unexpected events, may reduce the weight attached to a model-free system and allow model-based inputs to dominate. This could in turn be reflected in overall higher presynaptic dopaminergic activity. Such changes have been demonstrated in animal research (36), and it is conceivable that a long-term dominance of such activity might be reflected in higher presynaptic dopamine levels, as assessed here via F-DOPA PET. Although speculative, this notion is supported by evidence for the reliability of F-DOPA uptake quantifications in healthy individuals over a period of 1 y (37). Thus, relatively higher presynaptic dopamine levels could preferentially facilitate signals that are thought to carry important, context-dependent, model-based information (36). A possible neural architecture for these signals includes the hippocampus and prefrontal cortex (38).
In the present study, we did not observe model-based signatures in the hippocampus, which may well be due to the analytic strategy and task design (3), but we show that interindividual variability in ventral striatal presynaptic dopamine levels coincides with greater coding of model-based information in lateral PFC. This finding also resonates with the notion of disrupted presynaptic dopamine function in neurological and psychiatric illnesses (e.g., refs. 39 and 40).
Regarding the neural instantiation of both control systems, animal research has highlighted a dissociation between dorsolateral and dorsomedial striatum, with dorsolateral lesions disrupting habit formation, whereas dorsomedial lesions impair goal-directed control (41, 42). In the present study, we did not observe a relationship between presynaptic dopamine in either caudate nucleus (the homolog of dorsomedial striatum) or putamen (the homolog of dorsolateral striatum) and model-based fMRI effects (SI Text). This may be due to several factors, including the choice of experimental task, the type of neural measurement, and limited homologies between neuroanatomical structures in rodents and primates (43, 44). Furthermore, evidence indicates that these structures may encode model-based and model-free value signals (45), quantities that were not assessed here. These issues and inconsistencies require clarification in future translational research.

Limitations

The correlational design of this study precludes any conclusions about causality. This is important when considering factors that may determine individual variability in presynaptic dopamine levels in the healthy population. Here, the orchestration of dopamine and other neuromodulators at a system level should be taken into account. For example, serotonin interferes with aversive processing (46) and learning from negative prediction errors (47), whereas cholinergic influences are linked to an encoding of precision-weighted prediction errors (48). These processes undoubtedly contribute to behavioral control and underline a requirement for a more unified view (49). However, the association between the balance of behavioral control and ventral striatal presynaptic dopamine levels demonstrated in the present study supports the idea that ventral striatum is an important nexus where several inputs converge (50). It remains an open question whether the association between ventral striatal presynaptic dopamine and a relative dominance of model-based control in our sequential decision task generalizes to other instances of goal-directed learning and cognitive control. Furthermore, the lateralization of our effects to right ventral striatal presynaptic dopamine is challenging to interpret, although it replicates a previous fMRI-PET study (23). Lateralization effects have been reported in human PET studies of the dopamine system (e.g., refs. 51 and 52) and also with respect to the association of these dopamine measurements with reward and motivation (53, 54). However, results in the present study were derived from right-handed participants only, and all reported correlations remained significant when controlling for dopamine measures from right and left striata.
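"Controlling for" dopamine measures from other striatal regions is typically done with a partial correlation: both variables are residualized on the covariates (plus an intercept), and the residuals are correlated. The sketch below shows that standard computation on hypothetical data where two variables are related only through a shared covariate; it is an illustration, not the study's analysis code. With k covariates, significance is assessed with n − 2 − k degrees of freedom.

```python
import numpy as np


def partial_corr(x, y, covariates):
    """Correlation of x and y after regressing both on the covariates (plus intercept)."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    Z = np.column_stack([np.ones(len(x)), covariates])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])


# Hypothetical check: x and y share variance only through z.
rng = np.random.default_rng(3)
z = rng.normal(size=2000)
x = z + rng.normal(size=2000)
y = z + rng.normal(size=2000)
print(round(np.corrcoef(x, y)[0, 1], 2), round(partial_corr(x, y, z), 2))
```

Here the raw correlation is substantial (about 0.5), whereas the partial correlation is near zero once z is removed.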

Conclusion

In summary, we show that interindividual differences in human ventral striatal presynaptic dopamine levels reflect a balance in behavioral and neural signatures of model-free and model-based control. Extending pharmacological challenge findings (12), higher ventral striatal presynaptic dopamine levels were correlated with a bias toward more model-based control. Higher presynaptic dopamine levels were associated with stronger coding of model-based information in lateral PFC and diminished coding of model-free prediction errors in ventral striatum. The link between presynaptic dopamine levels and the balance between model-free and model-based behavioral control has implications for aging as well as for psychiatric diseases such as schizophrenia or addiction.

Experimental Procedures

Participants.

Twenty-nine right-handed participants (11 females) with a mean age of 28.35 ± 4.95 y (range, 20–39) were included. The research ethics committee of the Charité Universitätsmedizin approved the study, and written informed consent was obtained from the participants.

Task.

A two-step decision task was implemented as in previous studies (7, 12). The task consisted of 201 trials with two choice stages per trial. At each stage, participants had to make a forced choice (maximum decision time, 2 s) between two stimuli, presented on two gray boxes at the first stage or on one of two pairs of differently colored boxes at the second stage (Fig. 1). All stimuli were randomly assigned to the left and right positions on the screen. The chosen stimulus was highlighted with a red frame, moved to the top of the screen after completion of the 2-s decision phase, and remained there for 1.5 s. Participants then entered the second stage, and the reward outcome was delivered after the second-stage choice. Reward probabilities of second-stage stimuli were identical to those of Daw et al. (7). Each first-stage choice was associated with one pair of second-stage stimuli via a fixed transition probability of 70%, which did not change during the experiment. Trials were separated by an exponentially distributed intertrial interval with a mean of 2 s. Before the experiment, and as in Daw et al. (7), participants were explicitly informed that the transition structure would stay constant throughout the task. Additionally, they were informed that reward probabilities were independent and changed dynamically over the course of the experiment. Participants were instructed to maximize reward, which they received as a monetary payout after completion of the task. Before entering the scanner, participants performed a shortened version of the task (55 trials) with different reward probabilities and stimuli.
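The trial structure can be sketched as a small simulation: a fixed 70/30 transition from each first-stage choice to a second-stage pair, and four second-stage reward probabilities that drift as independent Gaussian random walks. The study reused the fixed reward-probability sequences of Daw et al. (7); the generator below only illustrates the idea. The step SD (0.025) and reflecting bounds (0.25 and 0.75) are assumptions modeled on that original task description, not values stated in this section, and the helper names are hypothetical.

```python
import numpy as np


def reward_probability_walks(n_trials=201, sd=0.025, lo=0.25, hi=0.75, seed=0):
    """Drifting reward probabilities for the 2 x 2 second-stage options.

    Each probability takes an independent Gaussian step per trial and is
    reflected back into [lo, hi]; sd, lo and hi are assumed values.
    """
    rng = np.random.default_rng(seed)
    p = rng.uniform(lo, hi, size=(2, 2))
    walks = np.empty((n_trials, 2, 2))
    for t in range(n_trials):
        walks[t] = p
        p = p + rng.normal(0.0, sd, size=(2, 2))
        p = np.where(p > hi, 2 * hi - p, p)       # reflect at the upper bound
        p = np.where(p < lo, 2 * lo - p, p)       # reflect at the lower bound
        p = np.clip(p, lo, hi)                    # guard against very large steps
    return walks


def run_trial(a1, a2, t, walks, rng, p_common=0.7):
    """One trial: fixed 70/30 transition, then a Bernoulli reward (hypothetical helper)."""
    common = rng.random() < p_common
    s2 = a1 if common else 1 - a1
    reward = float(rng.random() < walks[t, s2, a2])
    return s2, common, reward


walks = reward_probability_walks()
rng = np.random.default_rng(1)
outcomes = [run_trial(a1=0, a2=0, t=i, walks=walks, rng=rng) for i in range(201)]
print(walks.shape, np.mean([common for _, common, _ in outcomes]))
```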

Computational Modeling.

As in previous studies, we fit a hybrid model to the observed behavioral data (7, 12). This model weights the relative influence of model-free and model-based choice values, which differ only in their first-stage values. This weighting of the two systems' influence on first-stage values is expressed by the parameter ω. The special cases ω = 1 and ω = 0 correspond to purely model-based and purely model-free control over first-stage values, respectively. For details on the model itself, fitting, and model selection, see SI Text.
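
A minimal sketch of how the hybrid model combines first-stage values, assuming the standard formulation of Daw et al. (7): model-based values are computed from the known transition probabilities and the best second-stage value in each state, Q_net = ω·Q_MB + (1 − ω)·Q_MF, and choices follow a softmax with inverse temperature β. The eligibility trace, perseveration term, and stage-specific parameters of the full model (see SI Text) are omitted, and the function names are illustrative.

```python
import math

P_COMMON = 0.7  # known transition probability (from the task description)

def model_based_values(q2, p_common=P_COMMON):
    """Model-based first-stage values from second-stage values q2[state][action].

    First-stage action a is assumed to lead commonly to state a and rarely to state 1 - a.
    """
    best = [max(q2[0]), max(q2[1])]  # value of the best action in each second-stage state
    return [p_common * best[a] + (1 - p_common) * best[1 - a] for a in (0, 1)]

def hybrid_values(q_mf, q_mb, omega):
    """Q_net = omega * Q_MB + (1 - omega) * Q_MF for each first-stage action."""
    return [omega * mb + (1 - omega) * mf for mf, mb in zip(q_mf, q_mb)]

def softmax(values, beta):
    """Choice probabilities with inverse temperature beta (max-subtracted for stability)."""
    m = max(values)
    exps = [math.exp(beta * (v - m)) for v in values]
    z = sum(exps)
    return [e / z for e in exps]

if __name__ == "__main__":
    q2 = [[0.8, 0.2], [0.3, 0.4]]
    q_mb = model_based_values(q2)                 # approximately [0.68, 0.52]
    q_net = hybrid_values([0.3, 0.6], q_mb, 0.5)  # approximately [0.49, 0.56]
    print(softmax(q_net, beta=5.0))
```

In this example the model-free values favor the second action while the model-based values favor the first, so the predicted first-stage choice shifts from one action to the other as ω moves from 0 to 1.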

Magnetic Resonance Imaging.

Functional imaging was performed using a 3-tesla Siemens Trio scanner to acquire gradient echo T2*-weighted echo-planar images with BOLD contrast. Covering the whole brain, 40 slices were acquired in oblique orientation at 20° to the anterior commissure–posterior commissure line and in interleaved order with 2.5-mm thickness, 3 × 3-mm² in-plane voxel resolution, 0.5-mm gap between slices, repetition time of 2.09 s, echo time of 22 ms, and a flip angle α of 90°. Before functional scanning, a field map was collected to account for individual inhomogeneities of the magnetic field. T1-weighted structural images were also acquired.
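
As a quick check of the acquisition geometry reported above, the slab extent, slice spacing, voxel volume, and average time per slice follow directly from the stated parameters. This sketch assumes that the 0.5-mm gap occurs only between adjacent slices and that the 40 slices are spread evenly over the 2.09-s repetition time; neither assumption is stated explicitly in the text, and the function name is illustrative.

```python
def slab_geometry(n_slices=40, thickness_mm=2.5, gap_mm=0.5, tr_s=2.09, inplane_mm=3.0):
    """Derived EPI geometry, assuming gaps only between slices and evenly timed slices."""
    coverage_mm = n_slices * thickness_mm + (n_slices - 1) * gap_mm  # slab extent along the slice axis
    spacing_mm = thickness_mm + gap_mm                               # centre-to-centre slice distance
    voxel_mm3 = inplane_mm * inplane_mm * thickness_mm               # acquired voxel volume
    ms_per_slice = 1000 * tr_s / n_slices                            # average time per slice within one TR
    return coverage_mm, spacing_mm, voxel_mm3, ms_per_slice

if __name__ == "__main__":
    print(slab_geometry())
```

With the reported parameters this gives a slab of 119.5 mm, a 3.0-mm slice spacing, 22.5-mm³ acquired voxels, and about 52 ms per slice.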

Analysis of fMRI Data.

fMRI data were analyzed using SPM8 (www.fil.ion.ucl.ac.uk/spm/software/spm8/). For preprocessing of fMRI data, see SI Text. Before statistical analysis, data were high-pass filtered with a cutoff of 128 s. An event-related analysis was applied to the images on two levels using the general linear model approach as implemented in SPM8. As in the original paper by Daw et al. (7), the analysis comprised the two time points within each trial at which prediction errors arise: the onset of the second stage and reward delivery. Prediction errors at second-stage onset compare values of first- and second-stage stimuli and therefore depend on the weighting parameter ω of the hybrid algorithm. Both time points were entered into the first-level model as one regressor, which was parametrically modulated by (i) model-free prediction errors and (ii) the difference between model-based and model-free prediction errors, which corresponds to the partial derivative of the value function with respect to ω and reflects the difference between model-based and model-free values. For details of the first-level model, see SI Text. Two contrasts of interest, model-free prediction errors and the difference regressor reflecting additional model-based predictions, were taken to a second-level random-effects model. To correct for multiple comparisons, familywise error (FWE) correction was applied using small-volume correction within bilateral volumes of interest of the ventral striatum (as defined in the IBASPM atlas, part of the WFU Pick Atlas), lateral PFC (comprising the middle and inferior frontal gyri of the Automated Anatomical Labeling atlas), and medial PFC (comprising the superior medial frontal and medial orbital gyri of the Automated Anatomical Labeling atlas).
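
The relation between the two parametric modulators and ω can be made explicit. Writing the hybrid prediction error at second-stage onset as δ_ω = Q₂(s₂, a₂) − [ω·Q_MB(s₁, a₁) + (1 − ω)·Q_MF(s₁, a₁)] gives δ_ω = δ_MF + ω·(δ_MB − δ_MF), so ∂δ_ω/∂ω = δ_MB − δ_MF = Q_MF(s₁, a₁) − Q_MB(s₁, a₁), which is, up to sign, the difference between model-based and model-free first-stage values. Because the two systems share second-stage values, the difference term is zero at reward delivery. The sketch below assumes this formulation without eligibility traces; the function names and the event tuple layout are illustrative, not SPM8 inputs.

```python
def onset_modulators(q2_chosen, q_mf_chosen, q_mb_chosen):
    """Second-stage onset: returns (model-free PE, model-based minus model-free PE)."""
    pe_mf = q2_chosen - q_mf_chosen
    pe_mb = q2_chosen - q_mb_chosen
    return pe_mf, pe_mb - pe_mf

def reward_modulators(reward, q2_chosen):
    """Reward delivery: second-stage values are shared, so the difference term is 0."""
    return reward - q2_chosen, 0.0

def hybrid_onset_pe(q2_chosen, q_mf_chosen, q_mb_chosen, omega):
    """Prediction error at second-stage onset under the hybrid first-stage value."""
    q_net = omega * q_mb_chosen + (1 - omega) * q_mf_chosen
    return q2_chosen - q_net

def trial_events(trial_values):
    """Interleave both time points of each trial into one event list (one regressor).

    trial_values: iterable of (q2_chosen, q_mf_chosen, q_mb_chosen, reward) per trial.
    Each event is (label, model-free PE modulator, difference modulator).
    """
    events = []
    for q2, q_mf, q_mb, r in trial_values:
        events.append(("stage2_onset",) + onset_modulators(q2, q_mf, q_mb))
        events.append(("reward",) + reward_modulators(r, q2))
    return events

if __name__ == "__main__":
    print(trial_events([(0.6, 0.4, 0.5, 1), (0.3, 0.5, 0.45, 0)]))
```

Because δ_ω is linear in ω, including δ_MF and δ_MB − δ_MF as two modulators lets the fMRI data themselves express any mixture of model-free and model-based prediction errors.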

Positron Emission Tomography.

Data were acquired using a Philips Gemini TF16 time-of-flight PET/CT scanner in 3D mode. After a low-dose transmission CT scan for attenuation correction, a dynamic 3D “list-mode” emission recording lasting 60 min was started simultaneously with the i.v. injection of 200 MBq of [18F]DOPA as a slow bolus. The emission data were framed into 20 dynamic frames (3 × 20 s, 3 × 1 min, 3 × 2 min, 3 × 3 min, 7 × 5 min, 1 × 6 min) and reconstructed with an isotropic voxel size of 2 mm.
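
The framing scheme can be checked and converted into frame timings with a few lines. The sketch below (names are illustrative) confirms that the 20 frames add up to the 60-min recording and computes frame start, end, and mid-times; using mid-frame times for kinetic modeling is a common convention and an assumption here, not a detail given in the text.

```python
# (number of frames, frame duration in s), as listed in the text
FRAMING = [(3, 20), (3, 60), (3, 120), (3, 180), (7, 300), (1, 360)]

def frame_times(framing=FRAMING):
    """Return lists of frame start, end and mid times in minutes."""
    starts, ends = [], []
    t = 0
    for count, duration_s in framing:
        for _ in range(count):
            starts.append(t / 60)
            t += duration_s
            ends.append(t / 60)
    mids = [(s + e) / 2 for s, e in zip(starts, ends)]
    return starts, ends, mids

if __name__ == "__main__":
    starts, ends, mids = frame_times()
    print(len(mids), ends[-1])
    print([round(m, 2) for m in mids])
```

One detail this makes visible: the first 5-min frame spans 19 to 24 min, so whether it enters a fit over "20 to 60 min" depends on whether frame start, mid, or end times are used; the text does not specify this.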

Analysis of PET Data.

PET data were analyzed using SPM8. For preprocessing of PET data, see SI Text. A quantitative measure of dopamine synthesis capacity (Ki) was obtained voxel-by-voxel using the Gjedde–Patlak linear graphical analysis with the cerebellum as reference region (55). Frames recorded between 20 and 60 min of the emission recording were used for the linear fit. The time-activity curve of the cerebellum (excluding the vermis) was extracted using a mask from the WFU Pick Atlas. Mean Ki values were extracted from the voxelwise maps using the same ventral striatum mask as in the fMRI analysis and a corresponding mask of the remaining striatum taken from the same atlas (compare Fig. 1).
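
A minimal sketch of a reference-region Gjedde–Patlak fit for a single time-activity curve, assuming tissue and cerebellar curves sampled at frame mid-times (in minutes) and a trapezoidal running integral of the reference curve that starts from zero activity at injection. The slope of C_T(t)/C_ref(t) plotted against ∫₀ᵗ C_ref dτ / C_ref(t) over the late frames is Ki (here in min⁻¹); repeating the fit for every voxel would give a voxelwise Ki map. This is a NumPy illustration with hypothetical function names and synthetic data, not the SPM8-based pipeline used in the study.

```python
import numpy as np

def running_integral(t, c):
    """Trapezoidal integral of c from time 0 (where c is assumed to be 0) up to each t."""
    t0 = np.concatenate(([0.0], np.asarray(t, float)))
    c0 = np.concatenate(([0.0], np.asarray(c, float)))
    return np.cumsum(np.diff(t0) * (c0[1:] + c0[:-1]) / 2)

def patlak_ki(t, c_tissue, c_ref, t_start=20.0, t_end=60.0):
    """Reference-region Patlak slope (Ki) and intercept fitted over t_start <= t <= t_end."""
    t = np.asarray(t, float)
    c_tissue = np.asarray(c_tissue, float)
    c_ref = np.asarray(c_ref, float)
    x = running_integral(t, c_ref) / c_ref  # "normalized time" (min)
    y = c_tissue / c_ref
    late = (t >= t_start) & (t <= t_end)
    slope, intercept = np.polyfit(x[late], y[late], 1)
    return slope, intercept

def roi_mean(ki_map, mask):
    """Mean Ki within a binary mask (e.g. a ventral striatum volume of interest)."""
    return float(np.asarray(ki_map, float)[np.asarray(mask, bool)].mean())

if __name__ == "__main__":
    t = np.arange(0.5, 60, 1.0)                # synthetic sampling times (min)
    c_ref = 10 * t * np.exp(-t / 8) + 2.0      # synthetic cerebellar curve
    c_tis = 0.012 * running_integral(t, c_ref) + 0.9 * c_ref  # synthetic tissue curve
    print(patlak_ki(t, c_tis, c_ref))          # slope close to 0.012, intercept close to 0.9
```

Because the synthetic tissue curve is built with the same running integral, the fit recovers the generating Ki exactly up to rounding; real data would scatter around the line.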

Combination of PET and Behavioral Data.

Ki values from the right and left ventral striatum and from the right and left remaining striatum were entered as independent variables into a linear regression analysis, with the model-derived balance between model-free and model-based choice behavior (ω) as the dependent variable.
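
The regression described above can be sketched with ordinary least squares. The sketch assumes one row per participant and four predictor columns (right and left ventral striatum Ki, right and left remaining striatum Ki) plus an intercept; the function name is illustrative and the example data are synthetic. Because the predictors enter jointly, each coefficient reflects the contribution of one region adjusted for the others.

```python
import numpy as np

def regress_omega_on_ki(omega, ki):
    """OLS regression of omega (n,) on Ki predictors (n, k) with an intercept.

    Returns coefficients (intercept first), their standard errors, t values and R^2.
    """
    y = np.asarray(omega, float)
    X = np.column_stack([np.ones(len(y)), np.asarray(ki, float)])
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = float(resid @ resid)
    sigma2 = rss / (n - p)  # residual variance with n - p degrees of freedom
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    r2 = 1 - rss / float(((y - y.mean()) ** 2).sum())
    return beta, se, beta / se, r2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ki = rng.normal(0.013, 0.002, size=(29, 4))  # synthetic Ki for 29 participants, 4 regions
    omega = 0.5 + 20 * ki[:, 0] + rng.normal(0, 0.05, 29)
    beta, se, t, r2 = regress_omega_on_ki(omega, ki)
    print(beta, t, r2)
```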

Combination of PET and fMRI Data.

The main focus of the present study was to examine the relationship between presynaptic dopamine and additional model-based brain signals. Specifically, we asked whether presynaptic dopamine relates to model-based signatures in ventral striatum or in PFC. Parameter estimates were extracted from 8-mm spheres centered on the peak coordinates of the conjunction of model-free and model-based effects. First, and based on previous work (23), parameter estimates of right ventral striatal model-free prediction errors were correlated with Ki from right ventral striatum. Second, parameter estimates of additional model-based effects in right ventral striatum and right lateral PFC were entered into a repeated-measures ANOVA with the factor region, and Ki from right ventral striatum was entered as a covariate. For the multimodal imaging analysis, Ki from right ventral striatum was chosen because it explained individual differences in the weighting of model-free and model-based decisions.
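
Both steps can be sketched with simple statistics. The first is a Pearson correlation between right ventral striatal model-free prediction error estimates and right ventral striatal Ki. For the second, with a two-level within-subject factor (ventral striatum vs. lateral PFC), the region × Ki interaction of the repeated-measures ANOVA is equivalent, under the usual ANCOVA assumptions, to testing the slope of the per-participant difference (PFC minus ventral striatum) on Ki, with F(1, n − 2) = t². The code assumes one parameter estimate per participant and region; names are illustrative, and it is not the software used in the study.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D samples."""
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))

def r_to_t(r, n):
    """t statistic (n - 2 df) for testing a Pearson correlation against zero."""
    return r * np.sqrt((n - 2) / (1 - r ** 2))

def region_by_ki_interaction(beta_vs, beta_pfc, ki):
    """Region x covariate interaction for a two-level within-subject factor.

    Equivalent to regressing the per-participant difference (PFC - VS) on Ki;
    returns the correlation, t (n - 2 df) and F(1, n - 2) = t**2.
    """
    diff = np.asarray(beta_pfc, float) - np.asarray(beta_vs, float)
    r = pearson_r(ki, diff)
    t = r_to_t(r, len(diff))
    return r, t, t ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    ki = rng.normal(0.013, 0.002, 29)            # synthetic right ventral striatal Ki
    vs = rng.normal(size=29)                     # synthetic VS parameter estimates
    pfc = rng.normal(size=29) + 100 * ki         # synthetic lateral PFC estimates
    print(pearson_r(ki, vs), region_by_ki_interaction(vs, pfc, ki))
```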

Acknowledgments

We thank Anne Pankow, Teresa Katthagen, Yu Fukuda, and Tobias Gleich for assistance during fMRI data acquisition and Stephan Lücke for organization and assistance during FDOPA PET. This study was supported by grants from the German Research Foundation (to F.S. and A.H.) [Deutsche Forschungsgemeinschaft (DFG) SCHL 1969/1-1, DFG SCHL1969/2-1, and DFG HE2597/14-1 as part of DFG FOR 1617]. L.D. and F.S. were supported by the Max Planck Society. Q.J.M.H. (DFG RA1047/2-1) and R. Boehme (GRK 1123/2) received funding from the German Research Foundation. R.J.D. is supported by a Wellcome Trust Senior Investigator Award (098362/Z/12/Z). A.H. received funding from the German Federal Ministry of Education and Research (01GQ0411; 01QG87164; NGFN Plus 01 GS 08152 and 01 GS 08159).

Supporting Information

References

1
RJ Dolan, P Dayan, Goals and habits in the brain. Neuron 80, 312–325 (2013).
2
AD Dickinson, Actions and habits: The development of behavioural autonomy. Philos Trans R Soc Lond B Biol Sci 308, 67–78 (1985).
3
BB Doll, DA Simon, ND Daw, The ubiquity of model-based reinforcement learning. Curr Opin Neurobiol 22, 1075–1081 (2012).
4
BW Balleine, A Dickinson, Goal-directed instrumental action: Contingency and incentive learning and their cortical substrates. Neuropharmacology 37, 407–419 (1998).
5
ND Daw, Y Niv, P Dayan, Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci 8, 1704–1711 (2005).
6
J Gläscher, N Daw, P Dayan, JP O’Doherty, States versus rewards: Dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron 66, 585–595 (2010).
7
ND Daw, SJ Gershman, B Seymour, P Dayan, RJ Dolan, Model-based influences on humans’ choices and striatal prediction errors. Neuron 69, 1204–1215 (2011).
8
R Cools, Dopaminergic control of the striatum for high-level cognition. Curr Opin Neurobiol 21, 402–407 (2011).
9
H Nakahara, Multiplexing signals in reinforcement learning with internal models and dopamine. Curr Opin Neurobiol 25, 123–129 (2014).
10
W Schultz, Updating dopamine reward signals. Curr Opin Neurobiol 23, 229–238 (2013).
11
JK Seamans, CR Yang, The principal features and mechanisms of dopamine modulation in the prefrontal cortex. Prog Neurobiol 74, 1–58 (2004).
12
K Wunderlich, P Smittenaar, RJ Dolan, Dopamine enhances model-based over model-free choice behavior. Neuron 75, 418–424 (2012).
13
P Dayan, Twenty-five lessons from computational neuromodulation. Neuron 76, 240–256 (2012).
14
PR Montague, P Dayan, TJ Sejnowski, A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J Neurosci 16, 1936–1947 (1996).
15
W Schultz, P Dayan, PR Montague, A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).
16
K D’Ardenne, SM McClure, LE Nystrom, JD Cohen, BOLD responses reflecting dopaminergic signals in the human ventral tegmental area. Science 319, 1264–1267 (2008).
17
R Cools, SE Gibbs, A Miyakawa, W Jagust, M D’Esposito, Working memory capacity predicts dopamine synthesis capacity in the human striatum. J Neurosci 28, 1208–1212 (2008).
18
I Vernaleken, et al., “Prefrontal” cognitive performance of healthy subjects positively correlates with cerebral FDOPA influx: An exploratory [18F]-fluoro-l-DOPA-PET investigation. Hum Brain Mapp 28, 931–939 (2007).
19
AR Otto, SJ Gershman, AB Markman, ND Daw, The curse of planning: Dissecting multiple reinforcement-learning systems by taxing the central executive. Psychol Sci 24, 751–761 (2013).
20
S de Wit, RA Barker, AD Dickinson, R Cools, Habitual versus goal-directed action control in Parkinson disease. J Cogn Neurosci 23, 1218–1229 (2011).
21
S de Wit, et al., Reliance on habits at the expense of goal-directed control following dopamine precursor depletion. Psychopharmacology (Berl) 219, 621–631 (2012).
22
Y Kumakura, P Cumming, PET studies of cerebral levodopa metabolism: A review of clinical findings and modeling approaches. Neuroscientist 15, 635–650 (2009).
23
F Schlagenhauf, et al., Ventral striatal prediction error signaling is associated with dopamine synthesis capacity and fluid intelligence. Hum Brain Mapp 34, 1490–1499 (2013).
24
MA McDannald, F Lucantonio, KA Burke, Y Niv, G Schoenbaum, Ventral striatum and orbitofrontal cortex are both required for model-based, but not model-free, reinforcement learning. J Neurosci 31, 2700–2705 (2011).
25
KE Stephan, WD Penny, J Daunizeau, RJ Moran, KJ Friston, Bayesian model selection for group studies. Neuroimage 46, 1004–1017 (2009).
26
TS Braver, JD Cohen, Dopamine, cognitive control, and schizophrenia: The gating model. Prog Brain Res 121, 327–349 (1999).
27
RJ Moran, M Symmonds, KE Stephan, KJ Friston, RJ Dolan, An in vivo assay of synaptic function mediating human cognition. Curr Biol 21, 1320–1325 (2011).
28
P Smittenaar, TH FitzGerald, V Romei, ND Wright, RJ Dolan, Disruption of dorsolateral prefrontal cortex decreases model-based in favor of model-free control in humans. Neuron 80, 914–919 (2013).
29
Y Niv, ND Daw, D Joel, P Dayan, Tonic dopamine: Opportunity costs and the control of response vigor. Psychopharmacology (Berl) 191, 507–520 (2007).
30
U Beierholm, et al., Dopamine modulates reward-related vigor. Neuropsychopharmacology 38, 1495–1503 (2013).
31
MJ Frank, LC Seeberger, RC O’Reilly, By carrot or by stick: Cognitive reinforcement learning in parkinsonism. Science 306, 1940–1943 (2004).
32
SW Lee, S Shimojo, JP O’Doherty, Neural computations underlying arbitration between model-based and model-free learning. Neuron 81, 687–699 (2014).
33
D Badre, J Hoffman, JW Cooney, M D’Esposito, Hierarchical cognitive control deficits following damage to the human frontal lobe. Nat Neurosci 12, 515–522 (2009).
34
E Koechlin, C Ody, F Kouneiher, The architecture of cognitive control in the human prefrontal cortex. Science 302, 1181–1185 (2003).
35
EE Steinberg, et al., A causal link between prediction errors, dopamine neurons and learning. Nat Neurosci 16, 966–973 (2013).
36
Y Goto, AA Grace, Dopaminergic modulation of limbic and cortical drive of nucleus accumbens in goal-directed behavior. Nat Neurosci 8, 805–812 (2005).
37
A Egerton, A Demjaha, P McGuire, MA Mehta, OD Howes, The test-retest reliability of 18F-DOPA PET in assessing striatal and extrastriatal presynaptic dopaminergic function. Neuroimage 50, 524–531 (2010).
38
Y Goto, AA Grace, Dopamine modulation of hippocampal-prefrontal cortical interaction drives memory-guided behavior. Cereb Cortex 18, 1407–1414 (2008).
39
OD Howes, et al., The nature of dopamine dysfunction in schizophrenia and what this means for treatment. Arch Gen Psychiatry 69, 776–786 (2012).
40
JS Rakshi, et al., Frontal, midbrain and striatal dopaminergic function in early and advanced Parkinson’s disease: A 3D [18F]dopa-PET study. Brain 122, 1637–1650 (1999).
41
HH Yin, BJ Knowlton, BW Balleine, Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. Eur J Neurosci 19, 181–189 (2004).
42
HH Yin, SB Ostlund, BJ Knowlton, BW Balleine, The role of the dorsomedial striatum in instrumental conditioning. Eur J Neurosci 22, 513–523 (2005).
43
B Knutson, SE Gibbs, Linking nucleus accumbens dopamine and blood oxygenation. Psychopharmacology (Berl) 191, 813–822 (2007).
44
BW Balleine, JP O’Doherty, Human and rodent homologies in action control: Corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology 35, 48–69 (2010).
45
K Wunderlich, P Dayan, RJ Dolan, Mapping value based planning and extensively trained choice in the human brain. Nat Neurosci 15, 786–791 (2012).
46
AJ Heinz, A Beck, A Meyer-Lindenberg, P Sterzer, A Heinz, Cognitive and neurobiological mechanisms of alcohol-related aggression. Nat Rev Neurosci 12, 400–413 (2011).
47
HE den Ouden, et al., Dissociable effects of dopamine and serotonin on reversal learning. Neuron 80, 1090–1100 (2013).
48
RJ Moran, et al., Free energy, precision and learning: The role of cholinergic neuromodulation. J Neurosci 33, 8227–8236 (2013).
49
R Cools, K Nakamura, ND Daw, Serotonin and dopamine: Unifying affective, activational, and decision functions. Neuropsychopharmacology 36, 98–113 (2011).
50
Y Goto, AA Grace, Limbic and cortical information processing in the nucleus accumbens. Trends Neurosci 31, 552–558 (2008).
51
J Hietala, et al., Depressive symptoms and presynaptic dopamine function in neuroleptic-naive schizophrenia. Schizophr Res 35, 41–50 (1999).
52
I Vernaleken, et al., Asymmetry in dopamine D(2/3) receptors of caudate nucleus is lost with age. Neuroimage 34, 870–878 (2007).
53
C Martin-Soelch, et al., Lateralization and gender differences in the dopaminergic response to unpredictable reward in the human ventral striatum. Eur J Neurosci 33, 1706–1715 (2011).
54
R Tomer, RZ Goldstein, GJ Wang, C Wong, ND Volkow, Incentive motivation is associated with striatal dopamine asymmetry. Biol Psychol 77, 98–101 (2008).
55
CS Patlak, RG Blasberg, Graphical evaluation of blood-to-brain transfer constants from multiple-time uptake data. Generalizations. J Cereb Blood Flow Metab 5, 584–590 (1985).

Information & Authors

Information

Published in

Proceedings of the National Academy of Sciences
Vol. 112 | No. 5
February 3, 2015
PubMed: 25605941

Submission history

Published online: January 20, 2015
Published in issue: February 3, 2015

Keywords

  1. dopamine
  2. decision making
  3. reinforcement learning
  4. PET
  5. fMRI

Notes

This article is a PNAS Direct Submission.

Authors

Affiliations

Lorenz Deserno¹
Max Planck Fellow Group “Cognitive and Affective Control of Behavioral Adaptation”, Max Planck Institute for Human Cognitive and Brain Sciences, 04130 Leipzig, Germany;
Department of Neurology, Otto von Guericke University, 39118 Magdeburg, Germany;
Department of Psychiatry and Psychotherapy, Campus Charité Mitte, Charité–Universitätsmedizin Berlin, 10115 Berlin, Germany;
Quentin J. M. Huys
Translational Neuromodeling Unit, Institute for Biomedical Engineering, University of Zurich and Swiss Federal Institute of Technology (ETH) Zurich, 8032 Zurich, Switzerland;
Department of Psychiatry, Psychotherapy and Psychosomatics, Hospital of Psychiatry, University of Zurich, 8032 Zurich, Switzerland;
Rebecca Boehme
Department of Psychiatry and Psychotherapy, Campus Charité Mitte, Charité–Universitätsmedizin Berlin, 10115 Berlin, Germany;
Ralph Buchert
Department of Nuclear Medicine, Charité–Universitätsmedizin Berlin, 10115 Berlin, Germany;
Hans-Jochen Heinze
Max Planck Fellow Group “Cognitive and Affective Control of Behavioral Adaptation”, Max Planck Institute for Human Cognitive and Brain Sciences, 04130 Leipzig, Germany;
Department of Neurology, Otto von Guericke University, 39118 Magdeburg, Germany;
Leibniz Institute for Neurobiology, Otto von Guericke University, 39118 Magdeburg, Germany;
Anthony A. Grace
Departments of Neuroscience, Psychiatry and Psychology, University of Pittsburgh, Pittsburgh, PA 15260;
Raymond J. Dolan
Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3BG, United Kingdom;
Humboldt Universität zu Berlin, Berlin School of Mind and Brain, 10115 Berlin, Germany; and
Andreas Heinz
Department of Psychiatry and Psychotherapy, Campus Charité Mitte, Charité–Universitätsmedizin Berlin, 10115 Berlin, Germany;
Cluster of Excellence NeuroCure, Charité–Universitätsmedizin Berlin, 10115 Berlin, Germany
Florian Schlagenhauf
Max Planck Fellow Group “Cognitive and Affective Control of Behavioral Adaptation”, Max Planck Institute for Human Cognitive and Brain Sciences, 04130 Leipzig, Germany;
Department of Psychiatry and Psychotherapy, Campus Charité Mitte, Charité–Universitätsmedizin Berlin, 10115 Berlin, Germany;

Notes

1
To whom correspondence should be addressed.
Author contributions: L.D., Q.J.M.H., and F.S. designed research; L.D. and R. Boehme performed research; L.D., Q.J.M.H., R. Boehme, R. Buchert, and F.S. analyzed data; and L.D., Q.J.M.H., R. Boehme, R. Buchert, H.-J.H., A.A.G., R.J.D., A.H., and F.S. wrote the paper.

Competing Interests

The authors declare no conflict of interest.
