Research Article

A general model of hippocampal and dorsal striatal learning and decision making

Jesse P. Geerts (a,b), Fabian Chersi (b,c), Kimberly L. Stachenfeld (d), and Neil Burgess (a,b)

a Sainsbury Wellcome Centre for Neural Circuits and Behaviour, University College London, London W1T 4JG, United Kingdom; b Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, United Kingdom; c GrAI Matter Labs, 75012 Paris, France; d DeepMind, London N1C 4AG, United Kingdom

PNAS December 8, 2020 117 (49) 31427-31437; first published November 23, 2020; https://doi.org/10.1073/pnas.2007981117
For correspondence: n.burgess@ucl.ac.uk
Edited by György Buzsáki, New York University Langone Medical Center, New York, NY, and approved October 20, 2020 (received for review April 24, 2020)


Significance

A central question in neuroscience concerns how humans and animals trade off multiple decision-making strategies. Another question pertains to the use of egocentric and allocentric strategies during navigation. We introduce reinforcement-learning models based on learning to predict future reward directly from states and actions or via learning to predict future “successor” states, choosing actions from either system based on the reliability of its predictions. We show that this model explains behavior on both spatial and nonspatial decision tasks, and we map the two model components onto the function of the dorsal hippocampus and the dorsolateral striatum, thereby unifying findings from the spatial-navigation and decision-making fields.

Abstract

Humans and other animals use multiple strategies for making decisions. Reinforcement-learning theory distinguishes between stimulus–response (model-free; MF) learning and deliberative (model-based; MB) planning. The spatial-navigation literature presents a parallel dichotomy between navigation strategies. In “response learning,” associated with the dorsolateral striatum (DLS), decisions are anchored to an egocentric reference frame. In “place learning,” associated with the hippocampus, decisions are anchored to an allocentric reference frame. Emerging evidence suggests that the contribution of hippocampus to place learning may also underlie its contribution to MB learning by representing relational structure in a cognitive map. Here, we introduce a computational model in which hippocampus subserves place and MB learning by learning a “successor representation” of relational structure between states; DLS implements model-free response learning by learning associations between actions and egocentric representations of landmarks; and action values from either system are weighted by the reliability of its predictions. We show that this model reproduces a range of seemingly disparate behavioral findings in spatial and nonspatial decision tasks and explains the effects of lesions to DLS and hippocampus on these tasks. Furthermore, modeling place cells as driven by boundaries explains the observation that, unlike navigation guided by landmarks, navigation guided by boundaries is robust to “blocking” by prior state–reward associations due to learned associations between place cells. Our model, originally shaped by detailed constraints in the spatial literature, successfully characterizes the hippocampal–striatal system as a general system for decision making via adaptive combination of stimulus–response learning and the use of a cognitive map.

  • reinforcement learning
  • spatial navigation
  • hippocampus
  • striatum

Behavioral and neuroscientific studies suggest that animals can apply multiple strategies to the problem of maximizing future reward, referred to as the reinforcement-learning (RL) problem (1, 2). One strategy is to build a model of the environment that can be used to simulate the future to plan optimal actions (3) and the past for episodic memory (4–6). An alternative, model-free (MF) approach uses trial and error to estimate a direct mapping from the animal’s state to its expected future reward, which the agent caches and looks up at decision time (7, 8), potentially supporting procedural memory (9). This computation is thought to be carried out in the brain through prediction errors signaled by phasic dopamine responses (10). These strategies are associated with different tradeoffs (2). The model-based (MB) approach is powerful and flexible, but computationally expensive and, therefore, slow at decision time. MF methods, in contrast, enable rapid action selection, but these methods learn slowly and adapt poorly to changing environments. In addition to MF and MB methods, there are intermediate solutions that rely on learning useful representations that reduce burdens on the downstream RL process (11–13).

In the spatial-memory literature, a distinction has been observed between “response learning” and “place learning” (14–16). When navigating to a previously visited location, response learning involves learning a sequence of actions, each of which depends on the preceding action or sensory cue (expressed in egocentric terms). For example, one might remember a sequence of left and right turns starting from a specific landmark. An alternative place-learning strategy involves learning a flexible internal representation of the spatial layout of the environment (expressed in allocentric terms). This “cognitive map” is thought to be supported by the hippocampal formation, where there are neurons tuned to place and heading direction (17–19). Spatial navigation using this map is flexible because it can be used with arbitrary starting locations and destinations, which need not be marked by immediate sensory cues.

We posit that the distinction between place and response learning is analogous to that between MB and MF RL (20). Under this view, associative reinforcement is supported by the DLS (21, 22). Indeed, there is evidence from both rodents (23–25) and humans (26, 27) that spatial-response learning relies on the same basal ganglia structures that support MF RL. Evidence also suggests an analogy between MB reasoning and hippocampus (HPC)-based place learning (28, 29). However, this equivalence is not completely straightforward. For example, in rodents, multiple hippocampal lesion and inactivation studies failed to elicit an effect on action–outcome learning, a hallmark of MB planning (30–35). Nevertheless, there are indications that HPC might contribute to a different aspect of MB RL: namely, the representation of relational structure. Tasks that require memory of the relationships between stimuli do show dependence on HPC (36–42).

Here, we formalize the perspective that hippocampal contributions to MB learning and place learning are the same, as are the dorsolateral striatal contributions to MF and response learning. In our model, HPC supports flexible behavior by representing the relational structure among different allocentric states, while dorsolateral striatum (DLS) supports associative reinforcement over egocentric sensory features. The model arbitrates between the use of these systems by weighting each system’s action values by the reliability of the system, as measured by a recent average of prediction errors, following Wan Lee et al. (43). We show that HPC and DLS maintain these roles across multiple task domains, including a range of spatial and nonspatial tasks. Our model can quantitatively explain a range of seemingly disparate findings, including the choice between place and response strategies in spatial navigation (23, 44) and choices on nonspatial multistep decision tasks (45, 46). Furthermore, it explains the puzzling finding that landmark-guided navigation is sensitive to the blocking effect, whereas boundary-guided navigation is not (27), and that these are supported by the DLS and HPC, respectively (26). Thus, different RL strategies that manage competing tradeoffs can explain a longstanding body of spatial navigation and decision-making literature under a unified model.

Results

We implemented a model of hippocampal and dorsolateral striatal contributions to learning, shown in Fig. 1. Each system independently proposes an action and estimates its value. The value Q(s,a) of taking action a while being in state s is the expected discounted cumulative return:

$$Q(s,a) = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^t r(s_t) \,\middle|\, s_0 = s,\; a_0 = a\right], \qquad [1]$$

where s0 and a0 are the starting state and action at time t=0, r is a reward function specifying the instantaneous reward found in each state, γ∈[0,1) is a discount factor that gives smaller weight to distal rewards, and π(a|s) is the policy specifying a distribution over available actions given the current state. The objective of the RL agent is to discover an optimal policy π* that will maximize value over all states.
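To make Eq. 1 concrete, the following minimal Python sketch (illustrative values only; not the paper's code) computes the discounted return for a single sampled trajectory. Q(s,a) is the average of this quantity over trajectories that start with state s and action a and then follow policy π.

```python
# Minimal sketch of the discounted return inside Eq. 1 (illustrative reward sequence and gamma).
def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**t * r_t over one sampled trajectory."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# A trajectory that reaches a unit reward after two steps:
print(discounted_return([0.0, 0.0, 1.0]))  # 0.81; Q(s, a) averages this over sampled trajectories
```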

Fig. 1.

(A) Model architecture. DLS (orange) learns value directly from landmark features in egocentric directions with respect to the agent: L (left), R (right), F (front), or B (back). HPC (green) learns an SR M over allocentric input features (north, N; east, E; south, S; or west, W), which is subsequently used for value computation. An arbitrator (blue) computes an average of these values, weighted by each system’s reliability (Materials and Methods). Lighter colors mean higher firing rates. α, learning rate; δM, SPE; δr, reward-prediction error; PHPC, proportion of influence of HPC component. (B) A linear track environment with five states. Terminal state S5 gives a reward with probability 0.8. (C) Reliability of the hippocampal SR system and the striatal MF system over time as the agent navigates the linear track. Reliability is computed based on the recent average of SPEs δM for the hippocampal system and of reward-prediction errors δr for the striatal system. (D) The proportion of influence of the SR system on the value function, PSR, in the linear track environment across trials.

Similarly to earlier work in spatial RL (15, 47–49), the two systems in our model estimate value using qualitatively different strategies, which can cause them to generate divergent predictions for the optimal policy. The dorsal striatal component uses an MF temporal difference (TD) method (50) to learn stimulus–response associations directly from egocentric sensory inputs given by landmark cells (LCs) tuned to landmarks at given distances and egocentric directions from the agent (Fig. 1A and Materials and Methods).

The hippocampal component, in contrast, has access to state information provided by place cells that, in spatial tasks, fire when the agent occupies specific locations. We draw on previous work by Stachenfeld et al. (51) and model hippocampal place cells as encoding the successor representation (SR; ref. 11). The SR is a predictive representation, containing the discounted future occupancy of each state s′ from current state s:

$$M^\pi(s,s') = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^t \,\mathbb{I}(s_t = s') \,\middle|\, s_0 = s\right], \qquad [2]$$

where $\mathbb{I}(s_t = s') = 1$ if $s_t = s'$ and 0 otherwise. Each entry $M^\pi(s,s')$ of the SR estimates the exponentially discounted count of the number of times state s′ is visited in the future, given that the current state is s, conditioned on the current policy π(a|s). In addition to the SR, the hippocampal system learns a vector of rewards R associated to each state, which is multiplied with the SR to compute state values (Eq. 8). Crucially, the hippocampal SR algorithm learns aggregate statistics over the relational structure between states, which allows for some of the flexibility of fully MB systems at lower computational cost. Specifically, SR-based systems decouple learning about transition dynamics from learning about reward, which allows for a quick recomputation of value under a new reward distribution.
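To illustrate the SR and the value factorization it supports, the sketch below constructs the SR of Eq. 2 in closed form for a hypothetical three-state chain under a fixed policy and combines it with a reward vector to obtain state values (cf. Eq. 8). The states, rewards, and discount factor are placeholders rather than task parameters used in the paper.

```python
import numpy as np

gamma = 0.9
# Hypothetical 3-state chain (start -> middle -> goal) under a fixed policy; the goal is terminal.
T = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])

# SR under this policy (Eq. 2): M = sum_t gamma^t T^t = (I - gamma T)^(-1).
M = np.linalg.inv(np.eye(3) - gamma * T)

R = np.array([0.0, 0.0, 1.0])          # reward only at the goal
print(M @ R)                           # state values V = M R: [0.81, 0.9, 1.0]

# Because transition statistics (M) and reward (R) are stored separately, a new reward
# distribution yields new values immediately, without relearning M:
print(M @ np.array([0.0, 1.0, 0.0]))   # [0.9, 1.0, 0.0]
```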

Arbitration between the two systems was achieved by tracking their reliability in predicting states (HPC) and rewards (DLS) and weighting either system’s action values by this reliability, following Wan Lee et al. (43). We operationalized this as the average recent reward-prediction error for the MF system and as the average successor state-prediction error for the SR system. These reliability measures were then used to compute the proportion of influence the SR system had on the value function, PSR (see Eq. 18 for details). Although not modeled in detail here, we suggest that this arbitration is supported by the medial prefrontal cortex, following previous theoretical and experimental work (2, 52). Fig. 1 B–D shows an example of how the arbitrator functions. The agent was trained to find a reward (given with probability 0.8) at the end of a simple linear track, in which each state was uniquely identified by landmarks (Fig. 1B). The agent was allowed to explore the environment randomly beforehand, so it started with a random-walk SR. Hence, the reliability of the HPC starts out higher than that of the DLS. As the average DLS reward-prediction error goes down and its reliability catches up with that of the HPC, the proportion of HPC influence decreases.

To test the validity of our model, we applied it to spatial and nonspatial decision-making tasks and compared its behavior to that of humans and rodents.

Hippocampal Lesions and Adapted Water-Maze Navigation.

An adaptation of the classic Morris water-maze task—in which rodents swim in opaque water to find an invisible platform—involved putting an intramaze landmark into the pool at a fixed offset from the platform and moving both platform and landmark to a different location within the tank at the start of each block of four trials (ref. 44 and Fig. 2A). In this version of the task, hippocampally lesioned animals performed better than intact animals on the first trial of each session, because intact animals initially lingered at the previous goal location (Fig. 2B). However, the lesioned animals showed little intrasession learning, while their learning across sessions was relatively unimpaired, indicating that they were learning to navigate to the goal location relative to the landmark, since this relationship remained constant across sessions.

Fig. 2.

Results and simulations of the experiment described in ref. 44. Sessions lasted four trials, and platform and landmark were moved at the beginning of each session. (A) Possible locations of the hidden platform (o) and the corresponding landmark (x) in each session. (B) Escape latency in the water maze for hippocampal lesioned and control animals on trials 1 (solid lines) and 4 (dashed lines) of each session. Hippocampal damage impairs intrasession learning, but preserves learning across sessions. Because animals with hippocampal damage follow a response strategy based on egocentric visual input, they perform better on the first trial of each session than control animals. Reprinted from ref. 15. Copyright (2015), with permission from Elsevier. (C) Equivalent plot for the full model (blue) and the model without a hippocampal component, relying solely on MF mechanisms. (D) Example trajectories from the first trials of sessions 7 and 8. Animals using a hippocampal place strategy tend to wander around the previous platform location (filled circles) before finding the new platform location (open circles) (adapted from ref. 44). (E and F) Occupancy maps show a similar effect for simulated agents. Control agents (E) linger around the previous platform location, whereas agents that cannot use map-based navigation (F) take a more direct path to the new platform location.

In the model, the session-by-session displacement of landmark and platform means that the value function will have to change when using allocentric place-cell features, but not when using egocentric LC features. Hence, when we simulated this task by comparing the performance of the full model to a model with a silenced hippocampal component, our model showed the same effects as in the original experiments (Fig. 2C). Fast within-session learning, which relies on the SR’s capacity for quick reevaluation of rewards, was impaired after a hippocampal lesion. Between-session learning, which depends on learning the landmark–platform relations, was unimpaired. Finally, control agents performed worse than hippocampally lesioned agents on the first trial after the platform had been moved, because the value function changed in allocentric, but not egocentric, coordinate frames. An inspection of the occupancy maps (Fig. 2 D–F) reveals that equivalent errors were made by the agents and by the rats—i.e., lingering at the previous platform location. The hippocampal predictive map guides the agent to the previous platform location because of its allocentric place representation. Only when it reaches that location and the platform is not there does it start unlearning the hippocampal reward representation; Eq. 11.

Simulating DLS lesions in the task used by Pearce et al. (44) showed the emergence of the opposite pattern to that of HPC lesions: There was little to no learning across sessions for the first trials, while fourth-trial performance was not significantly worse than control performance (SI Appendix, Fig. S2A). This is consistent with previous findings showing that lesions of the DLS induced a preference for place-guided navigation (53) and that dopamine depletion in the DLS impairs egocentric, but not allocentric, water-maze navigation (54). Our model also accurately captures results from Miyoshi et al. (55), who classified navigation behaviors as cue-guided or place-guided in the cued water-maze task after lesions to both the HPC and the DLS (SI Appendix, Fig. S2 B and C).

These results show that our model captures both landmark-guided and place-memory-guided behavior on the water maze. Furthermore, our model gives a normative perspective on why the animals switch to a landmark-based strategy: Since the striatal system learns about the rewarded location with respect to landmarks, it can use the landmark to navigate directly to the correct location on the first trial of a given session. This gives an advantage to using the striatal system for decision making, which agents learn to exploit. Over the course of multiple sessions, the average prediction error of the striatal system will decrease, causing the reliability-based arbitration mechanism to favor the striatal system, driving lower escape times on first trials of later sessions.

Animals Switch to a Response Strategy on the Plus Maze.

The distinct roles of the HPC and dorsal striatum have also been investigated by using the place/response learning task (23, 24). In this task, rats were trained to find a food reward on one arm of a plus maze, starting in the same arm every time, while the opposite arm was blocked (Fig. 3). After training, a probe trial was performed, in which the animal started at the opposite end of the maze. If animals take the same egocentric turning direction as before, thus ending up at the opposite goal arm, their strategy is interpreted as response learning (relying on a remembered egocentric turn). If they take the opposite turn to end up in the same goal arm, their strategy is interpreted as flexible place learning (relying on an allocentric representation of space).

Fig. 3.

Navigation in the plus maze. (A) Experimental setup used by ref. 23. During training, animals were trained to run from the same starting place to a baited goal arm. During probe trials (on day 8 and day 16), the animal started in the opposite arm. If the animal ran to the same allocentric location as during training, this was labeled as a place strategy (green). Taking the same egocentric turn to end up in the opposite goal arm was classified as a response-learning strategy (orange). (B) Behavioral data from ref. 23. Control animals (blue) showed a shift to response learning over the course of training. This was prevented by the inactivation of DLS using lidocaine. The inactivation of HPC using lidocaine caused animals to use a response strategy early on. (C) Model results recapitulate these findings. (D and E) Behavioral data from ref. 56 showing probe-trial behavior before and after the outcome was devalued (deval) by prefeeding the animal with the food reward, for control (D) and hippocampally lesioned animals (E). D and E are reprinted from ref. 56, which is licensed under CC BY 4.0. (F and G) Model-simulation results recapitulate these findings.

Fig. 3 shows the results of the original experiment and our simulations. Early in training, most control rats (injected with saline) used a place strategy, but switched to a response strategy after extensive training. Inactivation of the dorsal striatum with lidocaine prevented this switch. Inactivation of the HPC, by contrast, caused the response strategy to be used more often, even early in training. These results indicate that the dorsal striatum supports response learning, while the HPC supports place learning. We simulated the lidocaine inactivation of HPC and dorsal striatum by partly deactivating the SR and MF components of our model, respectively. Early in training, the control agent showed a preference for actions proposed by the HPC, leading the agent to follow a place strategy. This is because the SR reliability was higher than the MF reliability at the start of training, reflecting the fact that animals have explored the environment without rewards before training. Over the course of training, reward-prediction errors in the striatum decreased, causing the reliability of the MF system to increase, at which point the model switched to the MF strategy because of a bias to use the more computationally efficient system. Inactivation of the dorsal striatal and hippocampal components of the model biases the agent to follow a place or response strategy, respectively.

While the results described above show that the DLS and HPC are involved in egocentric and allocentric navigation, respectively, the navigational strategy alone does not speak to an important aspect of MB learning: flexibility in the face of reward devaluation. In devaluation studies, the value of a reinforcer is decreased by pairing it with an aversive event such as illness or by inducing satiety by prefeeding the animal with the reinforcer (57). Since MF algorithms need to reexperience the state/action leading to the devalued reward to update its value, MF behavior (also referred to as stimulus–response learning) is insensitive to devaluation. MB algorithms, in contrast, can estimate that state/action transitions will lead to a devalued reward without having to reexperience them. This goal-directed, devaluation-sensitive behavior is a hallmark of MB planning (2, 58).
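A minimal sketch of this contrast, using a hypothetical three-state chain (all values illustrative): devaluation amounts to changing the reward vector, which immediately changes SR-based values, whereas a cached MF value only changes once the devalued outcome is re-experienced.

```python
import numpy as np

gamma = 0.9
T = np.array([[0.0, 1.0, 0.0],   # start  -> middle
              [0.0, 0.0, 1.0],   # middle -> goal
              [0.0, 0.0, 0.0]])  # goal terminal
M = np.linalg.inv(np.eye(3) - gamma * T)   # SR of the toy chain (cf. Eq. 2)

R = np.array([0.0, 0.0, 1.0])              # food reward at the goal state
Q_mf_start = (M @ R)[0]                    # cached MF value of the trained start action

R[2] = 0.0                                 # devaluation: the food outcome loses its value
print("SR value at start after devaluation:", (M @ R)[0])  # 0.0 -> behavior adapts at once
print("Cached MF value at start:", Q_mf_start)             # 0.81 -> unchanged until re-experienced
```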

To investigate the relationship between place and response learning on one hand, and goal-directed and stimulus–response learning on the other, we simulated results from Kosaki et al. (56), who studied devaluation on the plus maze. Specifically, they trained rats on the same task as described in Fig. 3A (see ref. 59 for a similar study in mice). Subsequently, they devalued the food reinforcer by prefeeding the animals. The results of this devaluation procedure are depicted in Fig. 3D. Consistent with the idea that the place strategy is sensitive to the expected value of the outcome, while the response strategy is not, the procedure resulted in a switch from place to response strategies. Furthermore, rats with hippocampal lesions displayed a reliance on the response strategy, regardless of outcome devaluation (Fig. 3E), further indicating that the response strategy is insensitive to devaluation. Since sensitivity to reward devaluation is also a property of SR-based learning (60), our model naturally accommodates these results.

Blocking in Landmark- But Not Boundary-Related Navigation.

A signature of learning stimulus–reward associations using reward-prediction errors is the blocking phenomenon (61). Learning one stimulus–reward association hinders learning of a subsequent association between a different stimulus and the same reward because the prediction error becomes small, reducing further weight updates. In humans, spatial blocking has been shown to occur when learning locations relative to discrete landmarks, but not relative to boundaries (27). Furthermore, learning with respect to landmarks corresponds to increased blood-oxygen-level-dependent (BOLD) signal in the dorsal striatum, whereas learning with respect to boundaries corresponds to activity in the posterior HPC (26).

We aimed to capture these effects by examining the behavior of our agent, following a paradigm similar to ref. 27 (Fig. 4): The agent navigated through an open field to find an unmarked reward location. In order to investigate blocking with respect to boundaries, we explicitly modeled the effect of boundaries on hippocampal place cells, given their dominant role in determining place-cell firing fields (cf. 62 and 63). Rather than learning an SR over a punctate-state representation, the agent learned a matrix of successor features provided by the firing rates of a set of place cells driven by boundary vector cells (BVCs) (64–67).

Fig. 4.

Boundary versus landmark blocking experiments, similar to ref. 27. (A) Landmark-blocking experiment. Agents navigate a virtual water maze to find a hidden platform (dashed circle). During initial learning, one landmark is present (L1). During compound learning, a second landmark is added (L2), after which L1 is removed. (B) Average time to find the platform per trial. Increased escape times on removal of L1 indicate blocking of learning about the platform location relative to L2 by the prior learning relative to L1. (C) Boundary-blocking experiment, following A, but with two boundaries (solid green and blue lines). (D) Average escape time shows no effect of blocking of learning the platform location relative to the right boundary (blue) when the left boundary (green) is removed. (E) Illustration of the lack of blocking in boundary-related learning under the SR system, in contrast to an MF system.

In the landmark-blocking condition (Fig. 4 A and B), the agent used a landmark to guide navigation. After 10 trials, a second landmark was added, and after 20 trials, the first landmark was removed. Importantly, in this experiment, there were no boundaries, and only one or two landmarks were visible at any time. A single landmark has little effect on place-cell firing (63), and, indeed, the presence of only one or two landmarks does not support a reliable place-cell map (64). Therefore, and consistent with BOLD activation results (26), we assume that behavior was controlled by the DLS in this experiment.

As predicted by the TD learning rule, and consistent with the findings of Doeller and Burgess (27), learning about the second landmark was blocked by the prior learning about landmark 1, as evidenced by the drop in performance after its removal.

In the boundary-blocking condition (Fig. 4 C and D), there were no landmarks, meaning that the agent had to rely on its hippocampal system for navigation. The hippocampal system learns a predictive map over boundary-related place-cell activations using successor-prediction errors (SPEs; SI Appendix). Prediction-error-based learning of this kind is susceptible to the blocking effect, and the SR has indeed been used as an explanation for the occurrence of blocking when learning stimulus–stimulus associations (60). However, when we subjected the agent to a boundary-related blocking paradigm, no blocking occurred (Fig. 4 C and D).

To understand why this happens, consider the situation in Fig. 4E, in which one example place cell was active at the rewarded location, driven by the left boundary. During initial learning, an association between that place cell and the reward was learned. During compound learning, a second boundary drove the activity of another place cell at the rewarded location. In an MF system, the value already associated with the first place cell means that there is zero prediction error, preventing learning of an association between the second place cell and the reward. In an SR system, however, the agent learns a predictive relationship between the two place cells. Thus, while there is no reward-prediction error, and the reward vector remains unchanged, the newly firing place cell comes to predict the firing of the first place cell (which is associated with reward), mitigating its reduction in firing when the first boundary is removed. This means that, when the first boundary and its associated firing are removed, the agent still predicts reward at the correct location. Thus, consistent with behavioral evidence (26, 27), our model shows no blocking effect during the boundary-related navigation paradigm. This result speaks to the utility of structure learning: The hippocampal SR system learns a multitude of relations, such that its policies are more robust to changes in cues and rewards.

Two-Step Task.

Outside of the spatial domain, the distinction between MF and MB RL has been heavily investigated by using sequential decision tasks. Here, we describe how our model solves a cognitive decision task of this type—the task of Daw et al. (46) (Fig. 5A).

Fig. 5.

A nonspatial two-step task. (A) Task employed by Daw et al. (46). Here, a single start state led probabilistically to one of two second-stage states, depending on the action chosen and whether by chance a common (70%) or rare (30%) transition was made. (B) Data from Daw et al. (46) showing that human performance lies in between MF and MB. A and B are reprinted from ref. 46, which is licensed under CC BY 3.0. (C) Simulation results for the striatal (Left), hippocampal (Center), and full (Right) models.

In the two-step decision task designed by Daw et al. (46), human participants were shown a pair of symbols and asked to choose one (Fig. 5A). Left and right choices led to different corresponding second-stage states with high probability (common transitions), but there was a small probability (rare transitions) of transitioning to the opposite state. For example, in Fig. 5A, the left icon in the first (green) state usually leads to the choice in the pink state (common transition), but occasionally leads to the choice in the blue state (rare transition). During the second stage, participants made another left-or-right choice, resulting in either receiving a reward or not, before starting the next trial. Each of the four outcomes was associated with a reward probability that varied over time as a Gaussian random walk limited between 0.25 and 0.75.

The rewards received (or not received) on a given trial modify the participants’ value estimates for the different actions taken during the two stages, but different RL strategies lead to different behaviors on the next trial. MF learners increase the likelihood of repeating their first-stage action following a reward, regardless of whether a common or rare transition was made. In contrast, MB learners use knowledge of the task’s transition structure, such that rewards obtained after a rare transition lead to the opposite choice on the next trial (to maximize the likelihood of reaching the same second-stage state). The key finding of Daw et al. (46) was that human choices reflect both MB and MF influences (Fig. 5B).
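The logic of this analysis can be summarized by an idealized stay/switch rule; the sketch below is a rule-based caricature (not the model simulated here) showing why the reward-by-transition interaction separates MF from MB control.

```python
def mf_stays(rewarded, transition):
    # An idealized MF learner repeats a rewarded first-stage choice regardless of transition type.
    return rewarded

def mb_stays(rewarded, transition):
    # An idealized MB learner credits the second-stage state reached and uses the transition
    # structure: stay after reward + common or after no reward + rare; otherwise switch.
    return rewarded == (transition == "common")

for rewarded in (True, False):
    for transition in ("common", "rare"):
        print(f"rewarded={rewarded}, {transition}: "
              f"MF stays={mf_stays(rewarded, transition)}, MB stays={mb_stays(rewarded, transition)}")
```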

Our model recapitulates these findings and suggests that the HPC could support MB choice in this task, as well as in another two-step decision task with deterministic transitions (SI Appendix, Fig. S3 and ref. 45). The model DLS, implementing an MF RL system, increased its stay probability after rewards, regardless of whether a rare or common transition was made (Fig. 5C). In contrast, the HPC uses the SR to generalize value over the task graph: when a goal state is reached and a reward is obtained, value is propagated to other states according to the degree to which they predict the goal state. Therefore, on the next trial, the agent takes the actions that most likely lead to the recently rewarded state. Separating transition dynamics from reward estimates thus mimics true MB behavior. Combining the two systems results in behavior that is similar to that of human participants in this task.

It has been shown that other, simpler models than pure MB systems can look like MB agents on the two-step task (68). Here, we show that the SR can mimic MB behavior. Because the transition structure is unchanging, caching future state predictions is sufficient for flexible behavior.

Relationship Between Spatial and Two-Step Tasks.

A central principle of our model is that MB reasoning and allocentric navigation strategies both rely on the same hippocampal structures. The most direct evidence for this comes from Vikbladh et al. (29), in which both healthy participants and patients with hippocampal damage performed the two-step planning task (46), as well as a landmark versus boundary spatial memory task (26). This allowed the authors to show that, in healthy participants, the degree of MB planning on the sequential decision task correlated with the contribution of allocentric, boundary-driven place memory on the spatial task (reflected in smaller errors from the location predicted by the boundary; Fig. 6A). Notably, this correlation cannot be accounted for by variation in general intelligence (intelligence quotient). In patients with hippocampal damage, however, this relationship was significantly reduced.

Fig. 6.

Relationship between MB planning and allocentric spatial memory. Error bars indicate 80% CIs of the regression in both panels. (A) Data from healthy control participants and anterior temporal lobectomy patients, from ref. 29. Allocentric place memory is reflected by responses close to the boundary-predicted location after the landmark has moved (i.e., smaller boundary-distance errors). Dots indicate MB estimates for individual participants, calculated from a mixed-effects logistic regression. Reprinted from ref. 29. Copyright (2019), with permission from Elsevier. (B) Simulation data for the full model and agents for which the HPC component was turned off. Here, allocentric place memory is reflected by the average distance between the previous platform location and the location of the maximum of the agent’s value function at the start of the next session. Dots represent estimates for individual agents, estimated by a mixed-effects logistic regression.

To test for this effect in our model, we sampled a set of 20 agents with different values for the parameters governing the hippocampal–striatal tradeoff, as well as 20 agents with a partially lesioned hippocampal component (SI Appendix). Each agent performed the two-step decision task (46) and the water-maze task of Pearce et al. (44), depicted in Fig. 2. MB planning was quantified as the interaction between the effects of reward and transition type on the previous trial on staying with the same action or switching on the next trial (SI Appendix and cf. refs. 29 and 46). We quantified the degree of allocentric place memory as the average distance between the previous platform location and the location of the maximum of the agent’s value function at the start of the next session. This is akin to the boundary-distance error employed by ref. 29. We found a significant correlation (z = 1.89, p < 0.001) between MB planning and allocentric place memory (Fig. 6B). Agents with hippocampal lesions did not show a significant correlation (z = −0.02, p = 0.97), and the difference between these correlation coefficients was significant (z = 5.44, p < 0.001), recapitulating the result found by Vikbladh et al. (29).

Discussion

We presented a model of hippocampal and dorsolateral striatal contributions to learning across both spatial navigation and nonspatial decision making. Our simulations support the view that the HPC serves both allocentric place learning and flexible decision making by supplying a predictive map of the underlying structure of the task or environment, whereas the DLS underlies MF learning based on (egocentric) sensory features and actions, and that these systems are combined with weights set by their relative reliability in predicting outcomes.

The involvement of the HPC in abstract nonspatial tasks raises questions about its role throughout evolution. Did the system evolve initially in the spatial domain, but become recruited more generally (14), or was spatial decision making always part of a more general ability (69)? The role of the HPC in MB decision making is much debated. On one hand, lesions of the HPC have not affected hallmarks of MB planning, such as outcome devaluation in lever-pressing tasks (32, 33), although a recent study showed that HPC is involved in devaluation sensitivity of lever pressing immediately after acquisition (when pressing is context-dependent; ref. 70). On the other hand, hippocampal lesions led to a loss of devaluation sensitivity on the plus maze (Fig. 3 and ref. 56) and impair MB behavior on the two-step task (Fig. 5 and refs. 28 and 29). One crucial difference between the lever-pressing tasks and the tasks simulated here is that the lever-pressing tasks required only one action–outcome association, whereas solving the two-step task and many spatial tasks requires chaining multiple action–outcome associations together. Perhaps then, as suggested by Miller et al. (28), the HPC is specifically required when planning requires linking actions to outcomes over multiple steps. By storing temporal abstractions of future states separately from a representation of reward, the SR is particularly well suited to this task of rapidly propagating novel reward information to distant states. That property of the SR has previously inspired models of temporal context memory (71) and might also relate to the HPC’s role in relational memory tasks more broadly, as they require chaining multiple stimulus–stimulus associations together (37, 39). In line with this role, our simulations showed the hippocampal SR as driving a correlation between spatial-memory performance and MB behavior (Fig. 6 and ref. 29).

Consistent with our model, dorsal striatal neurons showed a great degree of spatial coding in spatial tasks (72), but not in tasks where reward locations were explicitly dissociated from space (73) or where multiple locations were equivalently associated with rewards (74). Indeed, the dorsal striatum selectively represents those task aspects that computational accounts suggest are important for gradual, MF learning (72).

We specifically associate our striatal model with the DLS. Lesion and inactivation studies have shown that the dorsal striatum is functionally very heterogeneous (75). Lesions of the dorsomedial striatum (DMS) result in a switch to response strategies on the plus maze (76) and to cue-based responding in the water maze, while the DLS underlies response learning (77). Furthermore, the DMS has been implicated in learning action–outcome contingencies outside the spatial domain (21, 75). Anatomical connectivity supports this functional dissociation in the dorsal striatum (53, 75). Whereas the DLS receives inputs mostly from sensorimotor cortex and dopaminergic input from the substantia nigra, the DMS receives input from several mesocortical and allocortical areas including the HPC. Indeed, cells encoding route and heading direction have been found in the DMS (78, 79). It is, therefore, likely that the dorsal HPC and the DMS are part of a single circuit involved in flexible goal-directed decision making, whereby the HPC provides map-based information, and the DMS is involved in action selection.

Our work follows several models of spatial decision making by hippocampal and striatal systems (15, 48, 49, 80, 81). Dollé and colleagues (48, 49) used a similar hippocampo-striatal model to explain behavior on the adapted water-maze task (44), presented in Fig. 2. Our model differs in two important ways. Firstly, in their model, place cells connected to “graph cells” that formed an explicit topological graph of the spatial environment, used to explicitly plan a path to the goal. In the present model, by contrast, the topological structure of the environment is implied in the predictive SR, following a theoretical proposal by Stachenfeld et al. (51) and neuroimaging (40, 41) and behavioral findings (82). Thus, our agent mimicked true MB behavior (explicit graph search) by using an intermediate SR-based strategy. Secondly, their model used another expert network that learned whether to take striatal or hippocampal outputs using TD learning. In contrast, our model arbitrates between systems based on their reliability. This arbitration mechanism predicts that on trials with high reward-prediction error, control should shift away from the MF system. In contrast, a low predictability of state transitions leads to higher average errors in the SR system and should, therefore, lead to a higher degree of MF control. Evidence for this comes from Wan Lee et al. (43), who, furthermore, showed that the prefrontal cortex encodes neural correlates of arbitration based on reliability.

As noted above, the hippocampal results we simulated are also consistent with a fully MB system, which is strictly more flexible. An interesting question is how to disambiguate between animals using an MB strategy versus the SR. One weakness of the temporal-difference SR model used here is that it cannot respond flexibly when the transition structure changes. Momennejad et al. (83) have shown that humans are better at revaluation when the reward function changes than when the transition structure changes, consistent with use of an SR. In addition, hippocampal replay has been suggested to perform off-line updates of the hippocampal predictive map to incorporate these kinds of transition changes (84, 85). As an alternative, tracking input covariances and using these to update the SR allows it to solve certain kinds of transition-revaluation problems without requiring forward simulation (86). A second weakness of the SR, compared to MB systems, is that the SR is policy-dependent. This means that the SR corresponding to an optimal policy for one reward setting is of limited use for problems with a different reward function (87). Piray and Daw (88) have recently proposed that the hippocampal system might resolve this latter weakness using a default representation, corresponding to a default policy. Alternatively, the HPC might represent a set of multiple distinct SR maps corresponding to different policies (89). Taken together, these two failure modes of the SR provide interesting avenues for experiments probing animals’ behavioral strategies and for theoretical work on computational tradeoffs between these strategies.

In addition to the HPC, the orbitofrontal cortex (OFC) has been hypothesized to be important for representing states in RL problems. Wilson, Niv, and colleagues (90) introduced a model in which OFC plays a critical role in identifying states that are perceptually similar. This corresponds to data showing that OFC is specifically necessary for decision making in partially observable environments (91). Evidence for this theory comes from human functional MRI research showing that unobservable task states can be decoded from OFC and that this relates to task performance (92). This proposed role of the OFC is distinct from, and possibly complementary to, our proposed role for the HPC. In our model, the HPC encodes a predictive map based on observable features that can be used for rapid, flexible decision making. The OFC, on the other hand, is crucial for a general state representation that can be used for downstream MB or MF processes. Whether and how the OFC and the HPC can interact to allow SR learning in partially observable environments is an interesting avenue for further research (see also ref. 93).

Our explanation for the absence of boundary-related blocking (Fig. 4) relies on BVC inputs to hippocampal place cells. BVCs can respond to intramaze landmarks as well as to boundaries (although, in contrast to DLS LCs, BVCs fire irrespective of object identity; ref. 67). This means that a sufficient number of landmarks could drive a reliable place-cell representation of space, allowing hippocampal control and the prevention of blocking. However, in the experiments simulated here, there were only one or two landmarks present. Single landmarks have little influence on firing relative to extended boundaries (63), consistent with the BVC model. Because BVCs fire proportionally to the angle subtended by the stimulus (94), place cells do not provide a reliable representation of space when there is only a single landmark (64). Thus, we predict that the addition of greater numbers of landmarks should allow construction of a reliable place-cell map, thereby leading to increased hippocampal influence and a reduction of blocking effects.

Our model reflects the assumption, driven by our knowledge of the neural representations, that in spatial tasks the hippocampal SR system uses allocentric representations, while the MF system uses egocentric representations. This allowed us to fit the behavioral data well and raises the question of why the goal-directed system is allocentric while the stimulus–response system is egocentric. Perhaps an answer lies in the time scale of learning: The allocentric layout of a large environment is stable, irrespective of changes in the agent’s location or direction, making it suitable for learning long-term relationships between stimuli. Consistent with this idea, “slow feature analysis” produces grid- and place-cell representations from visual inputs because they vary slowly (95). On the other hand, egocentric representations are more suited to mapping sensory inputs to physical actions, both of which are specified egocentrically.

In conclusion, dorsal HPC and DLS support qualitatively different strategies for learning about reward in spatial as well as nonspatial contexts, as captured by the model presented here. The fact that the same model explains behavior in both types of tasks implies that the hippocampal–striatal system is a general-purpose learning device that adaptively combines MB and MF mechanisms.

Materials and Methods

Hippocampal and Striatal Systems for Decision Making.

Our model combines a hippocampal RL module based on the SR with a striatal model based on MF value learning (Fig. 1A). It arbitrates between these modules based on their relative reliability, which can be computed by using the average of recent prediction errors. Model details are outlined below.

Dorsal Striatal System.

The DLS module was implemented as an MF RL system that learned direct associations between sensory stimuli and actions. Striatal neurons coded for the value of each action, where actions were expressed as egocentric heading directions in the spatial-navigation tasks and as left or right button presses in the nonspatial tasks. Sensory input was coded by a set of egocentric landmark vector cells coding for the presence or absence of a landmark in a particular egocentric direction and at a particular distance from the agent, analogous to the egocentric BVCs recently reported (96). Specifically, the activation of each LC was modeled as a bivariate Gaussian in a space defined by the egocentric angle θ and distance d of the landmark relative to the agent:

$$f_{LC}(d,\theta) \propto \mathcal{N}\!\left([d,\theta];\, [d^*, \theta^*],\, \Sigma\right), \qquad [3]$$

where $d^*$ and $\theta^*$ are the preferred distance and orientation of the LC, respectively, and $\Sigma = \mathrm{diag}([\sigma_d, \sigma_\theta])$ is the covariance matrix with the tuning widths for distance and angle on the diagonal entries. We assumed that LCs are sensitive to the identity of the landmark, meaning that a different set of LCs will respond to a different landmark in our model. An example egocentric LC is shown in SI Appendix, Fig. S1. In the nonspatial tasks, states were encoded as “one-hot” vectors, with a one at the index of the current state, reflecting the fact that states were uniquely identifiable as different images.
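A minimal sketch of one LC tuning curve (Eq. 3); the preferred distance, preferred direction, and tuning widths below are illustrative values, not those used in the simulations.

```python
import numpy as np

def landmark_cell_rate(d, theta, d_pref=2.0, theta_pref=np.pi / 2, sigma_d=0.5, sigma_theta=0.3):
    """Unnormalized bivariate Gaussian tuning over egocentric distance and direction (Eq. 3)."""
    return np.exp(-0.5 * (((d - d_pref) / sigma_d) ** 2 + ((theta - theta_pref) / sigma_theta) ** 2))

print(landmark_cell_rate(2.0, np.pi / 2))   # at the preferred distance and direction -> 1.0
print(landmark_cell_rate(3.0, np.pi / 2))   # farther from the preferred distance -> lower rate
```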

LCs in the sensory layer project to neurons in the dorsal striatum in an all-to-all connected way:

$$x_a^{DLS} = Q_{DLS}(s,a) = \sum_{i=1}^{N} w_{i,a} \, f_i^{LC}(s), \qquad [4]$$

where $x_a^{DLS}$ is the firing rate of the dorsolateral striatal neuron corresponding to the striatal estimated value $Q_{DLS}$ of action a given state s, N is the total number of sensory neurons, $f_i^{LC}(s)$ is the firing rate of LC i, and $w_{i,a}$ is the weight from sensory neuron i to striatal neuron a.

Learning in the striatal network is mediated by a Q-learning rule (50). This allows the model to compute a TD reward-prediction error $\delta_t^r$:

$$\delta_t^r = r_{t+1} + \gamma \max_{a'} Q_{DLS}(s_{t+1}, a') - Q_{DLS}(s_t, a_t), \qquad [5]$$

where $r_{t+1}$ is the reward received at time t+1. This prediction error is then used to update the weights:

$$\Delta w_{i,a} = \alpha_Q \, \delta_t^r \, e_{i,a}, \qquad [6]$$

with learning rate $\alpha_Q$ and eligibility trace $e_{i,a}$, which tracks which weights are eligible for updating based on recent activity. Every time step, the eligibility trace is updated according to the following rule:

$$e_{i,a}(t+1) = f_i^{LC} x_a^{DLS} + \lambda e_{i,a}(t), \qquad [7]$$

where λ is the trace-decay parameter, controlling how long synapses stay eligible for updating. Eligibility traces enable faster learning by making it possible to update weights that were active in the recent past instead of only at the very last time step (1).
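A minimal sketch of one DLS update (Eqs. 4–6), assuming hypothetical sizes for the landmark-cell population and action set. For simplicity it uses a standard accumulating eligibility trace for the chosen action rather than the activity-dependent trace of Eq. 7.

```python
import numpy as np

n_features, n_actions = 10, 4            # hypothetical population sizes
alpha_Q, gamma, lam = 0.1, 0.95, 0.9
w = np.zeros((n_features, n_actions))    # LC -> striatum weights (Eq. 4)
e = np.zeros((n_features, n_actions))    # eligibility traces

def q_dls(f):
    """Eq. 4: striatal action values as weighted sums of landmark-cell rates."""
    return f @ w

def dls_update(f, a, r, f_next):
    """One TD step: reward-prediction error (Eq. 5) and weight update (Eq. 6)."""
    global w, e
    delta = r + gamma * q_dls(f_next).max() - q_dls(f)[a]   # Eq. 5
    e *= lam                                                # trace decay (cf. Eq. 7)
    e[:, a] += f                                            # simplified accumulating trace
    w = w + alpha_Q * delta * e                             # Eq. 6
    return delta
```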

Hippocampal System.

The hippocampal place-cell system was modeled as encoding the SR, following work by Stachenfeld et al. (51). The SR is a predictive representation employed in machine learning (11, 13, 97, 98), containing the discounted future occupancy of each state s′ from current state s (Eq. 2). In the hippocampal SR model, a row of the SR—i.e., Mπ(s,:)—constitutes the current population activity vector—i.e., the activity of every place cell in the current state. A column of Mπ contains the activity of a single place cell in all possible locations (states)—i.e., a rate map (SI Appendix, Fig. S1). In addition to the SR matrix, the agent learns a vector with the expected reward R(s) for each state. The agent combines these to compute state value:

$$V^\pi_{HPC}(s) = \sum_{s'} M_{s,s'} R_{s'}. \qquad [8]$$

The factorization of value into the SR and reward confers more flexible behavior because if one term changes, it can be relearned while the other term remains intact (11). The agent used one-step lookahead to compute the value of each action Q(s,a), combining direct reward and the next state’s value:

$$Q_{HPC}(s_t, a_t) = r(s_t) + \gamma \, \mathbb{E}_{s_{t+1}|s_t, a_t}\!\left[V_{HPC}(s_{t+1})\right]. \qquad [9]$$

The SR satisfies a Bellman equation, meaning that any RL method can be used to learn the SR. Here, learning was achieved by using a TD update:

$$\Delta \hat{M}(s_t, s') = \alpha_M \, \delta_t^M(s'), \qquad [10]$$

where $\delta_t^M(s') = \mathbb{I}(s_t = s') + \gamma \hat{M}(s_{t+1}, s') - \hat{M}(s_t, s')$ is a TD SPE pertaining to state s′ and $\alpha_M$ is a learning rate. For the spatial-navigation studies modeled in this paper, animals were allowed to freely explore the environment without any reward before starting the task (23, 44). Hence, for these tasks, the SR was initialized as the SR associated with a random-walk policy MRW over a uniform spatial discretization of the environment. This was not the case for the task graphs of the two-step decision tasks (45). Therefore, in these tasks, we initialized the SR as the identity matrix I, encoding no knowledge other than the fact that every state predicts itself. Finally, the reward vector $\hat{R}$ was learned by using a simple delta rule:

$$\Delta \hat{R}(s_t) = \alpha_R \left[ r_t - \hat{R}(s_t) \right]. \qquad [11]$$

Although the SR is often introduced as above (in terms of discrete state counts), accurately estimating the SR for every state is infeasible in very large state spaces. This is known as the curse of dimensionality, and it necessitates the use of function approximation (1). The agent observes states through a vector of features f(s), which, if chosen well, will be of much smaller dimension than the number of states, allowing the agent to generalize to states that are nearby in feature space. The feature-based SR [also referred to as Successor Features (13)], rather than encoding the discounted number of state visits, encodes the expected discounted future activity of each feature:

$$\psi^\pi(s) = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^t f(s_t) \,\middle|\, s_0 = s\right]. \qquad [12]$$

As in the tabular case, the feature-based SR can be used to compute value when multiplied with a vector of reward expectations per feature, u: $V^\pi(s) = \psi^\pi(s)^T u$. In the case of linear-function approximation, the Successor Features ψ in Eq. 12 are approximated by a linear function of the features f:

$$\hat{\psi}(s) = W^T f(s), \qquad [13]$$

where W is a weight matrix that parameterizes the approximation. Intuitively, W encodes how much each feature predicts every other feature. As in the tabular case, TD learning can be used to update the SR weights (SI Appendix). Thus, at every state s (corresponding to a location) in the environment, the agent observed a population vector f(s) of BVC-driven place cells. It then computed its estimated Successor Features ψ using its current estimate of the weights W and Eq. 13, which encode the discounted sum of future population firing-rate vectors f of the input place cells. In terms of circuitry, W might correspond to the Schaffer collaterals projecting from CA3 to CA1 neurons, corresponding to f and ψ, respectively.
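A minimal sketch of the feature-based SR update (Eqs. 12 and 13), with an illustrative feature dimension and learning rate; the feature vectors f would be supplied by the BVC-driven place cells described below.

```python
import numpy as np

n_features = 8                       # illustrative number of place cells
alpha_M, gamma = 0.1, 0.95
W = np.eye(n_features)               # W[i, j]: how much feature i predicts feature j

def successor_features(f):
    """Eq. 13: psi_hat(s) = W^T f(s)."""
    return W.T @ f

def sf_td_update(f, f_next, terminal=False):
    """TD update of W for an observed transition f(s) -> f(s'), using the vector-valued SPE."""
    global W
    psi = successor_features(f)
    psi_next = np.zeros(n_features) if terminal else successor_features(f_next)
    delta_M = f + gamma * psi_next - psi      # successor-prediction error over features
    W = W + alpha_M * np.outer(f, delta_M)
    return delta_M

# State values then follow from learned reward weights u per feature: V(s) = successor_features(f) @ u.
```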

In the context of HPC, the feature-based SR allows us to represent states as population vectors of place cells with overlapping firing fields (the features), rather than having a one-to-one correspondence between place cells and states. Then, we are free to model the dependence of the place cell firing on specific environmental features (boundaries). This dependence has been extensively characterized by computational models of BVCs (64, 65, 99–101), which were shown to exist in the subiculum (66). Accordingly, we modeled a set of hippocampal place cells, whose activity fi(st) was the thresholded sum of a set of BVC inputs (see ref. 64 for details on how BVC and place-cell maps were calculated).

Crucially, modeling place cells as driven by BVCs allows us to explain the puzzling experimental finding by Doeller and Burgess (27) that learning to navigate to a location relative to a landmark, but not relative to a boundary, is sensitive to the blocking effect (61). In an accompanying neuroimaging paper, the authors showed that landmark learning was associated with BOLD activity in the dorsal striatum, whereas boundary-related navigation was associated with activity in the HPC (26).

Arbitration Process.

The agent has access to both its MF DLS component and its hippocampal component employing the SR. Both systems estimate the same value function, but might make different types of errors, and the agent has to arbitrate between them.

Rational arbitration should reflect the relative uncertainty of the two systems (2), which requires the posterior distribution over values, rather than just the values themselves. Here, we used a convenient proxy for uncertainty, introduced by Wan Lee et al. (43): the recent average of prediction errors, i.e., the reward-prediction error for the MF component and the SPE for the SR component. If the SPE is low, the SR system has a good estimate of the structure of the world. Similarly, if reward-prediction errors are low, the MF system has a reliable estimate of the value function. The reliability can be tracked by using a Pearce–Hall-like update rule (102), computing the recent average of absolute prediction errors Ω:

$$\Delta \Omega = \eta \left( |\delta| - \Omega \right), \qquad [14]$$

where |δ| is the absolute reward-prediction error and η is a learning rate. The reliability is defined as:

$$\chi = \frac{\delta_{\mathrm{MAX}} - \Omega}{\delta_{\mathrm{MAX}}}, \qquad [15]$$

with $\delta_{\mathrm{MAX}}$ being the upper bound of the prediction error, which was set to one. Since in our model both systems are trained by a prediction error, we can apply this measure to both the MF and SR systems. Following Wan Lee et al. (43), we used the reliability measure for arbitration. These authors computed transition rates α and β for transitioning from MF to MB states, and vice versa; here, we used the same terms, but for transitions between MF and SR. These transition rates are functions of the reliability of the respective systems:

$$\alpha(\chi_{\mathrm{MF}}) = \frac{A_{\alpha}}{1 + \exp(B_{\alpha}\, \chi_{\mathrm{MF}})}, \qquad [16]$$

$$\beta(\chi_{\mathrm{SR}}) = \frac{A_{\beta}}{1 + \exp(B_{\beta}\, \chi_{\mathrm{SR}})}, \qquad [17]$$

where the A and B parameters in both equations determine the transition rate and the steepness of these curves, respectively. These parameters were fitted to behavioral data by Wan Lee et al. (43), and we matched their parameter values (SI Appendix, Table S1). At each time step, the rate of change of the proportion of influence of the SR system, $P_{\mathrm{SR}}$, was computed by using the following differential equation, generating a push–pull mechanism between HPC and DLS influence over behavior:

$$\frac{dP_{\mathrm{SR}}}{dt} = \alpha(\chi_{\mathrm{MF}})\,(1 - P_{\mathrm{SR}}) - \beta(\chi_{\mathrm{SR}})\, P_{\mathrm{SR}}. \qquad [18]$$

Note that, consistent with behavioral data from human subjects (43), this arbitration mechanism resulted in a weighted influence of both systems on the final value estimates (Fig. 1), rather than a discrete choice. Note also that the arbitrator combines the action values, not the actions; thus, the agent does not end up with a midway action when the two systems encode different preferences. Lesions or partial inactivations of either the DLS or the HPC were modeled by setting limits on $P_{\mathrm{SR}}$ (see SI Appendix for more details).
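The arbitration dynamics can be sketched as follows. The function names, the Euler discretization, and the clipping of P_SR to [0, 1] are assumptions made for this illustration; the fitted A and B parameters are those listed in SI Appendix, Table S1 (placeholders here).

```python
import numpy as np

# Minimal sketch of reliability-based arbitration (Eqs. 14-18), adapted from
# the push-pull scheme of Wan Lee et al. (43). Parameter values are placeholders.
def update_reliability(omega, abs_delta, eta=0.1, delta_max=1.0):
    """Track the recent average absolute prediction error (Eq. 14) and
    return it together with the reliability chi (Eq. 15)."""
    omega = omega + eta * (abs_delta - omega)
    chi = (delta_max - omega) / delta_max
    return omega, chi

def transition_rate(chi, A, B):
    """Sigmoidal transition rate as a function of reliability (Eqs. 16-17)."""
    return A / (1.0 + np.exp(B * chi))

def update_p_sr(p_sr, chi_mf, chi_sr, A_alpha, B_alpha, A_beta, B_beta, dt=1.0):
    """One Euler step of the push-pull dynamics for the SR weight (Eq. 18)."""
    alpha = transition_rate(chi_mf, A_alpha, B_alpha)   # MF -> SR rate
    beta = transition_rate(chi_sr, A_beta, B_beta)      # SR -> MF rate
    dp = alpha * (1.0 - p_sr) - beta * p_sr
    return float(np.clip(p_sr + dt * dp, 0.0, 1.0))
```

The resulting P_SR weights the SR system's action values against the MF system's, rather than selecting one system outright.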

Code Availability.

The results were generated by using code written in Python. Code is available on ModelDB (accession no. 266836) (103).

Acknowledgments

We thank Dan Bush, Will de Cothi, Changmin Yu, and Kevin Miller for useful comments on the manuscript; Oliver Vikbladh and Maté Lengyel for discussions; and our anonymous reviewers for insightful suggestions. This work was supported by the European Union’s Horizon 2020 research and innovation program under Grant Agreement 785907 Human Brain Project SGA2; European Research Council Advanced Grant NEUROMEM; the Wellcome Trust; and the Gatsby Charitable Foundation.

Footnotes

  • ¹J.P.G. and F.C. contributed equally to this work.

  • ²To whom correspondence may be addressed. Email: n.burgess@ucl.ac.uk.
  • Author contributions: J.P.G., F.C., K.L.S., and N.B. designed research; J.P.G. performed research; J.P.G. analyzed data; and J.P.G., F.C., K.L.S., and N.B. wrote the paper.

  • The authors declare no competing interest.

  • This article is a PNAS Direct Submission.

  • This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2007981117/-/DCSupplemental.

  • Copyright © 2020 the Author(s). Published by PNAS.

This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY).

References

  1. R. S. Sutton, A. G. Barto, Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA, 1998), p. 1054.
  2. N. D. Daw, Y. Niv, P. Dayan, Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat. Neurosci. 8, 1704–1711 (2005).
  3. E. C. Tolman, Cognitive maps in rats and men. Psychol. Rev. 55, 189–208 (1948).
  4. E. Tulving, Episodic and semantic memory. Organization of Memory 1, 381–403 (1972).
  5. D. L. Schacter, D. R. Addis, R. L. Buckner, Remembering the past to imagine the future: The prospective brain. Nat. Rev. Neurosci. 8, 657–661 (2007).
  6. A. Bicanski, N. Burgess, A neural-level model of spatial memory and imagery. eLife 7, e33752 (2018).
  7. R. A. Rescorla, A. R. Wagner, "A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement" in Classical Conditioning II: Current Research and Theory, A. H. Black, W. F. Prokasy, Eds. (Appleton-Century-Crofts, New York, NY, 1972), vol. 2, pp. 64–99.
  8. R. S. Sutton, Learning to predict by the methods of temporal differences. Mach. Learn. 3, 9–44 (1988).
  9. L. R. Squire, S. Zola-Morgan, The medial temporal lobe memory system. Science 253, 1380–1386 (1991).
  10. P. R. Montague, P. Dayan, T. J. Sejnowski, A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci. 16, 1936–1947 (1996).
  11. P. Dayan, Improving generalisation for temporal difference learning: The successor representation. Neural Comput. 5, 613–624 (1993).
  12. L. Lehnert, M. L. Littman, Transfer with model features in reinforcement learning. arXiv:1807.01736 (4 July 2018).
  13. A. Barreto, R. Munos, T. Schaul, D. Silver, Successor features for transfer in reinforcement learning. arXiv:1606.05312 (16 June 2016).
  14. J. O'Keefe, L. Nadel, The Hippocampus as a Cognitive Map (Clarendon Press, Oxford, UK, 1978).
  15. F. Chersi, N. Burgess, The cognitive architecture of spatial navigation: Hippocampal and striatal contributions. Neuron 88, 64–77 (2015).
  16. N. M. White, The role of stimulus ambiguity and movement in spatial navigation: A multiple memory systems analysis of location discrimination. Neurobiol. Learn. Mem. 82, 216–229 (2004).
  17. J. O'Keefe, J. Dostrovsky, The hippocampus as a spatial map: Preliminary evidence from unit activity in the freely-moving rat. Brain Res. 34, 171–175 (1971).
  18. J. S. Taube, R. U. Muller, J. B. Ranck, Head-direction cells recorded from the postsubiculum in freely moving rats. I. Description and quantitative analysis. J. Neurosci. 10, 420–435 (1990).
  19. T. Hafting, M. Fyhn, S. Molden, M. Moser, E. I. Moser, Microstructure of a spatial map in the entorhinal cortex. Nature 436, 801–806 (2005).
  20. R. Poldrack, M. Packard, Competition among multiple memory systems: Converging evidence from animal and human brain studies. Neuropsychologia 41, 245–251 (2003).
  21. H. H. Yin, S. B. Ostlund, B. J. Knowlton, B. W. Balleine, The role of the dorsomedial striatum in instrumental conditioning. Eur. J. Neurosci. 22, 513–523 (2005).
  22. H. H. Yin, B. J. Knowlton, B. W. Balleine, Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. Eur. J. Neurosci. 19, 181–189 (2004).
  23. M. G. Packard, J. L. McGaugh, Inactivation of hippocampus or caudate nucleus with lidocaine differentially affects expression of place and response learning. Neurobiol. Learn. Mem. 72, 65–72 (1996).
  24. M. G. Packard, Glutamate infused posttraining into the hippocampus or caudate-putamen differentially strengthens place and response learning. Proc. Natl. Acad. Sci. U.S.A. 96, 12881–12886 (1999).
  25. R. J. McDonald, N. M. White, Parallel information processing in the water maze: Evidence for independent memory systems involving dorsal striatum and hippocampus. Behav. Neural Biol. 270, 260–270 (1994).
  26. C. F. Doeller, J. A. King, N. Burgess, Parallel striatal and hippocampal systems for landmarks and boundaries in spatial memory. Proc. Natl. Acad. Sci. U.S.A. 105, 5915–5920 (2008).
  27. C. F. Doeller, N. Burgess, Distinct error-correcting and incidental learning of location relative to landmarks and boundaries. Proc. Natl. Acad. Sci. U.S.A. 105, 5909–5914 (2008).
  28. K. J. Miller, M. M. Botvinick, C. D. Brody, Dorsal hippocampus contributes to model-based planning. Nat. Neurosci. 20, 1269–1276 (2017).
  29. O. M. Vikbladh et al., Hippocampal contributions to model-based planning and spatial memory. Neuron 102, 683–693.e4 (2019).
  30. D. P. Kimble, R. BreMiller, Latent learning in hippocampal-lesioned rats. Physiol. Behav. 26, 1055–1059 (1981).
  31. D. P. Kimble, W. P. Jordan, R. BreMiller, Further evidence for latent learning in hippocampal-lesioned rats. Physiol. Behav. 29, 401–407 (1982).
  32. L. H. Corbit, B. W. Balleine, The role of the hippocampus in instrumental conditioning. J. Neurosci. 20, 4233–4239 (2000).
  33. L. H. Corbit, S. B. Ostlund, B. W. Balleine, Sensitivity to instrumental contingency degradation is mediated by the entorhinal cortex and its efferents via the dorsal hippocampus. J. Neurosci. 22, 10976–10984 (2002).
  34. J. Ward-Robinson et al., Excitotoxic lesions of the hippocampus leave sensory preconditioning intact: Implications for models of hippocampal functioning. Behav. Neurosci. 115, 1357–1362 (2001).
  35. S. Gaskin, S. Chai, N. M. White, Inactivation of the dorsal hippocampus does not affect learning during exploration of a novel environment. Hippocampus 15, 1085–1093 (2005).
  36. W. B. Scoville, B. Milner, Loss of recent memory after bilateral hippocampal lesions. J. Neurol. Neurosurg. Psychiatry 20, 11–21 (1957).
  37. J. A. Dusek, H. Eichenbaum, The hippocampus and memory for orderly stimulus relations. Proc. Natl. Acad. Sci. U.S.A. 94, 7109–7114 (1997).
  38. L. M. DeVito, H. Eichenbaum, Memory for the order of events in specific sequences: Contributions of the hippocampus and medial prefrontal cortex. J. Neurosci. 31, 3169–3175 (2011).
  39. M. Bunsey, H. Eichenbaum, Conservation of hippocampal memory function in rats and humans. Nature 379, 255–257 (1996).
  40. A. C. Schapiro, N. B. Turk-Browne, K. A. Norman, M. M. Botvinick, Statistical learning of temporal community structure in the hippocampus. Hippocampus 26, 3–8 (2016).
  41. M. M. Garvert, R. J. Dolan, T. E. Behrens, A map of abstract relational knowledge in the human hippocampal-entorhinal cortex. eLife 6, e17086 (2017).
  42. F. Vargha-Khadem et al., Differential effects of early hippocampal pathology on episodic and semantic memory. Science 277, 376–380 (1997).
  43. S. Wan Lee, S. Shimojo, J. P. O'Doherty, Neural computations underlying arbitration between model-based and model-free learning. Neuron 81, 687–699 (2014).
  44. J. M. Pearce, A. D. L. Roberts, M. Good, Hippocampal lesions disrupt navigation based on cognitive maps but not heading vectors. Nature 62, 1997–1999 (1998).
  45. B. B. Doll, K. D. Duncan, D. A. Simon, D. Shohamy, N. D. Daw, Model-based choices involve prospective neural activity. Nat. Neurosci. 18, 767–772 (2015).
  46. N. D. Daw, S. J. Gershman, B. Seymour, P. Dayan, R. J. Dolan, Model-based influences on humans' choices and striatal prediction errors. Neuron 69, 1204–1215 (2011).
  47. F. Chersi, N. Burgess, "Hippocampal and striatal involvement in cognitive tasks: A computational model" in Proceedings of the 6th International Conference on Memory ICOM16 (2016), pp. 24–28.
  48. L. Dollé, D. Sheynikhovich, B. Girard, R. Chavarriaga, A. Guillot, Path planning versus cue responding: A bio-inspired model of switching between navigation strategies. Biol. Cybern. 103, 299–317 (2010).
  49. L. Dollé, R. Chavarriaga, A. Guillot, M. Khamassi, Interactions of spatial strategies producing generalization gradient and blocking: A computational approach. PLoS Comput. Biol. 14, e1006092 (2018).
  50. C. J. C. H. Watkins, P. Dayan, Q-learning. Mach. Learn. 8, 279–292 (1992).
  51. K. L. Stachenfeld, M. M. Botvinick, S. J. Gershman, The hippocampus as a predictive map. Nat. Neurosci. 20, 1643–1653 (2017).
  52. S. Killcross, E. Coutureau, Coordination of actions and habits in the medial prefrontal cortex of rats. Cereb. Cortex 13, 400–408 (2003).
  53. B. D. Devan, N. M. White, Parallel information processing in the dorsal striatum: Relation to hippocampal function. J. Neurosci. 19, 2789–2798 (1999).
  54. A. A. Braun et al., Dopamine depletion in either the dorsomedial or dorsolateral striatum impairs egocentric Cincinnati water maze performance while sparing allocentric Morris water maze learning. Neurobiol. Learn. Mem. 118, 55–63 (2015).
  55. E. Miyoshi et al., Both the dorsal hippocampus and the dorsolateral striatum are needed for rat navigation in the Morris water maze. Behav. Brain Res. 226, 171–178 (2012).
  56. Y. Kosaki, J. M. Pearce, A. McGregor, The response strategy and the place strategy in a plus-maze have different sensitivities to devaluation of expected outcome. Hippocampus 28, 484–496 (2018).
  57. C. D. Adams, A. Dickinson, Instrumental responding following reinforcer devaluation. Q. J. Exp. Psychol. B 33, 109–121 (1981).
  58. P. Dayan, K. C. Berridge, Model-based and model-free Pavlovian reward learning: Revaluation, revision, and revelation. Cogn. Affect. Behav. Neurosci. 14, 473–492 (2014).
  59. E. De Leonibus et al., Cognitive and neural determinants of response strategy in the dual-solution plus-maze task. Learn. Mem. 18, 241–244 (2011).
  60. M. P. H. Gardner, G. Schoenbaum, S. J. Gershman, Rethinking dopamine as generalized prediction error. Proc. Biol. Sci. 285, 20181645 (2018).
  61. L. J. Kamin, "Predictability, surprise, attention, and conditioning" in Punishment and Aversive Behavior, B. A. Campbell, R. M. Church, Eds. (Appleton-Century-Crofts, New York, NY, 1969), pp. 279–296.
  62. J. O'Keefe, N. Burgess, Geometric determinants of the place fields of hippocampal neurons. Nature 381, 425–428 (1996).
  63. A. Cressant, R. U. Muller, B. Poucet, Failure of centrally placed objects to control the firing fields of hippocampal place cells. J. Neurosci. 17, 2531–2542 (1997).
  64. C. Barry et al., The boundary vector cell model of place cell firing and spatial memory. Rev. Neurosci. 17, 71–98 (2006).
  65. T. Hartley, N. Burgess, C. Lever, F. Cacucci, J. O'Keefe, Modeling place fields in terms of the cortical inputs to the hippocampus. Hippocampus 10, 369–379 (2000).
  66. C. Lever, S. Burton, A. Jeewajee, J. O'Keefe, N. Burgess, Boundary vector cells in the subiculum of the hippocampal formation. J. Neurosci. 29, 9771–9777 (2009).
  67. A. Bicanski, N. Burgess, Neuronal vector coding in spatial cognition. Nat. Rev. Neurosci. 21, 453–470 (2020).
  68. T. Akam, R. Costa, P. Dayan, Simple plans or sophisticated habits? State, transition and learning interactions in the two-step task. PLoS Comput. Biol. 11, e1004648 (2015).
  69. H. Eichenbaum, T. Otto, N. J. Cohen, The hippocampus: What does it do? Behav. Neural Biol. 57, 2–36 (1992).
  70. L. A. Bradfield, B. K. Leung, S. Boldt, S. Liang, B. W. Balleine, Goal-directed actions transiently depend on dorsal hippocampus. Nat. Neurosci. 23, 1194–1197 (2020).
  71. S. J. Gershman, C. D. Moore, M. T. Todd, K. A. Norman, P. B. Sederberg, The successor representation and temporal context. Neural Comput. 24, 1553–1568 (2012).
  72. M. A. A. van der Meer, A. Johnson, N. C. Schmitzer-Torbert, A. D. Redish, Triple dissociation of information processing in dorsal striatum, ventral striatum, and hippocampus on a learned spatial decision task. Neuron 67, 25–32 (2010).
  73. N. C. Schmitzer-Torbert, A. D. Redish, Task-dependent encoding of space and events by striatal neurons is dependent on neural subtype. Neuroscience 153, 349–360 (2008).
  74. J. D. Berke, J. T. Breck, H. Eichenbaum, Striatal versus hippocampal representations during win-stay maze performance. J. Neurophysiol. 101, 1575–1587 (2009).
  75. H. H. Yin, B. J. Knowlton, The role of the basal ganglia in habit formation. Nat. Rev. Neurosci. 7, 464–476 (2006).
  76. H. H. Yin, B. J. Knowlton, Contributions of striatal subregions to place and response learning. Learn. Mem. 11, 459–463 (2004).
  77. B. D. Devan, R. J. McDonald, N. M. White, Effects of medial and lateral caudate-putamen lesions on place- and cue-guided behaviors in the water maze: Relation to thigmotaxis. Behav. Brain Res. 100, 5–14 (1999).
  78. E. Tabuchi, A. B. Mulder, S. I. Wiener, Neurons in hippocampal afferent zones of rat striatum parse routes into multi-pace segments during maze navigation. Eur. J. Neurosci. 19, 1923–1932 (2004).
  79. K. Ragozzino, S. Leutgeb, S. Mizumori, Dorsal striatal head direction and hippocampal place representations during spatial navigation. Exp. Brain Res. 139, 372–376 (2001).
  80. D. J. Foster, R. G. Morris, P. Dayan, A model of hippocampally dependent navigation, using the temporal difference learning rule. Hippocampus 10, 1–16 (2000).
  81. N. J. Gustafson, N. D. Daw, Grid cells, place cells, and geodesic generalization for spatial reinforcement learning. PLoS Comput. Biol. 7, e1002235 (2011).
  82. J. L. S. Bellmund et al., Deforming the metric of cognitive maps distorts memory. Nat. Hum. Behav. 4, 177–188 (2019).
  83. I. Momennejad et al., The successor representation in human reinforcement learning. Nat. Hum. Behav. 1, 680–692 (2017).
  84. E. M. Russek, I. Momennejad, M. M. Botvinick, S. J. Gershman, Predictive representations can link model-based reinforcement learning to model-free mechanisms. PLoS Comput. Biol. 13, e1005768 (2017).
  85. T. Evans, N. Burgess, Coordinated hippocampal-entorhinal replay as structural inference. Adv. Neural Inf. Process. Syst. 32, 1729–1741 (2019).
  86. J. P. Geerts, K. L. Stachenfeld, N. Burgess, "Probabilistic successor representations with Kalman temporal differences" in Conference on Computational Cognitive Neuroscience (2019).
  87. L. Lehnert, S. Tellex, M. L. Littman, Advantages and limitations of using successor features for transfer in reinforcement learning. arXiv:1708.00102 (31 July 2017).
  88. P. Piray, N. D. Daw, A common model explaining flexible decision making, grid fields and cognitive control. bioRxiv:856849 (10 December 2019).
  89. T. J. Madarasz, T. E. Behrens, Better transfer learning with inferred successor maps. Adv. Neural Inf. Process. Syst.; arXiv:1906.07663 (18 June 2019).
  90. R. C. Wilson, Y. K. Takahashi, G. Schoenbaum, Y. Niv, Orbitofrontal cortex as a cognitive map of task space. Neuron 81, 267–278 (2014).
  91. L. A. Bradfield, A. Dezfouli, M. Van Holstein, B. Chieng, B. W. Balleine, Medial orbitofrontal cortex mediates outcome retrieval in partially observable task situations. Neuron 88, 1268–1280 (2015).
  92. N. W. Schuck, M. B. Cai, R. C. Wilson, Y. Niv, Human orbitofrontal cortex represents a cognitive map of state space. Neuron 91, 1402–1412 (2016).
  93. E. Vertes, M. Sahani, A neurally plausible model learns successor representations in partially observable environments. arXiv:1906.09480 (22 June 2019).
  94. N. Burgess, T. Hartley, Orientational and geometric determinants of place and head-direction. Adv. Neural Inf. Process. Syst. 14, 165–172 (2002).
  95. M. Franzius, H. Sprekeler, L. Wiskott, Slowness and sparseness lead to place, head-direction, and spatial-view cells. PLoS Comput. Biol. 3, e166 (2007).
  96. J. R. Hinman, G. W. Chapman, M. E. Hasselmo, Neuronal representation of environmental boundaries in egocentric coordinates. Nat. Commun. 10, 2772 (2019).
  97. A. Barreto, S. Hou, D. Borsa, D. Silver, D. Precup, Fast reinforcement learning with generalized policy updates. Proc. Natl. Acad. Sci. U.S.A., 10.1073/pnas.1907370117 (2020).
  98. T. D. Kulkarni, A. Saeedi, S. Gautam, S. J. Gershman, Deep successor reinforcement learning. arXiv:1606.02396 (8 June 2016).
  99. N. Burgess, A. Jackson, T. Hartley, J. O'Keefe, Predictions derived from modeling the hippocampal role in navigation. Biol. Cybern. 83, 301–312 (2000).
  100. R. M. Grieves, É. Duvelle, P. A. Dudchenko, A boundary vector cell model of place field repetition. Spatial Cognit. Comput. 18, 217–256 (2018).
  101. W. de Cothi, C. Barry, Neurobiological successor features for spatial navigation. Hippocampus, 10.1002/hipo.23246 (2020).
  102. J. M. Pearce, G. Hall, A model for Pavlovian learning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychol. Rev. 87, 532–552 (1980).
  103. R. A. McDougal et al., Twenty years of ModelDB and beyond: Building essential modeling tools for the future of neuroscience. J. Comput. Neurosci. 42, 1–10 (2017).