Learning not to remember: How predicting the future impairs encoding of the present

Memory enables reminiscence about past experiences and guides processing of future experiences. However, these two functions are inherently at odds: remembering specific past experiences requires storing idiosyncratic properties, but by definition such properties may not be shared with similar situations in the future and thus are not as useful for prediction. We discovered that, when faced with this conflict, the brain prioritizes prediction over encoding. Behavioral tasks showed that pictures allowing for prediction of what will appear next based on learned regularities are poorly encoded into memory. Brain imaging revealed that predictive representations in the hippocampus may be responsible for this worse episodic encoding and suggested that such interference may result from competition between hippocampal pathways. This tradeoff between statistical learning and episodic memory may be adaptive, focusing encoding on experiences for which we do not yet have a predictive model.


Introduction
Human memory contains two fundamentally different kinds of information: episodic and statistical. Episodic memory refers to the encoding of specific details of individual experiences (e.g., what happened on your last birthday), strating this role for statistical learning in episodic memory behaviorally, we identify an underlying mechanism in the brain using fMRI, based on the recent discovery that both processes depend on the hippocampus and thus compete to determine its representations and output (Schapiro et al., 2017).
We exposed human participants to a stream of pictures and later tested their memory (Figure 1A). The pictures consisted of outdoor scenes from 12 different categories (e.g., beach, mountain, farm). Some of the categories (type A) were predictive of which category appeared next (type B), whereas other categories (type X) were non-predictive. That is, every time participants saw a picture from an A category, they always saw a picture from a specific B category next; however, when a picture from an X category appeared, it was followed by a picture from one of several other categories (Figure 1B). Participants were not informed about these predictive A → B category relationships and learned them incidentally through exposure (Brady and Oliva, 2008). Although each category was shown several times, every individual picture in the stream was a novel exemplar from the category and was shown only once. For example, whenever a picture from the beach category appeared, it was a new beach that participants had not seen before. After the stream, we tested memory for these individual pictures amongst new exemplars from the same categories. The key question was whether exemplars from the predictive categories would be remembered worse than exemplars from the non-predictive categories. This was first tested in two behavioral experiments.

Results
While viewing the stream, the 30 participants in Experiment 1 performed a cover task in which they judged whether or not there was a manmade object in the scene. Response times (RTs) on this task were used to assess whether participants had learned the predictive relationships between A and B categories. If so, we expected that responses for B categories would become faster over the course of the experiment, as participants became able to anticipate which category would appear. Providing evidence of learning, there was a significant interaction between condition (A, B, X) and time (1st, 2nd, 3rd, 4th quartile of the stream) (F(6, 174) = 2.27, p = 0.039). This interaction reflected growing facilitation for pictures from the B categories, relative to both X and A categories, whose appearance in the stream could not be anticipated (Figure S1A).
Given our hypothesis, we expected worse encoding of the exemplars from the predictive A categories, leading to greater forgetting in a later memory test.
Indeed, there was a main effect of condition (F(2,58) = 4.75, p = 0.012), with a lower hit rate for pictures from the A categories relative to both B (t(29) = -2.79, p = 0.009) and X (t(29) = -2.33, p = 0.027) categories (Figure 1C). There was no difference in hit rate between B and X categories (t(29) = 1.19, p = 0.243), showing that the memory deficit is selective to whether a category was predictive (A vs. X), not whether it was predictable (B vs. X).
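The pairwise comparisons above are within-participant contrasts with df = n − 1 (t(29) for 30 participants), consistent with paired-samples t-tests on per-participant hit rates. A minimal sketch of the statistic, assuming each list holds one value per participant in the same order; p-values require the t distribution (for example, `scipy.stats.ttest_rel` returns both), which the standard library lacks.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for x vs. y.

    x[i] and y[i] are the same participant's scores in two conditions
    (e.g., hit rate for A vs. B categories).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # mean difference divided by its standard error
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

A negative t means the first condition scored lower, matching the sign convention of the A vs. B and A vs. X comparisons above.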
Because the predictive A → B relationships had to be learned during the stream, we expected that the memory deficit for pictures from the A category would emerge over time. Indeed, this effect was evident only in the third and fourth quartiles of stream exposure (Figure S1B). Furthermore, the overall memory deficit reflected a failure to encode specific A exemplars rather than a generic impairment for A categories (De Brigard et al., 2017; Smith et al., 2013), as the false alarm rate for new exemplars from each category at test did not differ by condition (F(2, 58) = 0.29, p = 0.751).
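Hit rates by condition and by quartile of study position, together with false-alarm rates by condition, can be tabulated from trial-level test data. This sketch assumes a hypothetical record format: one `(condition, study_index, said_old)` tuple per test trial, with `study_index = None` for new foils, and quartiles assigned by each picture's position in the learning stream.

```python
from collections import defaultdict


def quartile(trial_index, n_trials):
    """1-based quartile of the learning stream in which a trial occurred."""
    return min(4 * trial_index // n_trials, 3) + 1


def recognition_rates(test_trials, n_study_trials):
    """Hit rates by (condition, quartile) and false-alarm rates by condition.

    test_trials: iterable of (condition, study_index, said_old) tuples, where
    condition is the A/B/X type of the picture's category and study_index is
    the picture's position in the learning stream, or None for a new foil.
    """
    hits = defaultdict(lambda: [0, 0])  # (cond, quartile) -> [n "old", n trials]
    fas = defaultdict(lambda: [0, 0])   # cond -> [n "old", n trials]
    for cond, study_index, said_old in test_trials:
        if study_index is None:
            cell = fas[cond]
        else:
            cell = hits[(cond, quartile(study_index, n_study_trials))]
        cell[0] += int(said_old)
        cell[1] += 1
    hit_rates = {k: n_old / n for k, (n_old, n) in hits.items()}
    fa_rates = {k: n_old / n for k, (n_old, n) in fas.items()}
    return hit_rates, fa_rates
```

Averaging hit rates over quartiles gives the overall per-condition hit rate; comparing `fa_rates` across A, B, and X is the check for a generic category-level impairment.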
Using a surprise recognition memory test, this experiment demonstrated that episodic encoding is worse for predictive than for non-predictive pictures. We interpret this result as evidence of competition between prediction and encoding in the hippocampus. However, recognition tests do not selectively probe aspects of episodic memory that critically depend on the hippocampus. Participants could have relied upon a generic sense of familiarity with the pictures, which can be supported by cortical areas (Brown and Aggleton, 2001; Davachi et al., 2003; Norman and O'Reilly, 2003).
We thus designed Experiment 2 with a different, recall-based memory test.

This test required retrieval of specific spatiotemporal details, which is known to tax the hippocampus (Miller et al., 2013) and is a hallmark function of episodic memory (e.g., remembering who arrived first at a birthday party). A new group of 30 participants was exposed to the same kind of picture stream and performed the same cover task as in Experiment 1. However, during the memory test phase they were unexpectedly asked to indicate at what exact time (on the clock) they had seen each picture during the stream. As before, encoding of the time was incidental, as participants were not informed in advance that they would be tested.
This kind of precise temporal source memory requires the retrieval of details about the context in which each picture was encoded, which depends on the hippocampus (Davachi and DuBrow, 2015; Mitchell and Johnson, 2009).
We analyzed the accuracy of the time reports in terms of absolute deviation from the correct time on the clock at encoding, such that higher values indicate less precise memory (Figure 1D). Consistent with the results of Experiment 1, there was a main effect of condition (A, B, X) on temporal deviation (F(2,58) = 3.17, p = 0.049). Pictures from the A categories had greater deviation (less precision) than those from the X categories (t(29) = 2.26, p = 0.031); the same pattern was present for A vs. B categories but did not reach significance (t(29) = 1.45, p = 0.157), nor did B and X categories differ (t(29) = 1.23, p = 0.229).
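Scoring a time report reduces to the absolute difference between two clock times. A minimal sketch, assuming (hypothetically) that both the reported and the true viewing times are stored as `"HH:MM:SS"` strings from a session that does not cross midnight; the function names are illustrative.

```python
def to_seconds(clock):
    """Convert an "HH:MM:SS" clock string to seconds since midnight."""
    h, m, s = (int(part) for part in clock.split(":"))
    return 3600 * h + 60 * m + s


def temporal_deviation(reported, actual):
    """Absolute deviation (in seconds) between the reported and true viewing
    times; larger values mean less precise temporal source memory."""
    return abs(to_seconds(reported) - to_seconds(actual))
```

For example, reporting 14:05:30 for a picture seen at 14:03:00 gives a deviation of 150 s, regardless of whether the report was early or late.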
What explains the reduced encoding of predictive pictures in Experiments 1 and 2? We propose that it results from the co-dependence of statistical learning and episodic memory on the same brain region, the hippocampus (Schapiro et al., 2017). Specifically, we hypothesized that the appearance of a picture from an A category triggers the retrieval and predictive representation of the corresponding B category in the hippocampus. This in turn prevents the hippocampus from forming a new representation of the specific details of that particular A picture, which would be needed for later recall from episodic memory.
To test this hypothesis, Experiment 3 employed high-resolution fMRI in 36 new participants who viewed the same kind of picture stream as in Experiments 1 and 2. We used a multivariate pattern classification approach from machine learning (Cohen et al., 2017) to quantify neural prediction of B categories during the encoding of A pictures. Classification models were trained for each category on patterns of fMRI activity from a separate phase of the experiment in which participants were shown pictures from all categories in a random order. These classifiers were then tested during viewing of the stream containing category pairs, providing a continuous readout of neural evidence for each category. We performed this analysis on fMRI activity patterns from the hippocampus, our primary region of interest (ROI), as well as from control ROIs in occipital and parahippocampal cortices (Figure 2, bottom). These control ROIs were chosen because we expected them to be sensitive to the category of the picture currently being viewed but not to predict the upcoming B category given an A picture.
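The train-on-one-phase, test-on-another logic can be illustrated with a deliberately simple classifier. The paragraph above does not specify the classifier, so this sketch swaps in a correlation-based nearest-centroid rule; the function names and the list-of-floats representation of voxel patterns are hypothetical.

```python
from statistics import mean


def pearson(u, v):
    """Pearson correlation between two equal-length voxel patterns."""
    mu, mv = mean(u), mean(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    return num / (sum(a * a for a in du) * sum(b * b for b in dv)) ** 0.5


def train_centroids(patterns, labels):
    """Average the randomly ordered (templating) patterns of each category."""
    by_cat = {}
    for pattern, label in zip(patterns, labels):
        by_cat.setdefault(label, []).append(pattern)
    return {cat: [mean(vals) for vals in zip(*pats)] for cat, pats in by_cat.items()}


def category_evidence(centroids, pattern):
    """Evidence for every category on one stream trial: the correlation
    between the trial's pattern and each category centroid."""
    return {cat: pearson(pattern, c) for cat, c in centroids.items()}
```

Because the centroids come only from the phase without regularities, any evidence for a B category during an A trial in the stream cannot reflect the B picture itself, which has not yet appeared.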
To validate our approach, we first trained and tested classifiers on the viewing of pictures from the A ("Perception of A") and B ("Perception of B") categories  To further assess the specificity of these results to the hippocampus, we ran an exploratory whole-brain searchlight analysis. We again validated our approach by decoding the perception of B categories (train on B, test on B).
The resulting regions, which represented scene categories, were largely consistent with our a priori ROIs (Figure S4B). Notably, the prediction of B (train on B, test on A) produced no significant clusters across the brain after correcting for multiple comparisons (Figure S4A), consistent with this effect being specific to the hippocampal ROI.
After demonstrating prediction from statistical learning in the hippocampus, we tested our critical hypothesis that this prediction would impair simultaneous encoding into episodic memory. We quantified this brain-behavior relationship by correlating (1) each participant's decoding accuracy for prediction of B during A in the hippocampus with (2) their difference in hit rate for A vs. X categories (the key behavioral effect in Experiments 1-2) in the memory test (Figure 2, right). We limited this analysis to participants with decoding accuracy above chance (0.5), as variance at or below chance cannot be interpreted (results were robust to this exclusion; see Supplementary Materials). Consistent with our hypothesis, there was a negative correlation: more accurate prediction of B categories in the hippocampus in response to A pictures was associated with a greater deficit in memory for the A pictures (r(22) = -0.63, p < 0.001).
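The brain-behavior analysis reduces to a Pearson correlation computed after excluding participants at or below chance. A sketch with hypothetical per-participant lists (same participant order in each); since df = n − 2, the reported r(22) corresponds to 24 retained participants.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def brain_behavior_correlation(decoding_acc, hit_rate_a, hit_rate_x, chance=0.5):
    """Correlate prediction-of-B decoding accuracy with the A - X hit-rate
    difference, keeping only participants whose accuracy exceeds chance.
    Returns (r, df) with df = n - 2."""
    kept = [(acc, ha - hx)
            for acc, ha, hx in zip(decoding_acc, hit_rate_a, hit_rate_x)
            if acc > chance]
    if len(kept) < 3:
        raise ValueError("need at least 3 participants above chance")
    acc, diff = zip(*kept)
    return pearson_r(acc, diff), len(kept) - 2
```

A negative r here means that participants whose hippocampus predicted B more accurately showed a larger A-below-X memory deficit.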
How is this interaction between prediction and encoding implemented in the circuitry of the hippocampus? A recent biologically plausible neural network model of the hippocampus (Schapiro et al., 2017) suggests that episodic memory and statistical learning depend on different pathways, the trisynaptic pathway (TSP) and the monosynaptic pathway (MSP), respectively (Figure 3B). The TSP connects entorhinal cortex (EC) to the CA1 subfield via intermediate connections through the dentate gyrus (DG) and the CA3 subfield.

DG and CA3 have sparse activity because of high lateral inhibition; this allows for distinct representations of similar experiences (i.e., pattern separation; Leutgeb et al., 2007), which is needed to avoid interference between episodic memories. The MSP consists of a direct recurrent connection between EC and CA1. CA1 has lower inhibition and thus higher overall activity and less sparsity; this leads to overlap in the representations of similar experiences, which allows their regularities to be reinforced.
Accordingly, both the TSP and the MSP converge on CA1, which we propose is the locus of conflict between episodic memory and statistical learning. We aimed to test this theory with the fMRI data from Experiment 3 using a functional connectivity approach known as psychophysiological interaction (PPI). We reasoned that CA1 would interact more with the TSP (reflected in correlated activity with a combined CA2/3/DG ROI) during episodic memory and more with the MSP (reflected in correlated activity with an EC ROI) during statistical learning (Figure 3C).
Specifically, we hypothesized that during periods of low episodic encoding and high statistical learning, CA1 should be more functionally connected with EC than with CA2/3/DG. We quantified episodic encoding using a non-parametric measure of recognition memory fidelity (A′), computed for pictures from the A and B categories together (referred to as "Structured") and for pictures from the X categories ("Random"). We combined A and B categories because these pictures provided the opportunity for both the extraction of regularities and the encoding of individual episodes.
Higher A′ values indicate more episodic encoding, and lower A′ values indicate less episodic encoding (and thus perhaps more statistical learning). We separately correlated EC ↔ CA1 and CA2/3/DG ↔ CA1 connectivity from the PPI analysis with A′ for Structured and Random pictures (Figure 3D).
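The section does not give the A′ formula. The sketch below assumes the standard non-parametric A′ computed from a hit rate H and a false-alarm rate F (0.5 = chance, 1 = perfect discrimination), presumably per participant and separately for Structured and Random pictures; which foils enter F for each condition is not stated and is left to the caller.

```python
def a_prime(hit_rate, fa_rate):
    """Non-parametric sensitivity A' from a hit rate and a false-alarm rate.

    Uses the standard two-branch formula (an assumption here), which is
    symmetric around 0.5 when hits and false alarms are swapped.
    """
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5  # no discrimination (also avoids 0/0 when h == f == 0)
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))
```

For example, H = 0.8 and F = 0.2 give A′ = 0.5 + (0.6 × 1.6) / (4 × 0.8 × 0.8) = 0.875.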

For Random pictures, CA1 connectivity with both pathways was unrelated to A′ (EC: r(34) = -0.039, p = 0.82; CA2/3/DG: r(34) = 0.15, p = 0.37). The lack of any positive relationship between CA2/3/DG ↔ CA1 connectivity and A′ suggests that statistical learning is necessary to reveal an impact of CA1 connectivity (with EC) on episodic memory. Namely, greater MSP engagement in response to regularities reflects a bias away from episodic memory.

Discussion
To summarize the three experiments, our core findings were (1) that prediction from statistical learning interferes with encoding into episodic memory and (2) that this competition may be explained by the multiplexed function of the hippocampus across convergent pathways. These findings relate to two theoretical issues in the learning and memory literature.
First, the hippocampus is necessary for both memory encoding and retrieval, and yet these functions are fundamentally at odds. Given a partial match between the current experience and past experiences, encoding leverages pattern separation to store a new trace of the current experience, whereas retrieval invokes pattern completion to access old traces of those past experiences. To resolve this incompatibility, it has been argued that the hippocampus toggles between encoding and retrieval states (Hasselmo et al., 1996; Duncan et al., 2012; Patil and Duncan, 2018). In the present study, if seeing a picture from an A category triggers the retrieval of its associated B category, the hippocampus may be pushed into a retrieval state that suppresses memory encoding.
Second, after predictive relationships are learned in classical conditioning, "blocking" can occur when new cues are introduced. After one conditioned stimulus (CS1) has been paired with an unconditioned stimulus (US), no associative learning occurs for a second conditioned stimulus (CS2) that is then presented together with CS1 (Kamin, 1969). This is interpreted as CS2 being redundant with CS1, that is, not providing additional predictive value given that the US can be fully explained by CS1. In the present study, the A pictures contain two kinds of features: those that are diagnostic of the category (e.g., sand and water for a beach) and those that are idiosyncratic to each exemplar (e.g., particular people, umbrellas, or boats). If categorical features are sufficient to predict the upcoming B category, idiosyncratic features may not be attended or represented (Mackintosh, 1975; Kruschke, 2001), impeding the formation of episodic memory. Our findings are not fully consistent with this account, however. Blocking might predict that the A pictures are represented more categorically and that this enables prediction of the B category; yet we found a trade-off in the hippocampus between perceptual evidence of the A category and predictive evidence of the B category.
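Kamin's blocking effect can be reproduced with the Rescorla-Wagner rule, in which all cues present on a trial share a single prediction error. The model is swapped in here only to illustrate blocking; it is not the attentional account (Mackintosh, 1975; Kruschke, 2001) discussed above, and the learning rate and trial counts are arbitrary choices.

```python
def rescorla_wagner(trials, alpha=0.3, lam=1.0):
    """Associative strengths after a sequence of conditioning trials.

    trials: list of (cues, us_present) pairs, e.g. ({"CS1"}, True).
    On each trial every present cue is updated by alpha * (target - V_total),
    where V_total sums the strengths of all present cues and target is
    lam if the US occurs, else 0.
    """
    V = {}
    for cues, us_present in trials:
        target = lam if us_present else 0.0
        error = target - sum(V.get(c, 0.0) for c in cues)  # shared error
        for c in cues:
            V[c] = V.get(c, 0.0) + alpha * error
    return V


pretraining = [({"CS1"}, True)] * 20
compound = [({"CS1", "CS2"}, True)] * 20
blocked = rescorla_wagner(pretraining + compound)  # CS1 trained first
control = rescorla_wagner(compound)                # no pretraining
# blocked["CS2"] stays near 0, whereas control["CS2"] approaches 0.5
```

Because CS1 already predicts the US when CS2 appears, the shared error is close to zero and CS2 gains almost no strength, which is the sense in which CS2 is "redundant".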
Stepping back, why are the computationally opposing functions of episodic memory and statistical learning housed together in the hippocampus? We propose that this shared reliance allows them to regulate each other. By analogy, using your right foot to operate both the brake and gas pedals in a car is an anatomical constraint that forces you to either accelerate or decelerate, but not both at the same time. A similarly adaptive constraint may be present in the hippocampus, reflecting mutual inhibition between episodic memory and statistical learning. When predictive information is available in an environment, it may be redundant to encode new experiences. Moreover, encoding such experiences would risk over-fitting or improperly updating known, predictive regularities with idiosyncratic or noisy details. By focusing on upcoming events, the hippocampus can better serve as a comparator between expectations
Error bars represent 95% confidence intervals. *p < 0.05.

Experiment 1
Participants. Procedure. Participants first completed a learning phase. On each trial, they viewed a photograph of a scene for 1000 ms, during which they had to respond based on whether it contained a manmade object (see Main Text Figure 1A).
Participants were instructed to respond as quickly and accurately as possible (response mappings of 'j'/'k' onto 'yes'/'no' were counterbalanced across participants), and we recorded response time (RT) and accuracy. The scene remained on the screen for 1000 ms regardless of button press to equate encoding time, and trials were separated by a 500-ms inter-stimulus interval during which a fixation cross appeared.
Every scene was trial-unique, but was drawn from one of 12 outdoor scene categories (beaches, bridges, canyons, deserts, fields, forests, lakes, lighthouses, were spread equally across quartiles of the learning phase to minimize differences in study-test lag between categories; and the overall transition probability between "yes" and "no" responses was forced to be statistically indistinguishable from 0.5. After the learning phase, participants completed a five-minute distractor task of math problems to minimize recency effects. Each of the 60 math problems consisted of division and subtraction, and the answer was always 1, 2, 3, or 4. Participants responded using the 1, 2, 3, and 4 keys on the keyboard, with a maximum response window of 5 s. The inter-stimulus interval was set to 5 s minus the RT, so that every trial lasted 5 s and the phase lasted exactly 5 min across the 60 trials. Participants were instructed to respond as accurately as possible. trial testing a pair from the learning phase); whether it appeared first or second was counterbalanced. The other half of the trials contained a "dummy coded" pair of the X categories (there was no correct answer on these trials). This was done to equate the frequency of categories, which was important for participants who received the category pair test before the episodic memory test.

Each true/dummy-coded pair was tested twice against a scrambled pair of the same categories (e.g., if beach → field, mountain → bridge, canyon → forest were category pairs from the learning phase, the foils might be beach → bridge, mountain → forest, canyon → field).
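The text does not say how scrambled foils were generated. One scheme that reproduces the example exactly is a cyclic shift of the second categories, sketched below under the assumption that every learned pair has a distinct second category (so a nonzero shift can never recreate a learned pair); the function name is hypothetical.

```python
def scrambled_foils(pairs, shift=1):
    """Re-pair each first category with another pair's second category.

    pairs: list of (first, second) category tuples from the learning phase,
    with all second categories distinct. A cyclic shift of the second
    elements (0 < shift < len(pairs)) never repeats a learned pair and uses
    every category exactly once, so category frequency is preserved.
    """
    n = len(pairs)
    if not 0 < shift < n:
        raise ValueError("shift must be between 1 and len(pairs) - 1")
    seconds = [second for _, second in pairs]
    return [(first, seconds[(i + shift) % n]) for i, (first, _) in enumerate(pairs)]
```

Applied to beach → field, mountain → bridge, canyon → forest with `shift=1`, this yields beach → bridge, mountain → forest, canyon → field.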
The episodic memory test was designed to assess episodic memory for the trial-unique scenes from the learning phase. On each trial, one scene was presented and participants had to indicate whether it was "old" (i.e., presented during the learning phase) or "new" (i.e., not previously seen in the experiment).
After making an old/new response (using the 'j'/'k' keys), participants rated their confidence in this response ("not confident"/"confident", using the 'd'/'f' keys). Participants had 6 s to make each response. All 192 scene photographs from the learning phase were shown, in addition to 48 foils (4 novel exemplars from each category). The order of the scenes was randomized. were recruited from the Yale University community for either course credit or $10 compensation. Informed consent was obtained in a manner approved by the Yale University Human Subjects Committee.
Stimuli and Apparatus. Same as Experiment 1.
Procedure. The procedure was identical to that of Experiment 1, with the following exception: we replaced the confidence judgment of the episodic memory test with a temporal source judgment. In other words, participants were presented with a scene and again asked to judge whether it was "old" or "new" (using the 'd' and 'f' keys). However, instead of then being asked to judge how confident they were in their response, old responses were followed by the pre-
Before and after the three learning runs, respectively, there were "pre" and "post" templating phases (one fMRI run each). To participants, these phases were identical to the learning phase (e.g., stimulus timing and task were identical). However, there were no category-level regularities in these two runs.

Scenes from all categories were presented in a random order. To limit the impact of this random presentation on subsequent learning, participants completed a distracting math task between the "pre" templating run and the first learning run. Each of these five functional runs (three learning runs and pre/post runs) lasted 6.4 minutes.

For the episodic memory test, as in Experiment 1, a scene was presented and participants indicated (with their index and pinky fingers) whether it was "old" (i.e., presented during the learning phase) or "new" (i.e., not previously seen in the experiment). They then rated their confidence in this response ("very unsure", "unsure", "sure", "very sure") using their index through pinky fingers, respectively. Participants had 6 s to respond to each of these questions. They completed this task while in the scanner, but no fMRI data were collected. No category pair test was administered in this experiment.
fMRI Preprocessing. fMRI data processing was carried out using FEAT (fMRI Expert Analysis Tool) Version 6.00, part of FSL (FMRIB's Software Library, www.fmrib.ox.ac.uk/fsl) version 5.0.10. EPI and anatomical images were skull-stripped using the Brain Extraction Tool (Smith, 2002). Susceptibility-induced distortions, measured via the opposing-phase spin echo volumes, were corrected using FSL's topup tool (Andersson et al., 2003). Each functional run was high-pass filtered with a 128 s cut-off period, corrected for head motion using MCFLIRT (Jenkinson et al., 2002), and screened for motion outliers. Slice-timing correction was performed. No spatial smoothing was applied. Lastly, the six motion parameters and the motion outliers were entered as nuisance regressors in a general linear model (GLM) of the BOLD timecourse. The residuals from this preprocessing model (which contain BOLD responses to the task after controlling for motion) were then used for subsequent analyses.
Functional images were registered to each participant's T1 anatomical scan using boundary-based registration, as well as to a 2 mm MNI standard brain, using 12 degrees of freedom. Lastly, the two T2 anatomical images collected were registered to one another and averaged; the resulting averaged image was registered to the T1 anatomical image using FLIRT (Jenkinson and Smith, 2001 (Insausti et al., 1998; Pruessner et al., 2002; Duvernoy, 2005; Aly and Turk-Browne, 2015 and mountain trials (such that the classifier estimated evidence for field and bridge during each beach or mountain trial), and accuracy was computed (such that, for example, accuracy on a beach trial was 1 if the classifier output more evidence for field than for bridge). This was repeated for the bridge vs. forest classifier (testing for evidence of these categories during mountain and canyon trials) and the field vs. forest classifier (testing for evidence of these categories during beach and canyon trials). The accuracies of these three classifiers were averaged into a single accuracy for each participant. This was repeated for the three other comparisons above and for each ROI. To assess reliability at the group level, performance was compared to a chance level of 0.50 across participants using a one-sample t-test.
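The pairwise scoring rule can be written compactly. This sketch assumes hypothetical inputs: per-trial evidence values (however the classifier produces them) keyed by category, and the learned A → B mapping; ties are scored as incorrect, which is our choice and is not stated in the text. The group-level one-sample t-test against 0.50 is included as a statistic only (`scipy.stats.ttest_1samp` also returns a p-value).

```python
from statistics import mean, stdev


def pairwise_prediction_accuracy(evidence_trials, pair_map, cat1, cat2):
    """Accuracy of a cat1-vs-cat2 classifier at predicting the upcoming B
    category during A trials.

    evidence_trials: list of (shown_a_category, {category: evidence}) tuples.
    pair_map: learned A -> B mapping, e.g. {"beach": "field"}.
    Only A trials whose paired B is cat1 or cat2 are scored; a trial is
    correct when the paired B category receives strictly more evidence.
    """
    scores = []
    for shown, ev in evidence_trials:
        target = pair_map.get(shown)
        if target not in (cat1, cat2):
            continue  # this A trial is not informative for this classifier
        other = cat2 if target == cat1 else cat1
        scores.append(1.0 if ev[target] > ev[other] else 0.0)
    return mean(scores)


def one_sample_t(values, chance=0.5):
    """t statistic (and df = n - 1) comparing participant accuracies to chance."""
    n = len(values)
    return (mean(values) - chance) / (stdev(values) / n ** 0.5), n - 1
```

Averaging the field vs. bridge, bridge vs. forest, and field vs. forest accuracies gives one value per participant, and those values feed `one_sample_t`.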
PPI Analysis. To examine how functional connectivity between hippocampal subfields is affected by statistical learning, we conducted a psychophysiological interaction (PPI) analysis. Because we aimed to assess the impact of learned regularities, we limited this analysis to the second and third runs of the learning phase. Including the first learning run may have weakened the key contrast of connectivity during periods with and without regularities, as participants had no knowledge of the regularities at the beginning of that run.
To perform the PPI, we concatenated the aligned, normalized residual timecourses from the preprocessing GLM across the two included runs. We then averaged the activity of voxels in each ROI to compute a mean timecourse for CA1, CA2/3/DG, and EC. Additionally, we extracted the onsets of Structured (A and B) and Random (X) pictures and convolved each of these two condition regressors with a double-gamma hemodynamic response function (fmrisim function in BrainIAK, http://brainiak.org). We then ran a GLM in R (https://cran.r-project.org/), in which we predicted the timecourse of CA1 as a linear combination of regressors for the timecourses of CA2/3/DG and EC, task events in the two learning conditions (Structured and Random), and the interactions between ROIs and learning conditions. The interaction regressors were defined as the products of the ROI and condition timecourses (EC*Structured, EC*Random, CA2/3/DG*Structured, CA2/3/DG*Random). Each regressor, as well as the CA1 timecourse, was z-scored and entered simultaneously into the model. A separate model was run for each participant, resulting in one coefficient per regressor per participant. For each interaction regressor of interest, the coefficients were correlated across participants with A′ memory performance for the corresponding condition.
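The per-participant PPI model can be sketched end to end in plain Python. Several details are assumptions not stated in the text: onsets are coded as one 0/1 value per TR, the double-gamma HRF uses SPM-style shape parameters (peak 6, undershoot 16, ratio 1/6) rather than BrainIAK's fmrisim defaults, an intercept is included, and the model is fit by ordinary least squares via the normal equations instead of in R. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def double_gamma_hrf(tr, duration=32.0, peak=6.0, undershoot=16.0, ratio=1 / 6):
    """Double-gamma HRF sampled every `tr` seconds (assumed SPM-style shapes)."""
    def gamma_pdf(t, k):  # gamma density with shape k and scale 1
        return t ** (k - 1) * math.exp(-t) / math.gamma(k) if t > 0 else 0.0
    return [gamma_pdf(i * tr, peak) - ratio * gamma_pdf(i * tr, undershoot)
            for i in range(int(duration / tr))]


def convolve(signal, kernel):
    """Causal convolution, truncated to the length of `signal`."""
    return [sum(kernel[j] * signal[i - j] for j in range(min(len(kernel), i + 1)))
            for i in range(len(signal))]


def zscore(x):
    m, s = mean(x), pstdev(x)
    return [(v - m) / s for v in x]


def ols(y, columns):
    """Least-squares betas for y ~ 1 + columns, via the normal equations."""
    X = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    p = len(X[0])
    A = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    rhs = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
                rhs[r] -= f * rhs[col]
    return [rhs[i] / A[i][i] for i in range(p)]


def ppi_coefficients(ca1, ec, dg, structured_onsets, random_onsets, tr):
    """One participant's model: CA1 ~ ROIs + conditions + ROI x condition."""
    hrf = double_gamma_hrf(tr)
    cond = {"Structured": convolve(structured_onsets, hrf),
            "Random": convolve(random_onsets, hrf)}
    regs = {"EC": ec, "CA2/3/DG": dg, **cond}
    for roi in ("EC", "CA2/3/DG"):
        for name, timecourse in cond.items():
            # interaction regressor = product of ROI and condition timecourses
            regs[f"{roi}*{name}"] = [a * b for a, b in zip(regs[roi], timecourse)]
    names = list(regs)
    betas = ols(zscore(ca1), [zscore(regs[n]) for n in names])
    return dict(zip(names, betas[1:]))  # drop the intercept
```

The `EC*Structured` and `CA2/3/DG*Structured` coefficients from each participant would then be correlated with that participant's Structured A′, and likewise for Random.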
Searchlight Analysis. To explore the specificity of our Prediction of B results in the brain, we performed the category decoding analysis within a 27-voxel cube that was moved across all functional voxels (searchlight function in BrainIAK).
Each aligned, normalized residual volume from the preprocessing GLM was registered to standard space. These volumes were masked for each searchlight, and the retained voxels were subjected to the same decoding pipeline described above for the ROIs. The result was a searchlight map per participant, in which the value at each voxel reflected the average classification accuracy for the cube centered at that voxel. The reliability of these maps was assessed at the group level using nonparametric randomization tests (randomise function in FSL; Winkler et al., 2014), corrected for multiple comparisons using threshold-free cluster enhancement (Smith and Nichols, 2009). As a control analysis, we ran the same searchlight procedure for the Perception of B comparison.
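A 27-voxel cube is the 3 × 3 × 3 neighborhood of a center voxel (one voxel in each direction). The sketch below enumerates it over a mask held as a set of `(x, y, z)` coordinates and maps a scoring function over every center. It is a hypothetical illustration, not BrainIAK's searchlight interface; in particular, truncating cubes to in-mask voxels at the edge of the mask is an assumption.

```python
from itertools import product


def cube_offsets(radius=1):
    """All (dx, dy, dz) offsets in a cube of side 2 * radius + 1
    (radius=1 gives the 3 x 3 x 3 = 27-voxel cube)."""
    r = range(-radius, radius + 1)
    return list(product(r, r, r))


def searchlight_voxels(center, mask, radius=1):
    """In-mask voxel coordinates of the cube centered on `center`.

    Near the edge of the mask the cube is truncated, so fewer than 27
    voxels may be returned.
    """
    x, y, z = center
    return [(x + dx, y + dy, z + dz) for dx, dy, dz in cube_offsets(radius)
            if (x + dx, y + dy, z + dz) in mask]


def searchlight_map(mask, score_fn, radius=1):
    """Apply score_fn (e.g., a decoding accuracy) to each cube's voxel list
    and store the result at the cube's center voxel."""
    return {center: score_fn(searchlight_voxels(center, mask, radius))
            for center in mask}
```

In the analysis described above, `score_fn` would run the ROI decoding pipeline on the voxels it receives and return the averaged classification accuracy.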
Data and Code Availability. fMRI data can be downloaded from OpenNeuro and behavioral data can be downloaded from Dryad. Analysis code can be accessed on GitHub.