Visual fixation patterns during economic choice reflect covert valuation processes that emerge with learning

Significance
Where we direct our gaze can have a big impact on what we choose. However, where we choose to look during the decision process is not well characterized, despite the important role it plays. In our study, monkeys performed a simple decision-making task in which they were free to look around a computer screen showing choice options. They then indicated their economic choice with a joystick movement. When the choice options appeared, monkeys rapidly gazed toward more valuable and novel stimuli, suggesting that a system orients gaze toward important information. However, despite this gaze preference for novel stimuli, subjects did not preferentially choose them. This suggests that the mechanisms governing value-guided attentional capture and value-guided choice are dissociable.


Behavioural Task
Details of the experimental setup have been described previously (1). The behavioural paradigm was run using the MATLAB-based toolbox MonkeyLogic (http://www.monkeylogic.net/, Brown University, USA) (2-4). A photodiode test was performed to benchmark the system, confirming that event markers indicated the time of task events precisely (within 2ms). We monitored eye position and pupil dilation during the task using an infrared system (ISCAN ETL-200) sampling at 240Hz. Monkeys used a joystick to report their economic choices. All joystick and eye position data were relayed to MonkeyLogic for online use during the task. These data were also interpolated and recorded by MonkeyLogic at 1000Hz for offline analysis.
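The eye tracker sampled at 240Hz, whereas the offline analyses used 1000Hz traces. A minimal sketch of that upsampling step follows, assuming simple linear interpolation; MonkeyLogic's actual resampling method is not described here, and the function name `upsample_linear` is hypothetical.

```python
import numpy as np

def upsample_linear(samples, fs_in=240.0, fs_out=1000.0):
    """Resample a uniformly sampled 1-D trace onto a faster clock by linear
    interpolation (an assumed stand-in for MonkeyLogic's own resampling)."""
    samples = np.asarray(samples, dtype=float)
    t_in = np.arange(samples.size) / fs_in              # sample times in seconds
    n_out = int(np.floor(t_in[-1] * fs_out + 1e-9)) + 1
    t_out = np.arange(n_out) / fs_out                   # 1 ms grid at 1000 Hz
    return t_out, np.interp(t_out, t_in, samples)

# One second of a synthetic horizontal eye trace sampled at 240 Hz
t_240 = np.arange(241) / 240.0
x_240 = 6.5 * t_240
t_1k, x_1k = upsample_linear(x_240)
print(len(x_1k))  # 1001
```

Linear interpolation cannot recover eye movements faster than the original 240Hz sampling resolves; it only places the data on a common 1ms clock shared with the joystick and event markers.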
Subjects performed a value-based decision-making task (Choice Phase; Fig. 1A) following a short conditioning phase. In the conditioning phase (SI Appendix: Fig. S1), subjects learned the values of 10 Novel reward-predictive pictorial stimuli. Half of the stimuli indicated the probability of receiving reward (10%, 30%, 50%, 70%, 90%) and the other half were associated with one of five reward magnitudes (0.14g, 0.33g, 0.51g, 0.71g, 0.90g). This secondary conditioning procedure used one-alternative 'forced-choice' trials to teach subjects the stimulus values. In a single block of trials, subjects completed a 'forced-choice' trial for each value of a particular attribute (probability or magnitude), then attempted two choice trials between two of the stimuli (SI Appendix: Fig. S1A). Blocks alternated between attributes, and ten blocks were completed for each attribute.
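The block structure of the conditioning phase can be sketched as a small schedule generator. This is a hypothetical illustration: the function `conditioning_schedule`, the shuffled forced-choice order, the random side assignment and the way choice pairs are drawn are all assumptions, since only the block composition (five forced-choice trials then two choice trials, alternating attributes, ten blocks per attribute) is specified above.

```python
import random

PROBABILITIES = [0.10, 0.30, 0.50, 0.70, 0.90]
MAGNITUDES_G = [0.14, 0.33, 0.51, 0.71, 0.90]

def conditioning_schedule(n_blocks_per_attribute=10, seed=None):
    """Hypothetical generator for the conditioning-phase trial order."""
    rng = random.Random(seed)
    values = {"probability": PROBABILITIES, "magnitude": MAGNITUDES_G}
    first = rng.choice(["probability", "magnitude"])    # first block chosen at random
    second = "magnitude" if first == "probability" else "probability"
    trials = []
    for block in range(2 * n_blocks_per_attribute):
        attribute = first if block % 2 == 0 else second   # blocks alternate attributes
        # One forced-choice trial per stimulus of this attribute, shuffled,
        # each shown on a randomly chosen side of the screen (assumption)
        for v in rng.sample(values[attribute], k=len(values[attribute])):
            trials.append({"block": block, "attribute": attribute, "kind": "forced",
                           "values": (v,), "side": rng.choice(["left", "right"])})
        # Then two choice trials between two different stimuli of the same attribute
        for _ in range(2):
            pair = tuple(rng.sample(values[attribute], k=2))
            trials.append({"block": block, "attribute": attribute, "kind": "choice",
                           "values": pair})
    return trials

schedule = conditioning_schedule(seed=1)
print(len(schedule),
      sum(t["kind"] == "forced" for t in schedule),
      sum(t["kind"] == "choice" for t in schedule))  # 140 100 40
```

The totals (100 forced-choice and 40 choice trials) agree with those given for the conditioning phase in SI Appendix: Fig. S1.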
Following completion of the conditioning phase, the choice phase began. In the choice phase, only two-alternative choice trials were presented (i.e., no 'forced-choice' trials). In addition to the Novel stimuli that subjects had just learned, 10 other reward-predictive stimuli were also presented as choice options. Subjects had been heavily exposed to these additional stimuli in training sessions before the first data collection session (M: ~1500, F: ~3000 total exposures to the stimulus set); hence we refer to them as Overtrained. The same 10 Overtrained stimuli were used in every behavioural session. Novel stimuli were never reused as Overtrained stimuli in subsequent behavioural sessions.
Trials could consist of two Overtrained stimuli (Overtrained trials; Fig. 1C), two Novel stimuli (Novel trials) or one of each (Mixed trials). Subjects were always asked to choose within a single attribute (e.g. between two probabilities), never between attributes. Subjects never had to choose between two stimuli of equal value; therefore, the optimal choice was defined as the more valuable option. All trial types were pseudorandomly interleaved.
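The trial-type labels and the optimality criterion above can be expressed as two small functions. This is an illustrative sketch; the `Stimulus` record and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stimulus:
    stim_set: str    # "Novel" or "Overtrained"
    attribute: str   # "probability" or "magnitude"
    value: float     # reward probability, or reward magnitude in g

def trial_type(left: Stimulus, right: Stimulus) -> str:
    """Label a two-alternative trial as Overtrained, Novel or Mixed."""
    if left.attribute != right.attribute:
        raise ValueError("choices are only made within a single attribute")
    if left.value == right.value:
        raise ValueError("equal-value pairs are never presented")
    sets = {left.stim_set, right.stim_set}
    if len(sets) == 2:
        return "Mixed"
    return sets.pop()        # "Novel" or "Overtrained"

def chose_optimally(left: Stimulus, right: Stimulus, chose_left: bool) -> bool:
    """A choice is optimal if it picks the more valuable of the two options."""
    return chose_left == (left.value > right.value)

novel_70 = Stimulus("Novel", "probability", 0.70)
over_30 = Stimulus("Overtrained", "probability", 0.30)
print(trial_type(novel_70, over_30))               # Mixed
print(chose_optimally(novel_70, over_30, True))    # True
```

Because equal-value pairs are excluded, every trial has exactly one optimal response, so choice accuracy is well defined for all three trial types.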
A representation of a choice trial timeline can be found in Fig. 1B. Subjects initiated each trial by returning the joystick to its centre position. At this point, a white background appeared on the screen with a red central fixation square (0.5 × 0.5 visual degrees). Subjects were required to fixate the red square continuously for 500ms (fixation radius of 3 visual degrees) within a 10s period. If this was not achieved, a short 'timeout' was given and the trial restarted. Once the fixation period was completed, the fixation spot disappeared and two isoluminant picture stimuli (100 × 100 pixels) were presented 6.5 visual degrees to the left and right of centre. Importantly, subjects were free to saccade anywhere on (or off) the screen and to choose a stimulus with a left/right joystick response at any time after stimulus onset. If subjects did not respond within 5s, the trial was aborted. Once the response was made, a grey square was drawn around the chosen stimulus and a 500ms pre-feedback period began. After the pre-feedback period, the unchosen stimulus was removed from the screen and the reward epoch began. Subjects were rewarded with juice (according to the reward probabilities and volumes described above) delivered to the mouth by a precise peristaltic pump (ISMATEC IPC).
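The fixation requirement (500ms held continuously inside a 3-degree radius, achieved within 10s) amounts to a running-counter check on the gaze trace. A minimal sketch, assuming gaze is given in visual degrees relative to the fixation square and sampled at 1000Hz; `fixation_acquired` is a hypothetical name and not part of MonkeyLogic.

```python
import numpy as np

def fixation_acquired(x, y, fs=1000.0, radius_deg=3.0, hold_ms=500.0, window_ms=10_000.0):
    """Return the time (ms from fixation-square onset) at which a continuous
    hold inside the fixation window is completed, or None if it is not
    completed within the allowed period (the trial would time out)."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    n_window = min(x.size, int(window_ms * fs / 1000.0))
    inside = np.hypot(x[:n_window], y[:n_window]) <= radius_deg
    hold = int(round(hold_ms * fs / 1000.0))
    run = 0
    for i, ok in enumerate(inside):
        run = run + 1 if ok else 0   # leaving the window resets the hold
        if run >= hold:
            return (i + 1) * 1000.0 / fs
    return None

# Gaze lands on the square after 300 ms and stays there
gx = np.concatenate([np.full(300, 8.0), np.full(700, 0.5)])
print(fixation_acquired(gx, np.zeros_like(gx)))  # 800.0
```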
A representation of a conditioning trial timeline can be found in SI Appendix: Fig. S1C. In these trials, following successful fixation, a single stimulus appeared on either the left or the right of the screen. Once the stimulus was chosen with the joystick, it again remained highlighted for 500ms. After this highlighted period, however, the stimulus disappeared from the screen. After a further 500ms delay, a secondary reinforcer appeared for 500ms. This was a coloured bar on a white background, whose height indicated the value of the chosen stimulus. After the secondary reinforcer disappeared, there was a pre-feedback period before the reward epoch. Data from this conditioning phase are not described in the current report.
After completing a behavioural session of this task on 'Day 1', subjects performed a different decision-making task in subsequent sessions ('Days 2-4') using the Novel stimuli learned on 'Day 1'. This testing schedule then restarted with a new 'Day 1' session. Only data from these 'Day 1' sessions are described in the present report.

Behavioural Analysis
For each trial, eye position data were analysed from stimulus onset until the joystick was moved outside a central radius of two visual degrees. This time period was defined as the subject's reaction time for the trial (Fig. 1E, SI Appendix: Fig. S2).
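The reaction-time definition can be sketched as the first sample after stimulus onset at which the joystick leaves the central radius. This assumes joystick position is expressed in visual-degree units and sampled at 1000Hz from stimulus onset; the function name `reaction_time_ms` is hypothetical.

```python
import numpy as np

def reaction_time_ms(jx, jy, fs=1000.0, radius_deg=2.0):
    """Time (ms after stimulus onset) at which the joystick first leaves
    the central radius, or None if it never does."""
    r = np.hypot(np.asarray(jx, float), np.asarray(jy, float))
    outside = np.flatnonzero(r > radius_deg)
    if outside.size == 0:
        return None
    return outside[0] * 1000.0 / fs

# Joystick rests at centre for 350 ms, drifts slightly, then moves left
jx = np.concatenate([np.zeros(350), np.full(10, -1.5), np.full(100, -5.0)])
print(reaction_time_ms(jx, np.zeros_like(jx)))  # 360.0
```

The small excursion to 1.5 degrees stays inside the radius, so only the decisive movement ends the analysis window.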

To determine which stimulus the subject was fixating, a region of interest (ROI) was defined for each stimulus. At any given time, subjects were considered to be viewing a stimulus if the x-coordinate of the recorded eye position was within 2.5 degrees of the centre of the stimulus. The number of stimuli viewed per trial (Fig. 2A) was calculated by determining whether one or both stimuli were viewed for at least 15ms. The number of fixations in each trial (SI Appendix: Fig. S3B, G) was the number of distinct periods of at least 15ms during which a stimulus was fixated. To count as a separate fixation, subjects had to switch their gaze between the two stimuli. For example, if their eye position was inside the left stimulus ROI for 200ms, then in neither ROI for 5ms, and then back in the left stimulus ROI for a further 100ms, this would be counted as a single fixation rather than two.
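The ROI rules can be sketched as follows. This hypothetical version assumes 1000Hz samples of the horizontal eye position, stimulus centres at ±6.5 degrees, and that the 15ms criterion applies to a contiguous visit; the text does not say whether the 15ms must be contiguous. Time outside both ROIs is ignored, so a new fixation is counted only when gaze moves to the other stimulus.

```python
import numpy as np

STIM_CENTRES_DEG = {"left": -6.5, "right": 6.5}   # horizontal stimulus centres

def roi_labels(x, half_width_deg=2.5):
    """Label each gaze sample 'left', 'right' or '' (neither ROI), using x only."""
    x = np.asarray(x, float)
    labels = np.full(x.size, "", dtype=object)
    for name, centre in STIM_CENTRES_DEG.items():
        labels[np.abs(x - centre) <= half_width_deg] = name
    return labels

def label_runs(labels):
    """Collapse per-sample labels into (label, n_samples) runs."""
    runs, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            runs.append((labels[start], i - start))
            start = i
    return runs

def fixation_summary(x, fs=1000.0, min_ms=15.0):
    min_samples = int(round(min_ms * fs / 1000.0))
    # ROI visits long enough to count; time outside both ROIs is ignored
    visits = [lab for lab, n in label_runs(roi_labels(x)) if lab and n >= min_samples]
    # A new fixation requires a switch to the other stimulus, so merge repeats
    fixations = [lab for i, lab in enumerate(visits) if i == 0 or lab != visits[i - 1]]
    return {"n_viewed": len(set(visits)), "n_fixations": len(fixations),
            "sequence": fixations}

# The example from the text: left 200 ms, neither 5 ms, left again 100 ms
trace = np.concatenate([np.full(200, -6.5), np.full(5, 0.0), np.full(100, -6.0)])
print(fixation_summary(trace))   # one stimulus viewed, one fixation
```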
The latency of the first fixation (Fig. 2B), and which stimulus was fixated first, were defined using a saccade detection algorithm (5,6). Eye position data were zero-phase filtered using a second-order Butterworth filter with a cut-off frequency of 35Hz. A velocity threshold of 7 degrees/second, a horizontal distance of greater than 4 degrees, and a minimum duration of 20ms were used. For each trial, the first detected saccade defined the first fixation latency and direction. The first stimulus dwell time (Fig. 4, SI Appendix: Fig. S8) was the viewing time allocated to the stimulus fixated first. The dwell time advantage for the first fixated stimulus (Fig. 6, SI Appendix: Fig. S13) was the first stimulus dwell time minus the total viewing time allocated to the other stimulus. On trials where only a single stimulus was fixated, the total viewing time allocated to the other stimulus was 0ms.
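The saccade criteria can be illustrated with a short detector. This sketch assumes that velocity is computed by differentiating the filtered horizontal position, and that the 20ms minimum duration and 4-degree horizontal distance apply to a single run of supra-threshold velocity; it is not the implementation from references 5 and 6. To stay dependency-free it implements the second-order Butterworth low-pass filter directly via the bilinear transform and runs it forwards and backwards for zero phase, which follows the same idea as `scipy.signal.butter` plus `scipy.signal.filtfilt` but with simpler edge handling. All function names are hypothetical.

```python
import numpy as np

def butter2_lowpass(cutoff_hz, fs):
    """Second-order Butterworth low-pass coefficients via the bilinear transform."""
    k = np.tan(np.pi * cutoff_hz / fs)
    norm = 1.0 / (1.0 + np.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b = (b0, 2.0 * b0, b0)
    a = (2.0 * (k * k - 1.0) * norm, (1.0 - np.sqrt(2.0) * k + k * k) * norm)
    return b, a

def _biquad(x, b, a):
    y = np.empty_like(x)
    x1 = x2 = y1 = y2 = x[0]          # start in steady state (unity DC gain)
    for n, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y[n] = yn
    return y

def zero_phase_lowpass(x, cutoff_hz=35.0, fs=1000.0, pad=100):
    """Forward-backward filtering, so the output has no phase lag."""
    b, a = butter2_lowpass(cutoff_hz, fs)
    xp = np.pad(np.asarray(x, float), pad, mode="edge")
    y = _biquad(xp, b, a)
    y = _biquad(y[::-1].copy(), b, a)[::-1]
    return y[pad:-pad]

def first_saccade(x, fs=1000.0, vel_thresh=7.0, min_amp_deg=4.0, min_dur_ms=20.0):
    """Return (latency_ms, 'left'/'right') of the first qualifying saccade, or None."""
    xf = zero_phase_lowpass(x, fs=fs)
    vel = np.gradient(xf) * fs                       # horizontal velocity, deg/s
    above = (np.abs(vel) > vel_thresh).astype(int)
    edges = np.diff(np.concatenate(([0], above, [0])))
    starts, ends = np.flatnonzero(edges == 1), np.flatnonzero(edges == -1)
    for s, e in zip(starts, ends):                   # e is exclusive
        amp = xf[e - 1] - xf[s]
        if (e - s) * 1000.0 / fs >= min_dur_ms and abs(amp) > min_amp_deg:
            return s * 1000.0 / fs, ("right" if amp > 0 else "left")
    return None

# Synthetic trace: fixate centre, then a 40 ms rightward saccade at 200 ms
trace = np.concatenate([np.zeros(200), np.linspace(0.0, 6.5, 40), np.full(360, 6.5)])
print(first_saccade(trace))
```

Because the filter is applied in both directions, the detected onset can fall slightly before the true movement onset; small post-saccadic ringing is rejected by the amplitude criterion.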
On a small proportion of completed trials (0.47%), the saccade detection algorithm detected no saccades. If the ROI analysis described above indicated that the subject had fixated a stimulus for >15ms on these trials, then either a saccade had occurred below the algorithm's thresholds, or the subject had not moved his gaze towards one of the stimuli with a single ballistic eye movement. On these trials, the first fixation direction was defined by the first ROI entered, and fixation latency was left undefined.
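The fallback rule, together with the dwell time advantage defined in the previous paragraph, can be sketched as two small helpers. Both function names are hypothetical, and the sketch assumes that the first stimulus dwell time is the total viewing time on the first-fixated stimulus over the whole trial, which is one reading of the definition above.

```python
from typing import Dict, Optional, Sequence, Tuple

def first_fixation(saccade: Optional[Tuple[float, str]],
                   roi_sequence: Sequence[str]) -> Tuple[Optional[float], Optional[str]]:
    """Combine saccade-based and ROI-based estimates of the first fixation.

    saccade: (latency_ms, direction) from the saccade detector, or None.
    roi_sequence: stimuli entered for >15 ms, in order, e.g. ["left", "right"].
    """
    if saccade is not None:
        return saccade
    if roi_sequence:
        # No detectable saccade but a stimulus was fixated: use the first ROI
        # entered for direction and leave latency undefined
        return None, roi_sequence[0]
    return None, None

def dwell_time_advantage(first: str, dwell_ms: Dict[str, float]) -> float:
    """First-fixated stimulus dwell time minus total viewing time on the other
    stimulus (0 ms if the other stimulus was never fixated)."""
    other = "right" if first == "left" else "left"
    return dwell_ms.get(first, 0.0) - dwell_ms.get(other, 0.0)

latency, direction = first_fixation(None, ["right", "left"])
print(latency, direction)                                            # None right
print(dwell_time_advantage("left", {"left": 420.0, "right": 150.0}))  # 270.0
```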
The majority of data analysis used logistic regression and was performed on data collapsed across all sessions for a given subject, unless otherwise stated (across-session correlation analyses in Fig. 3A, SI Appendix: Fig. S5, Fig. S6, Fig. S10). Logistic regressions were performed using Equation 1, where $Y_P$ is the probability of observing an event, $b_0$ is a constant term, $b_n$ is a weighting coefficient and $x_n$ is a regressor:

$$\log\left(\frac{Y_P}{1 - Y_P}\right) = b_0 + \sum_{n} b_n x_n \qquad (1)$$

All linear regressions were performed using Equation 2, where $Y$ is the dependent variable, $b_0$ is a constant term, $b_n$ is a weighting coefficient and $x_n$ is a regressor:

$$Y = b_0 + \sum_{n} b_n x_n \qquad (2)$$

SI Appendix: Table S3 contains detailed descriptions of the regression models used to analyse behaviour. Comparisons between relevant regressors were performed using a linear hypothesis test. As the first stimulus dwell time (Fig. 4D), reaction time (SI Appendix: Fig. S2C, G), and latency of first fixation (SI Appendix: Fig. S3E, J) variables were not normally distributed, they were log-transformed before linear regression analysis. The dwell time data were further z-scored for clearer visualisation (Fig. 4C), but the log-transformed variable was used in the relevant analyses.
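Equation 1 and the linear hypothesis test can be illustrated with a self-contained NumPy sketch. It fits the logistic model by Newton-Raphson (iteratively reweighted least squares) and tests a linear hypothesis such as b1 = b2 with a Wald z-test. The software and the exact form of linear hypothesis test used in the study are not specified here, so treat this as an assumed, generic version; `fit_logistic` and `wald_test` are hypothetical names and the data are simulated.

```python
import math
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood fit of Equation 1 by Newton-Raphson (IRLS).
    X must include a column of ones for the constant term b0."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ b)))           # Y_P for each trial
        H = X.T @ (X * (p * (1.0 - p))[:, None])     # Fisher information
        step = np.linalg.solve(H, X.T @ (y - p))     # Newton step on the log-likelihood
        b += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ b)))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))  # asymptotic covariance
    return b, cov

def wald_test(b, cov, c):
    """Two-sided Wald z-test of the linear hypothesis c . b = 0."""
    c = np.asarray(c, float)
    z = float(c @ b) / math.sqrt(float(c @ cov @ c))
    return z, math.erfc(abs(z) / math.sqrt(2.0))

# Simulated example: does regressor 1 carry more weight than regressor 2?
rng = np.random.default_rng(0)
n = 4000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
true_b = np.array([0.2, 1.5, 0.5])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_b)))).astype(float)
b_hat, cov = fit_logistic(X, y)
z, p_value = wald_test(b_hat, cov, [0.0, 1.0, -1.0])   # H0: b1 == b2
```

Comparing two regressors reduces to testing c·b = 0 with c = (0, 1, -1). For the linear regressions in Equation 2, the log-transformed variable (e.g. `np.log(dwell_ms)`) would simply be used as Y.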
To test how fixation behaviour changed over the course of a behavioural session, we used a logistic regression approach in which the trials of each session were divided into deciles according to their order. The data for each trial decile were then pooled across sessions. The regression model included a separate constant term and value-difference term for each decile (SI Appendix: Table S3). We could therefore assess whether fixations became more value-driven by comparing the 10 value-difference regression coefficients (Fig. 3B, SI Appendix: Fig. S5, Fig. S6). We could also evaluate whether the novelty bias changed over the course of a session by examining the 10 constant-term regression coefficients (SI Appendix: Fig. S10). This analysis is therefore complementary to testing whether the proportion of fixations to the more valuable stimulus (Fig. 3A) or to the Novel stimulus (SI Appendix: Fig. S10A) changes over the session. The regression analysis additionally separates value-based effects from bias effects. When assessing whether fixations became more value-driven on Novel trials (Fig. 3B), this approach controls for the possibility that subjects showed a strong directional bias that diminished over the session. It is particularly important for testing changes in the novelty bias (SI Appendix: Fig. S10), because we also needed to control for fixations becoming more value-driven on Mixed trials as the Novel stimulus values became well learned.
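The decile model can be sketched as a design matrix with one constant column and one value-difference column per decile, and no shared intercept. A hypothetical NumPy version, assuming trials are assigned to deciles by their order within each session before pooling:

```python
import numpy as np

def decile_of_trials(n_trials):
    """Assign trials 0..n-1 of one session to deciles 0..9 by trial order."""
    return (np.arange(n_trials) * 10) // n_trials

def decile_design(value_diff, decile, n_bins=10):
    """Columns 0..9: decile-specific constants; columns 10..19:
    decile-specific value-difference slopes."""
    value_diff = np.asarray(value_diff, float)
    onehot = np.zeros((value_diff.size, n_bins))
    onehot[np.arange(value_diff.size), decile] = 1.0
    return np.hstack([onehot, onehot * value_diff[:, None]])

# Pool two sessions of different lengths
deciles = np.concatenate([decile_of_trials(200), decile_of_trials(150)])
vd = np.random.default_rng(0).uniform(-1.0, 1.0, deciles.size)
X = decile_design(vd, deciles)
print(X.shape)  # (350, 20)
```

Because each decile has its own constant and slope and no parameters are shared, maximising the pooled likelihood for this design gives the same estimates as ten separate per-decile logistic regressions.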
To test the effects of fixation pattern on economic choice, over and above the effects of value, we used a regression approach (Fig. 6A-B; SI Appendix: Table S2). We first fitted data from all trials in which subjects fixated a single stimulus, using a model with 9 regressors (SI Appendix: Table S2, Model 1; SI Appendix: Table S3, Full model). This model had three predictors for each trial type: a constant term, the left-minus-right value difference, and the direction fixated. Including value difference as a co-regressor allowed us to study any additional effects of fixation pattern (Fig. 6B).
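The nine-regressor full model can be assembled by crossing each of the three predictors with a trial-type indicator. This sketch is hypothetical: the trial-type labels and the +1/-1 coding of the direction fixated (left = +1) are assumptions, as the coding is not stated here.

```python
import numpy as np

TRIAL_TYPES = ("Overtrained", "Novel", "Mixed")

def full_model_design(trial_type, value_diff, fixated_left):
    """9-column design: for each trial type, a constant, the left-minus-right
    value difference, and the direction fixated (+1 left, -1 right; assumed)."""
    trial_type = np.asarray(trial_type)
    vd = np.asarray(value_diff, float)
    direction = np.where(np.asarray(fixated_left, bool), 1.0, -1.0)
    cols = []
    for t in TRIAL_TYPES:
        ind = (trial_type == t).astype(float)   # 1 on trials of this type, else 0
        cols += [ind, ind * vd, ind * direction]
    return np.column_stack(cols)

X = full_model_design(["Novel", "Mixed", "Overtrained"], [0.4, -0.2, 0.6],
                      [True, False, True])
print(X.shape)  # (3, 9)
```

Dropping the three direction columns (indices 2, 5 and 8) gives the reduced model used in the model comparison.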
To confirm that the direction fixated had an impact on choice, we then used a cross-validation procedure to compare this model with a model from which the three direction-fixated predictors were removed (SI Appendix: Table S2). Model parameters were first estimated by performing a logistic regression to predict left choices on a random half of the trials. The remaining half of the trials was used to compute model evidence (SI Appendix: Table S2). This process was repeated for 10000 random splits of the trials, and the average log-likelihood of each model is reported. The Bayesian Information Criterion (BIC), which computes model evidence with a penalty for additional parameters, is also reported for each model fitted to all available data.
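The split-half cross-validation and BIC comparison can be sketched as follows. Each split fits the model on a random half of the trials and scores the held-out half by its log-likelihood, and BIC is computed as k ln(n) - 2 ln(L) on the full data. The simulated data, the function names and the use of 200 rather than 10000 splits (to keep the example fast) are illustrative assumptions; the same random seed is used for both models so that they are scored on identical splits.

```python
import math
import numpy as np

def fit_logistic(X, y, n_iter=50):
    """Logistic regression by Newton-Raphson; X includes a constant column."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ b)))
        H = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(H, X.T @ (y - p))
        b += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return b

def log_likelihood(X, y, b):
    eta = X @ b
    # log p(y | eta) = y*eta - log(1 + exp(eta)), computed stably
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

def cv_loglik(X, y, n_splits, rng):
    """Average held-out log-likelihood over random half splits."""
    n = len(y)
    half = n // 2
    total = 0.0
    for _ in range(n_splits):
        perm = rng.permutation(n)
        train, test = perm[:half], perm[half:]
        total += log_likelihood(X[test], y[test], fit_logistic(X[train], y[train]))
    return total / n_splits

def bic(X, y):
    """BIC = k ln(n) - 2 ln(L) for the model fitted to all trials."""
    return X.shape[1] * math.log(len(y)) - 2.0 * log_likelihood(X, y, fit_logistic(X, y))

# Simulated choices that depend on value difference and on the direction fixated
rng = np.random.default_rng(0)
n = 2000
vd = rng.uniform(-1.0, 1.0, n)                  # left-minus-right value difference
direction = rng.choice([-1.0, 1.0], n)          # +1 fixated left, -1 fixated right
eta = 2.0 * vd + 0.8 * direction
chose_left = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)
X_full = np.column_stack([np.ones(n), vd, direction])
X_reduced = X_full[:, :2]
cv_full = cv_loglik(X_full, chose_left, 200, np.random.default_rng(1))
cv_reduced = cv_loglik(X_reduced, chose_left, 200, np.random.default_rng(1))
print(cv_full > cv_reduced, bic(X_full, chose_left) < bic(X_reduced, chose_left))
```

Higher held-out log-likelihood and lower BIC both favour the model that includes the direction fixated, mirroring the two criteria reported for SI Appendix: Table S2.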

Data availability
Behavioural data and custom code for recreating the analyses will be available from the corresponding authors on request.

Fig. S1 Secondary conditioning for learning Novel stimulus values
Subjects began each session with the 'Conditioning Phase', in which they were given 10 one-alternative forced-choice trials for each of 10 Novel stimuli (100 trials in total) in order to learn their values. Five of the stimuli indicated the probability of receiving reward (10%, 30%, 50%, 70%, 90%) and the other five were associated with one of five reward magnitudes (0.14g, 0.33g, 0.51g, 0.71g, 0.90g). Secondary conditioning (with a pre-learned bar stimulus) was used to aid learning in these trials. Forty two-alternative choice trials were periodically interleaved between the one-alternative forced-choice trials. (A) Conditioning Phase Structure. Subjects completed 10 blocks of 7 trials for each attribute (magnitude and probability). Each block consisted of a one-alternative forced-choice trial for each of the 5 stimuli of an attribute, followed by two choice trials from this attribute. Blocks alternated between magnitude and probability trials, with the first attribute determined randomly (in this schematic it is magnitude). (B) Example of a Novel stimulus set learned during an experimental session. Each stimulus is associated with a reward magnitude (top row) or a reward probability (bottom row). (C) Task diagram for an example one-alternative forced-choice magnitude trial. Subjects initiated the trial by fixating a central red fixation point for 500ms, after which one pseudorandomly chosen cue was presented on either the left or the right of the screen. Subjects were free to saccade around the screen and make a manual joystick response at any time. If the joystick was moved in the direction of the stimulus, the cue was highlighted with a grey border. After 500ms, the stimulus was removed from the screen. After a further 500ms delay, a secondary reinforcer appeared. The height of this bar indicated the value of the chosen stimulus, and its colour indicated the attribute (blue bar: magnitude; black bar: probability).
The secondary reinforcer was removed after 500ms. After a pre-feedback delay, reward was delivered.

(Fig. 6A-B). The Bayesian information criterion (BIC) and cross-validated log-likelihood (see SI Appendix: Methods) were calculated for each model. Model 1 performs best (i.e. higher likelihood, lower BIC) on both metrics, in both subjects. This means the direction fixated has an important impact on what subjects choose, even when controlling for stimulus values. Values are rounded to 1 decimal place.

Table S3: Details of regression models. The full regression model testing the impact of the direction fixated, over and above stimulus value (Fig. 6A-B), is shown in this table. A further model containing a subset of its predictors was compared using cross-validation. The predictors used in this model are detailed in Table S2.