Reward-specific satiety affects subjective value signals in orbitofrontal cortex during multicomponent economic choice

Significance
Ongoing consumption reduces the subjective value of rewards to different degrees depending on their individual properties, a phenomenon referred to as sensory-specific satiety. Such value change should be manifested in economic choices, and neuronal signals for subjective economic reward value should be sensitive to reward-specific satiety. We tested monkeys during the choice between two options that each contained two different rewards ("bundles"); the two rewards were prone to different degrees of satiety. Ongoing reward consumption affected choices in a way that indicated satiety-induced, reward-specific change of subjective economic value. Neuronal responses in the monkey orbitofrontal cortex (OFC) followed the differential reduction of subjective economic value. These results satisfy a crucial requirement for subjective reward value coding in OFC neurons.

added inosine monophosphate (IMG). Although MSG and IMG are known taste enhancers, we did not test choice preferences for them on their own. Thus, we cannot state whether they had reward value of their own for the animals.
Task. Each trial began when the animal kept contact with a centrally located touch-sensitive key for 1.0 s, after a pseudorandom inter-trial interval of 1.6 ± 0.25 s. Then the two stimulus pairs representing the two bundles appeared at pseudorandomly alternating fixed left and right positions on a computer monitor in front of the animal (Fig. 1B). After 2.0 s, two blue spots appeared underneath the bundle stimuli as Go stimuli, upon which the animal released the touch key and, within 2.0 s, touched the blue spot underneath the bundle of its choice. The required action consisted of one arm movement and was constant across bundles and trials. After a hold time of 1.0 s, the blue spot underneath the chosen bundle turned green, and the blue spot underneath the unchosen bundle disappeared. Simultaneously, a white frame appeared around the chosen bundle as feedback for successful choice. A computer-controlled solenoid liquid valve delivered Reward A 1.0 s after the choice, followed 0.5 s later by Reward B (except when peach juice was Reward B; here the sequence was reversed: Reward B was delivered first, followed 0.5 s later by Reward A, blackcurrant juice). Task training was initially restricted to one bundle type and was extended to other bundle types only after satisfactory behavioral performance had been obtained.
The longer delay of liquid B compared to liquid A likely generated asymmetric temporal discounting that affected the subjective value of each liquid. However, all delays were kept constant, which allowed the subjective value differences arising from different temporal discounting to be incorporated as a constant factor into the subjective value of each liquid. We chose this delay, rather than simultaneous delivery or pseudorandomly alternating single-liquid delivery, to prevent more serious taste interactions between simultaneously delivered liquids and to avoid temporal uncertainty. Reaching for a target before the appearance of the blue spots, or releasing the key during the required key touch or target hold, was considered an error and led directly to the inter-trial interval without reward.
Estimation of behavioral ICs. The behavioral method for obtaining an indifference point (IP) from stochastic choice has been described in full detail (1,2). With two bundle options, the animal chose between the preset Reference Bundle (left in Fig. 1A) and the Variable Bundle (right) in repeated trials. Thus, the constant Reference Bundle provided a stable reference against which the changing composition of the Variable Bundle was compared. We set one reward in the Variable Bundle to one unit (> 0.1 ml) above the quantity of the same reward in the Reference Bundle, while varying the quantity of the other Variable Bundle reward over the whole test range in pseudorandom temporal order within each block of trials. The variation of the animal's repeated choices with this single, pseudorandomly varied reward allowed us to construct a full psychophysical function and estimate an IP by Weibull fitting (point of subjective equivalence; P = 0.5 choice of each bundle).
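The IP estimation described above can be sketched as follows. The sketch assumes the Weibull form P = 1 - exp(-(x/λ)^k) for the probability of choosing the Variable Bundle as a function of the varied reward quantity x, which must be > 0. Written this way, the Weibull function is a binomial generalized linear model with a complementary log-log link on log(x), fitted here by iteratively reweighted least squares. The function name `fit_weibull_ip` and the example quantities are hypothetical. This is an illustrative Python sketch, not the authors' Matlab code.

```python
import numpy as np

def fit_weibull_ip(quantity, n_chosen, n_trials, n_iter=50):
    """Fit P(choose Variable Bundle) = 1 - exp(-(x / lam) ** k) by binomial IRLS.

    log(-log(1 - P)) = b0 + b1 * log(x), i.e. a complementary log-log GLM on
    log(x) with k = b1 and lam = exp(-b0 / b1). Returns (b0, b1, ip), where
    ip is the quantity at which P = 0.5.
    """
    logx = np.log(np.asarray(quantity, dtype=float))   # quantities must be > 0
    y = np.asarray(n_chosen, dtype=float)
    n = np.asarray(n_trials, dtype=float)
    X = np.column_stack([np.ones_like(logx), logx])
    # Start from smoothed empirical proportions on the link scale.
    p0 = (y + 0.5) / (n + 1.0)
    beta = np.linalg.lstsq(X, np.log(-np.log(1.0 - p0)), rcond=None)[0]
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.clip(1.0 - np.exp(-np.exp(eta)), 1e-10, 1.0 - 1e-10)
        dmu = np.maximum(np.exp(eta - np.exp(eta)), 1e-12)  # d mu / d eta
        w = n * dmu ** 2 / (mu * (1.0 - mu))                # IRLS weights
        z = eta + (y / n - mu) / dmu                        # working response
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ z)
    b0, b1 = beta
    # P = 0.5  <=>  log(log 2) = b0 + b1 * log(ip)
    ip = float(np.exp((np.log(np.log(2.0)) - b0) / b1))
    return float(b0), float(b1), ip

# Example with noise-free synthetic choices (lam = 0.3 ml, k = 3),
# 16 trials per quantity (2 positions x 8 repetitions).
q = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
p_true = 1.0 - np.exp(-(q / 0.3) ** 3)
b0, b1, ip = fit_weibull_ip(q, 16 * p_true, np.full(5, 16))
print(f"k = {b1:.3f}, IP = {ip:.4f} ml")
```

The IP follows in closed form from the fitted coefficients, because P = 0.5 corresponds to log(log 2) on the link scale.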
As in our previous study (1), we used the Matlab function glmfit for psychophysical fitting. Besides the fitted coefficients, this function returns the deviance, a goodness-of-fit measure between 0 and infinity that can be used to compare the Weibull and logit fits. The deviance is twice the difference between the maximum possible log-likelihood (that of a saturated model) and the log-likelihood of the fitted model; lower values indicate a better fit. For the first 5,000 trials of the 2 monkeys, the deviance was 1.0415 for the Weibull model and 1.6009 for the logit model, suggesting that the Weibull model fitted the data better. Hence, we used Weibull fitting for all psychophysical functions.
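To make the model comparison concrete, here is a minimal sketch of the binomial deviance for grouped choice data. It assumes per-quantity counts of Variable Bundle choices and fitted choice probabilities from any model, for example the Weibull and logit fits. In Matlab, the deviance is the second output of `glmfit`. The Python function `binomial_deviance` below is a hypothetical stand-in that applies the textbook formula.

```python
import numpy as np

def binomial_deviance(n_chosen, n_trials, p_fit):
    """Deviance = 2 * (LL_saturated - LL_model) for grouped binomial choices.

    n_chosen: Variable-Bundle choices per quantity; n_trials: trials per
    quantity; p_fit: model-predicted choice probabilities. Lower is better.
    """
    y = np.asarray(n_chosen, dtype=float)
    n = np.asarray(n_trials, dtype=float)
    mu = n * np.asarray(p_fit, dtype=float)        # fitted expected counts
    with np.errstate(divide="ignore", invalid="ignore"):
        # Terms with a zero count contribute 0 (limit of t * log t as t -> 0).
        chosen = np.where(y > 0, y * np.log(y / mu), 0.0)
        other = np.where(n - y > 0, (n - y) * np.log((n - y) / (n - mu)), 0.0)
    return float(2.0 * np.sum(chosen + other))

# A model that reproduces the observed proportions exactly has deviance 0.
print(binomial_deviance([8, 12], [16, 16], [0.5, 0.75]))  # prints 0.0
```

Comparing two models then amounts to evaluating this function with each model's fitted probabilities on the same data.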
We obtained each IP from a total of 80 trials (2 left-right stimulus positions × 5 equally spaced reward quantities × 8 trials each). To avoid known adaptations in OFC neurons (3)(4)(5)(6), we always tested the full reward range of the experiment.
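Here is a minimal sketch of how one 80-trial IP block could be assembled. It assumes a plain random shuffle of the 2 × 5 × 8 design. The text specifies only a pseudorandom order, so any additional constraints, such as limits on consecutive repeats, are not modelled. `make_ip_block` is a hypothetical helper.

```python
import itertools
import random

def make_ip_block(quantities, positions=("left", "right"), reps=8, seed=None):
    """Pseudorandomly ordered IP block of (Variable Bundle position, quantity).

    Each position x quantity combination occurs `reps` times, e.g.
    2 positions x 5 quantities x 8 repetitions = 80 trials.
    """
    trials = [combo
              for combo in itertools.product(positions, quantities)
              for _ in range(reps)]
    random.Random(seed).shuffle(trials)  # pseudorandom temporal order
    return trials

block = make_ip_block([0.1, 0.2, 0.3, 0.4, 0.5], seed=1)
print(len(block))  # 80
```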
To obtain an IC, we fitted a series of IPs with a hyperbolic function using weighted least squares:

$$d = ay + bx + cxy \qquad (1)$$

with y and x as milliliter quantity of Reward A (plotted on the y-axis of the 2D graph, Fig. 1D, E) and Reward B (plotted on the x-axis), a and b as weights of the influence of the reward quantities plotted on the y- and x-axes, respectively, c as curvature, and d as the constant value of the function along the IC. A potent reward that contributes strongly to the choice of the bundle would have a large weight (high coefficient a or b), whereas a less potent reward would have a lower weight coefficient. Thus, with the potent (higher-weight) reward plotted on the x-axis and the less potent (lower-weight) reward plotted on the y-axis, choice indifference between them would occur with smaller milliliter quantities on the x-axis than on the y-axis. Hence, the IC would be steeper than the diagonal line (see Fig. 1D, E). Solving Eq. 1 for y gives $y = (d - bx)/(a + cx)$; for a linear IC ($c = 0$), $y = d/a - (b/a)x$, so the IC slope is the ratio of the coefficients that reflect the weights of the rewards: $-b/a$. With a higher potency of Reward B (x-axis) compared to Reward A (y-axis), the rectified IC slope would be larger than 1. Relatively stronger satiety for Reward B (x-axis) than for Reward A (y-axis) would reduce the weight of Reward B, reduce the absolute value of the ratio $-b/a$, and flatten the IC. Thus, the IC slope $-b/a$ describes the relative impact of the two bundle rewards (reflecting the value ratio between the two rewards), whereas the weights a and b describe the influence of the reward quantities. The hyperbolic function can be rewritten in a form equivalent to the regression with interaction used for analysing neuronal responses (see Eq. 3 below), with A and B as milliliter quantity of Reward A (plotted on the y-axis) and Reward B (x-axis), respectively, $b_0$ as offset coefficient, $b_1$ and $b_2$ as behavioral regression coefficients, and e as compound of errors $err_0$, $err_1$, $err_2$, $err_3$ for offset and regressors 1-3.
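Here is a minimal sketch of the weighted IC fit. It assumes d is fixed to 1, because Eq. 1 defines a, b, c and d only up to a common scale; this requires an IC that does not pass through the origin. Each IP then contributes one row a·y + b·x + c·x·y ≈ 1, which makes the fit a linear weighted least-squares problem. The weights are an assumption (for example, per-IP precision), since the text does not specify them. The helpers also return points on the IC, y = (d - bx)/(a + cx), and its local slope, which reduces to -b/a when c = 0. All names are hypothetical.

```python
import numpy as np

def fit_hyperbolic_ic(x, y, weights=None):
    """Weighted least-squares fit of a*y + b*x + c*x*y = 1 (Eq. 1 with d = 1).

    x: ml of Reward B at each IP (x-axis); y: ml of Reward A (y-axis).
    Returns (a, b, c).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    M = np.column_stack([y, x, x * y])
    sw = np.sqrt(w)                      # row scaling implements the weights
    coef, *_ = np.linalg.lstsq(M * sw[:, None], sw, rcond=None)
    a, b, c = coef
    return float(a), float(b), float(c)

def ic_y(x, a, b, c, d=1.0):
    """Quantity of Reward A on the IC for a given quantity x of Reward B."""
    return (d - b * x) / (a + c * x)

def ic_slope(x, a, b, c, d=1.0):
    """Local IC slope dy/dx = -(a*b + c*d) / (a + c*x)**2; equals -b/a if c = 0."""
    return -(a * b + c * d) / (a + c * x) ** 2

# Example: IPs lying exactly on an IC with a = 2, b = 3, c = 0.5.
x_ip = np.array([0.0, 0.1, 0.2, 0.3])
y_ip = ic_y(x_ip, 2.0, 3.0, 0.5)
print(fit_hyperbolic_ic(x_ip, y_ip))
```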
Definition and criteria for pre-sated and sated states. With ongoing reward consumption, changes of psychophysical choice functions exceeding the confidence intervals (CI) of the initial tests indicated a changed subjective value relationship between the two bundle rewards, suggestive of relative, reward-specific satiety (see Figs. 1D, S1A, S1E). More specifically, we identified the gradual effect of satiety on choice preference by tracking the IPs as consumption advanced across blocks of 80 trials. Importantly, these changes occurred fast enough to be studied within the recording duration of single neurons, which allowed us to compare responses between non-sated and sated states in the same neuron. The Weibull-fitted IPs were obtained psychophysically for fixed and equally spaced quantities of Reward B. Changes in the relative subjective value of the two bundle rewards were assessed with interleaved anchor trials, in which the animal chose between bundles with only one non-zero reward each: bundle (fixed non-zero blackcurrant juice; no Reward B) vs. bundle (no blackcurrant juice; variable non-zero Reward B), using any Reward B (Fig. S1B). To aggregate IP data across sessions and compensate for across-session variability, we normalized the reward quantity ratio at IP to that of the first titration block in each session. We then compared the normalized distribution of IPs within the CI of the first block with the distribution of IPs exceeding that CI.
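The normalization and split described above can be sketched as follows. The sketch makes a simplifying assumption: each block is summarized by its reward-quantity ratio at IP, and the first block's CI is expressed on that ratio. The source instead tests the psychophysical functions themselves against the first block's CI. The subsequent comparison of the two distributions (for example, with the Kolmogorov-Smirnov and Wilcoxon rank-sum tests reported in Fig. S1) is not shown. The helper name and example values are hypothetical.

```python
def split_ips_by_first_block_ci(ip_ratios, ci_low, ci_high):
    """Normalize per-block IP ratios to the first titration block and split
    them into blocks within vs. beyond the first block's CI.

    ip_ratios[0] must be the first titration block of the session.
    Returns (within, beyond), both as normalized ratios.
    """
    first = ip_ratios[0]
    within, beyond = [], []
    for r in ip_ratios:
        target = within if ci_low <= r <= ci_high else beyond
        target.append(r / first)      # normalization compensates across sessions
    return within, beyond

# Hypothetical session: ratio drops with consumption (reward-specific satiety).
within, beyond = split_ips_by_first_block_ci([2.0, 1.95, 1.5, 1.0], 1.9, 2.1)
print(within, beyond)
```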
Control regression for behavioral choice. To test whether the animal's choices reflected the quantities of the bundle rewards during satiety, rather than other, unintended variables such as spatial bias, we used a logistic regression with P(V) as probability of choice of the Variable Bundle, $b_0$ as offset coefficient, $b_1$-$b_7$ as correlation strength (regression slope) coefficients indicating the influence of the respective regressors, CT as trial number within blocks of consecutive trials, RA as quantity of Reward A of the Reference Bundle, RB as quantity of Reward B of the Reference Bundle, VA as quantity of Reward A of the Variable Bundle, VB as quantity of Reward B of the Variable Bundle, CL as choice of any bundle stimulus presented on the left, MA as consumed quantity of Reward A, MB as consumed quantity of Reward B, and e as compound error for offset and all regressors. We used a binomial fit with logit link function to obtain standardized b coefficients. Choices involving zero-reward bundles were excluded from the regression to avoid internal correlation between value and consumption.
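Here is a minimal sketch of a logistic regression with standardized coefficients, fitted by Newton-Raphson (equivalent to IRLS for the logit link). It assumes the regressors are supplied as columns of a trial matrix (for example CT, VA, VB, CL, MA, MB). Regressors that are constant in the analysed data, such as a fixed Reference Bundle within one session, have zero variance and must be dropped before standardizing. The function name is hypothetical, and the authors' software is not described here.

```python
import numpy as np

def standardized_logit(X, chose_variable, n_iter=25):
    """Logistic regression of Variable-Bundle choice on z-scored regressors.

    X: (n_trials, n_regressors) with non-constant columns.
    chose_variable: 1 if the Variable Bundle was chosen, else 0.
    Returns [b0, b1, ..., bk]; b1..bk are standardized coefficients.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize each regressor
    D = np.column_stack([np.ones(len(Z)), Z])     # offset column for b0
    y = np.asarray(chose_variable, dtype=float)
    beta = np.zeros(D.shape[1])
    for _ in range(n_iter):                       # Newton-Raphson / IRLS steps
        p = 1.0 / (1.0 + np.exp(-(D @ beta)))
        H = D.T @ (D * (p * (1.0 - p))[:, None])  # Fisher information
        beta = beta + np.linalg.solve(H, D.T @ (y - p))
    return beta

# Synthetic check: choice depends on VB (+) and consumed MB (-), not on side CL.
rng = np.random.default_rng(0)
N = 3000
VB = rng.uniform(0.0, 1.0, N)
CL = rng.integers(0, 2, N)
MB = rng.uniform(0.0, 50.0, N)
eta = -0.5 + 4.0 * VB - 0.03 * MB
chose = rng.random(N) < 1.0 / (1.0 + np.exp(-eta))
beta = standardized_logit(np.column_stack([VB, CL, MB]), chose)
print(np.round(beta, 2))
```

Standardizing the regressors makes the coefficient magnitudes comparable across variables measured in different units (trials, ml, binary side).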
Licking. Licking was monitored with an infrared optosensor positioned below the juice spout (V6AP; STM Sensors). Anticipatory lick durations were measured between the appearance of the bundle stimuli and the delivery of the first reward liquid (a period of approximately 5-6 s) in bundles containing only one non-zero reward (single-reward bundles) within single working sessions. Licking data were collected with four bundle types: (blackcurrant juice, grape juice), (blackcurrant juice, water), (blackcurrant juice, strawberry juice) and (blackcurrant juice, mango juice).
Surgical procedures and electrophysiology. As described before for the same animals (2), a head-restraining device and a recording chamber (40 × 40 mm; Gray Matter) were implanted on the skull under full general anesthesia and aseptic conditions. The stereotactic coordinates of the chamber enabled neuronal recordings from the orbitofrontal cortex (OFC) (7). We located the OFC from bone marks on coronal and sagittal radiographs taken with a guide cannula inserted at a known coordinate relative to the implanted chamber, using a medio-lateral vertical and a 20° forward-directed approach aiming for area 13. Monkey A provided data from the left hemisphere and Monkey B from the right hemisphere, via a craniotomy extending from Anterior 30 to Anterior 38 and from Lateral 0 to Lateral 19. We conducted single-neuron electrophysiological recordings using both custom-made glass-coated tungsten electrodes (8) and commercial electrodes (Alpha Omega), with impedances of about 1 MOhm at 1 kHz. Electrodes were inserted into the cortex with a multielectrode drive (NaN drive) along the same angled approach as used for the radiography. Neuronal signals were collected at 20 kHz, amplified using conventional differential amplifiers (CED 1902; Cambridge Electronic Design) and band-pass filtered (high-pass 300 Hz, low-pass 5 kHz). We used a Schmitt trigger to digitize the analog neuronal signal online into a computer-compatible TTL signal. We did not use the Schmitt trigger to separate simultaneous recordings from multiple neurons. In such cases, we either searched for another recording from a single neuron or occasionally stored the data in analog form for offline separation with dedicated software (Plexon Offline Sorter). An infrared eye-tracking system (ETL200; ISCAN) monitored eye position, with a temperature check on an experimenter's hand held at the approximate position of the animal's head.
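The Schmitt trigger described above was a hardware device. Its behaviour can be emulated on a digitized trace as follows, assuming positive-going spikes and hypothetical threshold values. The hysteresis (separate upper and lower thresholds) prevents a single noisy spike from producing several TTL events. The function name is hypothetical.

```python
def schmitt_trigger(signal, high, low):
    """Emulate a Schmitt trigger: emit one event (TTL onset) when the signal
    rises above `high`, then re-arm only after it falls below `low`."""
    if low >= high:
        raise ValueError("low threshold must be below high threshold")
    events, armed = [], True
    for i, v in enumerate(signal):
        if armed and v > high:
            events.append(i)      # sample index of the TTL pulse onset
            armed = False
        elif not armed and v < low:
            armed = True          # hysteresis: re-arm below the lower threshold
    return events

# The dip to 0.9 between 1.2 and 1.1 stays above `low`, so it is one event.
trace = [0.0, 0.5, 1.2, 0.9, 1.1, 0.2, 0.0, 1.3, 0.1]
print(schmitt_trigger(trace, high=1.0, low=0.3))  # [2, 7]
```

A single threshold at 1.0 would have counted the noisy recrossing at sample 4 as a second spike.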
Definition of neurons following the revealed preference scheme. We analysed single-neuron activity during four task epochs vs. a Pretrial control epoch (1 s): visual Bundle stimulus (2 s), Go signal (1 s), Choice (1 s) and Reward (2 s, starting with Reward A, followed 0.5 s later by Reward B, except where noted, thus covering both rewards). To establish neuronal relationships to these task epochs, we compared the activity of each neuron during each task epoch separately against the Pretrial control epoch using the paired Wilcoxon signed-rank test (P < 0.01). A neuron was considered task-related if its activity in at least one of the four task epochs differed significantly from its activity during the Pretrial control epoch.
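Here is a minimal sketch of the task-relatedness criterion. It uses the normal approximation to the paired Wilcoxon signed-rank test, without tie or continuity corrections. Whether the authors used exact or approximate P values is not stated; for small trial numbers, an exact implementation (for example `scipy.stats.wilcoxon`) would be preferable. Function names are hypothetical.

```python
import math

def wilcoxon_signed_rank_p(x, y):
    """Two-sided paired Wilcoxon signed-rank test (normal approximation,
    no tie or continuity correction). Zero differences are discarded."""
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:                       # assign average ranks to tied |d|
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))   # = 2 * (1 - Phi(|z|))

def is_task_related(epoch_rates, pretrial_rates, alpha=0.01):
    """epoch_rates maps epoch name -> per-trial rates (same trial order as
    pretrial_rates); task-related if any epoch differs at P < alpha."""
    return any(wilcoxon_signed_rank_p(rates, pretrial_rates) < alpha
               for rates in epoch_rates.values())

pre = [5.0, 6.0, 7.0] * 10                      # hypothetical Pretrial rates
stim = [r + 3.0 for r in pre]                   # consistent increase
print(is_task_related({"Bundle stimulus": stim}, pre))  # True
```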
Responses of individual neurons should follow the scheme of two-dimensional ICs that characterizes revealed behavioral preferences for two-dimensional bundles. Specifically, the responses should comply with three characteristics defined previously (2).
(Characteristic 1) Neuronal responses should change monotonically with increasing behavioral preference across behavioral ICs, irrespective of bundle composition. Such monotonic neuronal response changes should reflect increasing quantities of one or both bundle rewards, assuming a positive monotonic subjective value function of reward quantity.
(Characteristic 2) Neuronal responses should not vary significantly among equally preferred bundles positioned along the same behavioral IC, despite different physical bundle composition.
(Characteristic 3) Neuronal responses should follow the IC slope and the nonlinear curvature of behavioral ICs. The IC slope reflects the subjective value relationship between the two bundle rewards, and thus the subjective value of one reward (in our case Reward B) in the common currency of a reference reward (Reward A).
We used a combination of three statistical tests to assess these characteristics.
Characteristic 1: To capture the change across ICs in the most conservative, assumption-free manner possible, we used a simple linear regression on each Wilcoxon-identified task-related response, with y as neuronal response in any of the four task epochs, measured as impulses/s and z-score normalized to the Pretrial control epoch of 1.0 s (z-scoring of neuronal responses was applied to all regressions listed below), A and B as milliliter quantity of Reward A (plotted on the y-axis) and Reward B (x-axis), respectively, $b_0$ as offset coefficient, $b_1$ and $b_2$ as neuronal regression coefficients, and e as compound error for offset and all regressors.
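Here is a minimal sketch of the z-scoring and of the regression described above. It assumes the regression takes the form y = b0 + b1·A + b2·B + e, as implied by the listed terms. It also assumes that z-scoring uses the mean and standard deviation of Pretrial activity across trials. The helper returns t statistics; converting them to P values requires a t distribution (for example `scipy.stats.t`), which is omitted to keep the sketch NumPy-only. Names are hypothetical.

```python
import numpy as np

def zscore_to_pretrial(rates, pretrial_rates):
    """z-score responses (impulses/s) with the Pretrial mean and SD."""
    pre = np.asarray(pretrial_rates, dtype=float)
    return (np.asarray(rates, dtype=float) - pre.mean()) / pre.std(ddof=1)

def ols_with_t(y, A, B):
    """OLS fit of y = b0 + b1*A + b2*B + e; returns (coefficients, t values)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(A), A, B])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof                  # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se
```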
The coefficients $b_1$ and $b_2$ needed to be either both positive (positive neuronal relationship: higher neuronal activity reflecting more reward quantity) or both negative (inverse neuronal relationship) to reflect the additive nature of the individual bundle components giving rise to revealed preference (P < 0.05, unless otherwise stated; t-test). This linear regression assessed the degree of linear monotonicity of neuronal response change across ICs (P < 0.05 for b coefficients; t-test). Further, all significant positive or negative response changes identified by Eq. 3 also needed to be significant in a Spearman rank-correlation test, which assessed ordinal monotonicity of response change across ICs without assuming linearity or a numeric scale (P < 0.05).

Characteristics 1 and 2: To assess the two-dimensional across/along-IC scheme in a direct and intuitive way, and without assuming monotonicity, linearity or a numeric scale, we used a two-factor Anova on each Wilcoxon-identified task-related response that was also significant for both regressors in Eq. 3; the factors were across-IC (ascending rank order of behavioral ICs) and along-IC (same rank order of behavioral IC). To be a candidate for following the IC scheme of Revealed Preference Theory, changes across ICs had to be significant (P < 0.05), whereas changes along the IC and the interaction had to be insignificant.
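The first two criteria above can be sketched as follows: significant b1 and b2 of the same sign, plus an ordinal Spearman check. The t threshold of 1.96 is an assumption that approximates P < 0.05 (two-sided) for large samples. A P value for Spearman's ρ could be obtained from, for example, `scipy.stats.spearmanr`. The two-factor ANOVA is not shown. Function names are hypothetical.

```python
import numpy as np

def average_ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    v = np.asarray(values, dtype=float)
    order = np.argsort(v, kind="mergesort")
    sv = v[order]
    ranks = np.empty(len(v))
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and sv[j + 1] == sv[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation = Pearson correlation of average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def same_sign_and_significant(t1, t2, t_crit=1.96):
    """Characteristic 1 sign test: both coefficients significant, same sign."""
    return abs(t1) > t_crit and abs(t2) > t_crit and (t1 > 0) == (t2 > 0)
```

In use, x would be the IC rank of each trial's bundle and y the z-scored response, so ρ captures ordinal monotonicity across ICs without assuming a numeric scale.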
Characteristic 3: Whereas the regression defined by Eq. 3 estimated neuronal responses across ICs, a full estimation of neuronal ICs for comparison with behavioral ICs requires inclusion of the IC slope and curvature, both of which depend on both rewards. Simplifying Eq. 3 by setting both the $b_3$ coefficient and the (constant) neuronal response along the IC to zero, the neuronal IC slope becomes the ratio of coefficients $-b_2/b_1$. Note the different meanings of the term slope: the neuronal IC slope ($-b_2/b_1$) describes the relative coding strength of the two bundle rewards (reflecting the neuronal ratio of the two rewards), whereas each neuronal regression slope alone (b) describes the coding strength of the neuronal response for the specific regressor. The neuronal IC curvature was estimated from the $b_3$ coefficient of the interaction term AB (all b's P < 0.05; t-test).
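Here is a minimal sketch of the neuronal IC estimate. It assumes the regression with interaction has the form y = b0 + b1·A + b2·B + b3·AB, which matches the coefficients named in the text. Fixing the response y at a given level gives the iso-response curve A = (level - b0 - b2·B)/(b1 + b3·B). This has the same hyperbolic form as the behavioral IC of Eq. 1, and with b3 = 0 its slope is -b2/b1. Function names are hypothetical.

```python
import numpy as np

def fit_interaction(y, A, B):
    """OLS fit of y = b0 + b1*A + b2*B + b3*A*B; returns (b0, b1, b2, b3)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    X = np.column_stack([np.ones_like(A), A, B, A * B])
    return np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)[0]

def neuronal_ic_slope(b1, b2):
    """Linear neuronal IC slope dA/dB with b3 = 0 (Reward A on the y-axis)."""
    return -b2 / b1

def iso_response_A(B, level, b0, b1, b2, b3):
    """Quantity of Reward A that yields response `level` for a given B."""
    return (level - b0 - b2 * B) / (b1 + b3 * B)

# Noise-free example on a 4 x 4 grid of bundle quantities (ml).
Ag, Bg = np.meshgrid([0.0, 0.2, 0.4, 0.6], [0.0, 0.2, 0.4, 0.6])
A, B = Ag.ravel(), Bg.ravel()
y = 0.2 + 1.5 * A + 3.0 * B + 0.5 * A * B
b0, b1, b2, b3 = fit_interaction(y, A, B)
print(neuronal_ic_slope(b1, b2))  # Reward B coded twice as strongly as A
```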
Neuronal chosen value coding. As stated before (2), chosen value (CV) was defined as the value of the option the animal was going to choose or had already chosen. As each option consisted of two components, we used a linear combination of the quantity of Reward A (blackcurrant juice) and Reward B (any of the other five rewards), in which the weighting parameter $k_1$ adjusted for differences in subjective value between Rewards A and B. We established $k_1$ during neuronal recording sessions from behavioral choice IPs, using quantitative psychophysics in anchor trials (80 trials per test; see Trial types for neuronal tests above), rather than reading it from fitted ICs. Thus, $k_1$ equals the ratio of coefficients $b_2/b_1$ of Eq. 3.
We established a common-currency scale in ml for all tested rewards by defining blackcurrant juice or blackcurrant-MSG (Reward A) as reference (numeraire); the subjective value of any reward is expressed as a real-number multiple $k_1$ of the quantity of the numeraire at choice indifference.
Specifically, the animal chose between the Variable Bundle that contained a psychophysically varied quantity of blackcurrant juice and the Reference Bundle that contained a fixed quantity of blackcurrant juice. A $k_1$ < 1 indicated that a larger quantity of the tested reward was required for choice indifference against blackcurrant juice; thus, $k_1$ < 1 suggested that the tested reward had lower subjective value than blackcurrant juice. By contrast, $k_1$ > 1 suggested higher subjective value, as a smaller quantity was required for choice indifference.
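Here is a minimal sketch of the common-currency computation. It assumes the linear combination takes the form CV = A + k1·B. This is consistent with k1 = b2/b1, since b1·A + b2·B = b1·(A + (b2/b1)·B). At the anchor-trial IP, k1 is the quantity of blackcurrant juice judged equivalent to one unit of Reward B. The helper names and example quantities are hypothetical.

```python
def k1_from_anchor_ip(ml_blackcurrant, ml_reward_b):
    """k1 = ml of numeraire (Reward A) equivalent to 1 ml of Reward B at IP."""
    return ml_blackcurrant / ml_reward_b

def chosen_value(ml_a, ml_b, k1):
    """Chosen value in ml of blackcurrant juice, assuming CV = A + k1 * B."""
    return ml_a + k1 * ml_b

# Indifference between 0.4 ml blackcurrant and 0.8 ml of Reward B:
k1 = k1_from_anchor_ip(0.4, 0.8)          # 0.5 < 1: Reward B worth less per ml
print(k1, chosen_value(0.3, 0.4, k1))
```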
We assessed the coding of chosen value and unchosen value in all neurons that followed the revealed preference scheme, using a regression with UCV as the value of the unchosen option (not considered further here) and e as compound error for offset and all regressors.
Vector plots of OFC reward sensitivity. The purpose of this analysis was to provide quantitative and graphic information about satiety-induced behavioral and neuronal changes that would allow comparison with previous OFC studies (9). Those studies had not used two-component choice options with individually varying reward quantities and therefore had not established ICs. This simplified analysis addressed monotonic response increases or decreases with increasing quantities of bundle rewards across ICs (characteristic 1 above). It did not address other IC characteristics, such as trade-off, slope and curvature (characteristics 2 and 3), that had not been investigated previously. We constructed 2D plots whose dots indicated the relative contribution of each of the two bundle rewards to the neuronal response. We then compared vectors of behavioral choices with vectors of averaged neuronal population responses before and during satiety.
For behavioral choices, we plotted vectors (with 95% confidence intervals) from averaged dot positions defined by reward quantity (distance from center: $\sqrt{b_1^2 + b_2^2}$) and relative weight (elevation angle: $\arctan(b_1/b_2)$); coefficient $b_1$ refers to Reward A (blackcurrant juice, y-axis) and coefficient $b_2$ to any of the other rewards (x-axis) (Eq. 1a). The angle of the vector reflects the relative contribution of the two bundle rewards to the choice, as estimated by the a and b coefficients (Eq. 1). A deviation of the vector angle from the diagonal line indicates an unequal contribution to bundle choice, and thus a reward ratio different from 1:1.
For neuronal responses, each dot on the two-dimensional plot was defined by the two b regression coefficients for neuronal responses (Eq. 3; P < 0.01, t-test) for the two rewards in any of the four task epochs. The distance from center indicates the z-scored response magnitude ($\sqrt{b_1^2 + b_2^2}$). The plot also shows the coding sign (positive or negative) and the relative weight of the two b coefficients (elevation angle: $\arctan(b_1/b_2)$). Coefficient $b_1$ refers to Reward A (blackcurrant juice, y-axis) and coefficient $b_2$ to any of the other rewards (x-axis). Responses with negative (inverse) coding were rectified. Further IC characteristics, such as systematic trade-off across multiple IPs and IC curvature, played no role in these graphs. The alignment of the dots along the diagonal axis shows the relative coding strength for the two bundle rewards, as estimated by the b regression coefficients. A deviation from the diagonal line indicates an unequal influence of the two bundle rewards on the neuronal responses, reflecting a neuronal correlate of the reward ratio.
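The vector construction described above can be sketched as follows. The sketch assumes that rectifying inverse coding means flipping the sign of both coefficients when both are negative; responses retained under characteristic 1 have same-sign coefficients. `atan2(b1, b2)` equals arctan(b1/b2) for rectified coefficients and avoids division by zero. An angle of 45° marks equal weighting of the two rewards. Names are hypothetical.

```python
import math

def rectify(b1, b2):
    """Flip inverse (both-negative) coding to positive; keep other pairs."""
    return (-b1, -b2) if (b1 < 0 and b2 < 0) else (b1, b2)

def coefficient_vector(b1, b2):
    """Magnitude and elevation angle (degrees) of the dot (x = b2, y = b1).

    b1: coefficient for Reward A (y-axis); b2: for Reward B (x-axis).
    """
    b1, b2 = rectify(b1, b2)
    magnitude = math.hypot(b1, b2)              # sqrt(b1**2 + b2**2)
    angle = math.degrees(math.atan2(b1, b2))    # arctan(b1 / b2); 45 = equal weight
    return magnitude, angle

def mean_vector(coeff_pairs):
    """Average the rectified dot positions, then convert to a vector."""
    rect = [rectify(b1, b2) for b1, b2 in coeff_pairs]
    mb1 = sum(p[0] for p in rect) / len(rect)
    mb2 = sum(p[1] for p in rect) / len(rect)
    return coefficient_vector(mb1, mb2)

# A dot below the diagonal (angle < 45 deg): Reward B weighs more than A.
print(coefficient_vector(0.5, 1.0))
```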
Neuronal decoding. We used a linear support vector machine (SVM) classifier to decode neuronal activity in two ways. First, we decoded it according to bundles presented on different behavioral ICs during choice over the zero-reward bundle (bundle distinction). Second, and separately, we decoded it according to the behavioral choice between two non-zero bundles located on different ICs (choice prediction). As in our main study on revealed preferences (2), we implemented the decoder with a linear kernel using custom-written software with the svmtrain and svmclassify procedures in Matlab R2015b (Mathworks); our previous work had shown that nonlinear SVM kernels did not improve decoding (10). The SVM decoder was trained to find the optimal linear hyperplane for the best separation between two neuronal populations relating to lower vs. higher ICs.
All analyses employed single-neuron data, consisting of single-trial impulse counts that had been z-normalised to the activity during the Pretrial epoch in all trials recorded with the neuron under study. The analysis included activity from all neurons whose responses followed the IC scheme of revealed preferences during any of the four task epochs, as identified by our three-test statistics, except where noted. The neurons were recorded one at a time; therefore, the analysis concerned aggregated pseudo-populations of neuronal responses.
The decoding analysis used 10 trials per neuron for each of two ICs (total of 20 trials). Extensive analysis suggested that including 15-20 trials per group did not provide significantly better decoding rates, while it reduced the number of included neurons. For neurons recorded with > 10 trials per IC, we randomly selected 10 trials from each neuron for each of the two ICs. We used leave-one-out cross-validation. We removed one of the 20 trials, trained the SVM decoder on the remaining 19 trials, and then tested whether the decoder correctly identified the IC of the left-out trial. We repeated this procedure 20 times, each time leaving out a different one of the 20 trials, which yielded a percentage of accurate decoding (% out of n = 20). The final estimate of decoding accuracy was the average over 150 iterations of this random 20-trial selection procedure. To estimate chance decoding, we randomly shuffled the assignment of neuronal responses to the tested ICs, which should result in chance performance (50% correct). Significant decoding of the real, non-shuffled data was defined as a statistically significant difference from the shuffled data (P < 0.01; Wilcoxon rank-sum test).
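The decoding procedure above can be sketched as follows. This is not the authors' Matlab `svmtrain`/`svmclassify` code. As a stand-in, a soft-margin linear SVM is trained by plain subgradient descent on the hinge loss; the regularization strength, learning rate and epoch count are hypothetical choices. The sketch:

- builds pseudo-populations from separately recorded neurons,
- runs leave-one-out cross-validation over the 20 pseudo-trials,
- averages accuracy over random trial selections,
- optionally shuffles the IC labels for the chance-level control.

The final Wilcoxon rank-sum comparison against shuffled data is not shown.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.05, n_epochs=300):
    """Soft-margin linear SVM by full-batch subgradient descent on
    lam/2*||w||^2 + mean(max(0, 1 - y*(X@w + b))); labels y in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(n_epochs):
        viol = y * (X @ w + b) < 1.0             # samples inside the margin
        grad_w = lam * w - (y[viol][:, None] * X[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def loo_accuracy(X, y):
    """Leave-one-out: train on n-1 pseudo-trials, test the left-out one."""
    hits = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        w, b = train_linear_svm(X[keep], y[keep])
        hits += int(np.sign(X[i] @ w + b) == y[i])
    return hits / len(y)

def decode_ics(low, high, n_per_ic=10, n_iter=150, shuffle=False, seed=0):
    """low/high: one 1-D array per neuron of z-scored single-trial counts for
    the lower and higher IC (each with >= n_per_ic trials). Returns the mean
    leave-one-out accuracy over n_iter random trial selections."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_iter):
        # Pseudo-population: the k-th selected trial of every neuron forms
        # one pseudo-trial (neurons were recorded one at a time).
        lo = np.column_stack([rng.choice(r, n_per_ic, replace=False) for r in low])
        hi = np.column_stack([rng.choice(r, n_per_ic, replace=False) for r in high])
        X = np.vstack([lo, hi])
        y = np.concatenate([-np.ones(n_per_ic), np.ones(n_per_ic)])
        if shuffle:
            y = rng.permutation(y)               # chance-level control
        accs.append(loo_accuracy(X, y))
    return float(np.mean(accs))
```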

Fig. S1. Additional behavioral tests demonstrating reward-specific satiety by changes of indifference curves (IC).
(A) Psychophysical assessment of choice between single-reward bundles with grape juice variation (constant Reference Bundle: 0.4 ml blackcurrant juice, 0.0 ml grape juice; Variable Bundle: 0.0 ml blackcurrant juice, varying grape juice). Green and violet curves inside the green 95% confidence interval: initial choices; blue, orange and red curves: advancing consumption. The decrease in the blackcurrant : grape juice ratio at IP was significant between the first IP and all IPs exceeding the first confidence interval (ratios of 1.9857 ± 0.0173, n = 139, green, vs. 1.0077 ± 0.02, orange and red; mean ± standard error of the mean, SEM; individual trial blocks: P = $9.6943 \times 10^{-7}$, Kolmogorov-Smirnov test; P = $2.336 \times 10^{-32}$, Wilcoxon rank-sum test; P = $3.1712 \times 10^{-46}$, t-test;

Table S1. Neuronal changes with ongoing reward consumption during different task epochs.

decrease with higher subjective value. Most neurons were tested both in choice over zero-reward bundle and in choice between two non-zero bundles. Changes during the Reward epoch may indiscriminately reflect changes in subjective reward value and consumption (mouth movements, sensory stimulation); no attempts were made to distinguish neuronal relationships between these factors.