Research Article

Computational modeling of epiphany learning

Wei James Chen and Ian Krajbich
a) Department of Economics, The Ohio State University, Columbus, OH 43210;
b) Department of Psychology, The Ohio State University, Columbus, OH 43210


PNAS first published April 17, 2017; https://doi.org/10.1073/pnas.1618161114
Edited by Alvin E. Roth, Stanford University, Palo Alto, CA, and approved February 23, 2017 (received for review November 1, 2016)


Significance

Epiphany learning (EL) occurs when organisms seem to suddenly and dramatically alter their behavior. Although EL has been documented before, this paper proposes a model of EL and reports both subjects’ behavioral and eye-tracking data while they repeatedly played a game that has an optimal strategy. We find clear behavioral evidence for EL in this game and additionally find patterns of eye movements and pupil dilation that correspond to features of the model. In the process, we also identify distinct patterns for subjects who did not discover the optimal strategy. Our model and findings provide a framework for future research on EL.

Abstract

Models of reinforcement learning (RL) are prevalent in the decision-making literature, but not all behavior seems to conform to the gradual convergence that is a central feature of RL. In some cases learning seems to happen all at once. Limited prior research on these “epiphanies” has shown evidence of sudden changes in behavior, but it remains unclear how such epiphanies occur. We propose a sequential-sampling model of epiphany learning (EL) and test it using an eye-tracking experiment. In the experiment, subjects repeatedly play a strategic game that has an optimal strategy. Subjects can learn over time from feedback but are also allowed to commit to a strategy at any time, eliminating all other options and opportunities to learn. We find that the EL model is consistent with the choices, eye movements, and pupillary responses of subjects who commit to the optimal strategy (correct epiphany) but not always of those who commit to a suboptimal strategy or who do not commit at all. Our findings suggest that EL is driven by a latent evidence accumulation process that can be revealed with eye-tracking data.

  • epiphany learning
  • eye tracking
  • pupil dilation
  • beauty contest
  • decision making

How organisms learn is a central question in the behavioral sciences. Much of the literature has focused on reinforcement learning (RL) (1–3), where organisms gradually adjust their behavior in response to the outcomes of prior actions. However, there are other situations where organisms seem to suddenly and dramatically alter their behavior (4–7), often without any external input (8). This “epiphany” learning (EL) is characterized by an unexpected moment of insight, often portrayed in cartoons by a light bulb appearing over a person’s head, and was first introduced to economics in ref. 9.

The sudden, unexpected, and irreversible nature of EL makes it inherently difficult to predict or to study. Unlike with standard RL, where the decision maker’s choices provide insight into the underlying mechanism, here decisions provide little to no insight into the mechanism underlying the generation of epiphanies, other than to establish that they have occurred ex post (9, 10). An understanding of the EL process thus remains elusive.

Here, we sought to tackle this challenging problem with a combination of computational modeling and measures of the choice process. In particular, we use a model from the class of sequential-sampling models (SSMs) (11, 12) to capture the unconscious accumulation of evidence toward the epiphany and eye-tracking technology to study this process (13–20).

SSMs have received much attention in the decision sciences for their ability to capture choice outcomes, response times (21–23), and, more recently, eye-tracking and neural activity in perceptual and value-based decision making (24–28). These models assume that at the time of choice the decision maker evaluates the available options and accumulates evidence for each one until there is sufficient relative evidence for one option over the others. Once the relative evidence reaches this predetermined threshold, the decision maker implements the corresponding decision. SSMs are a natural way to capture EL because they share the feature of a sudden change in behavior due to latent processes.
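As a concrete illustration of this accumulate-to-threshold idea, the short Python sketch below (our own illustration, not code from the paper; the drift, noise, and threshold values are arbitrary assumptions) simulates a binary relative-evidence accumulator that stops once either decision barrier is reached.

import random

def run_ssm_trial(drift=0.1, noise=0.5, threshold=2.0, max_steps=10_000):
    """Minimal relative-evidence accumulator for a binary choice.

    Each time step adds a drift term (here favoring option A) plus Gaussian
    noise; the process stops when the relative evidence hits either barrier.
    Returns the chosen option and the number of steps (a response-time proxy).
    """
    evidence = 0.0
    for step in range(1, max_steps + 1):
        evidence += drift + random.gauss(0.0, noise)
        if evidence >= threshold:
            return "A", step
        if evidence <= -threshold:
            return "B", step
    return None, max_steps  # no barrier crossed within the time limit

results = [run_ssm_trial() for _ in range(1000)]
p_a = sum(choice == "A" for choice, _ in results) / len(results)
print(f"P(choose A) = {p_a:.2f}")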

Research in decision science has highlighted the usefulness of studying such latent processes using techniques such as brain imaging and eye tracking (29–31). Here, we use eye tracking because its superior signal-to-noise ratio (relative to brain imaging) makes it more suitable for studying isolated events such as epiphanies. Moreover, recent work has linked eye movements and pupil dilation to key features of SSM processes (32–35).

To use these methods to study EL, we sought a decision setting where subjects make repeated attempts to solve the same problem while receiving feedback. This explicit feedback serves as evidence in the SSM and is important for testing model predictions. We settled on the two-person beauty contest game (2BC), in which each of two players chooses an integer from 0 to 10 with the goal of getting closer to 0.9 multiplied by the average of the two numbers (36). An important feature of this game is that there is an optimal number regardless of what the other player chooses (i.e., a “dominant strategy”). That number is 0. Prior research has shown that most people initially fail to realize that 0 is the optimal strategy (36, 37) but do eventually figure it out. Thus, there is, in most cases, a switch from suboptimal to optimal play that is characteristic of EL.

Using a specially designed visual presentation of the 2BC game, we studied subjects’ eye movements and pupil dilation to learn how they compare and react to the different alternatives and to the feedback in each trial of the game. To better establish whether subjects truly undergo EL in this setting, we introduced an option for subjects to “commit” to a number for the remainder of the study in exchange for a small monetary incentive. We reasoned that an epiphany should be accompanied by the certainty that the problem has been solved and does not require further thought. We find distinct patterns of eye movements and pupil dilation for subjects who eventually committed to 0 (the epiphany learners) compared with subjects who never committed or committed to other numbers (Table 1).

Table 1.

Summary table for the results

Results

The Task.

In the experiment, subjects played 30 trials of the 2BC game. In each trial, they were matched with a different opponent, taken from a database of subjects from a prior experiment. Each player picked an integer from 0 to 10 with the goal of picking the number closer to 0.9 times the average of the two players’ numbers (own and opponent’s). Ties were broken with a digital coin flip. Because the average of two numbers is by definition halfway between those numbers, the smaller of the two numbers is always closer to 0.9 times the average. Therefore, picking 0 is the optimal strategy regardless of what the other player chooses, and doing so guarantees at least a tie.
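The dominance argument in the preceding paragraph can be verified by brute force. The following Python sketch (ours, for illustration only) scores every possible pair of choices under the 2BC payoff rule and confirms that 0 wins or ties against every opponent number and is never outperformed by any alternative.

def outcome(own, opponent):
    """Return 1 for a win, 0.5 for a tie, 0 for a loss under the 2BC rule."""
    target = 0.9 * (own + opponent) / 2
    own_dist, opp_dist = abs(own - target), abs(opponent - target)
    if own_dist < opp_dist:
        return 1.0
    if own_dist == opp_dist:
        return 0.5  # ties were broken by a digital coin flip in the experiment
    return 0.0

# Weak dominance: against every opponent number, choosing 0 does at least as
# well as every alternative, and it never does worse than a tie.
for opp in range(11):
    assert outcome(0, opp) >= 0.5
    for alt in range(1, 11):
        assert outcome(0, opp) >= outcome(alt, opp)
print("0 weakly dominates every other number and guarantees at least a tie.")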

The integers 0 through 10 were displayed in a circular arrangement on the screen (Fig. 1A). Subjects had unlimited time to look around at the numbers and make their decision, which they made using a key press followed by a saccade to the desired number. After their decision, subjects were given the option to commit to that number for the remainder of the experiment (Fig. 1B), in which case no more decisions could be made. On the following screen, subjects received feedback about their opponent’s choice, the target number, and the outcome (win, lose, tie-win, or tie-lose) (Fig. 1C). In the following trial, subjects were matched with a new random trial/opponent from the database. This rematching prevented subjects from noticing that their opponents’ choices tended to decrease over time, which might have promoted a mimicking strategy (37).

Fig. 1.

Eye-tracking experiment. Text is enlarged and color altered for display purposes. Refer to SI Appendix, Fig. S1 for actual screenshots. (A) Choice screen. Subjects chose an integer from 0 to 10. (B) Commitment screen. Subjects chose whether to commit to their chosen number for the remainder of the study. (C) Feedback screen. Subjects saw their chosen number, their opponent’s chosen number, the target number, the game outcome, and the resulting earnings.

The EL Model.

In general, our EL model states that decision makers aggregate positive/negative evidence for the optimal strategy over time until the relative evidence supporting that strategy is substantially higher than the evidence supporting other strategies. At that point in time the decision maker has an epiphany and begins to use that strategy. Note that there can only be one epiphany, that is, an epiphany for the optimal strategy. The accumulating evidence in the model shifts the focus of the subject toward the optimal choice and, with enough focus, the subject has his epiphany and realizes this choice is the optimal strategy.

Positive evidence supports the optimal strategy, whereas negative evidence does the opposite. Thus, the model is equivalent to a random-walk model (11), and the process can be reduced to a single variable that evolves over time toward one decision barrier. Whether the evidence is positive or negative at a given point in time depends on the subject’s choice and the outcome.

We divide the subjects’ potential actions into two choice sets X0≡{0} and X1≡{1,…,10}. A decision maker chooses x(t) in trial t, and if x(t)∈X0 and she wins/ties the game, or x(t)∈X1 and she loses the game, then she receives positive evidence, in this case supporting the strategy “choose 0.” In all other cases she receives negative evidence.

Mathematically, the net evidence ev(t) accumulated up to trial t is defined as ev(t) = ev(t−1) + d × (−1)^I(t), where I(t) = 1 if the evidence at trial t is negative and 0 otherwise, and ev(0) = 0. Here, d is a free parameter of the model that takes values from 0 to 1. We say that an epiphany occurs at trial t if and only if ev(t) ≥ 1; the threshold of 1 is without loss of generality because the scale of ev(t) is arbitrary.

Next, we describe how a decision maker chooses her actions. A decision maker initially chooses 0 with probability q1, but after an epiphany she chooses 0 with probability q2. Here, q1 and q2 are also free parameters of the model. All other numbers are chosen with equal probability, that is, (1 − q)/10, where q is q1 or q2 as appropriate (see SI Appendix for more details).

Intuitively, our model states that the decision maker receives evidence for a strategy when she chooses that strategy and is successful or chooses a different strategy and is unsuccessful. This evidence accumulates while the decision maker explores different strategies. This process is similar to the experience-weighted attraction model (38), where unchosen strategies can still be reinforced. Once the decision maker has accumulated sufficient evidence, she has an epiphany and alters her strategy (for alternative model specifications see SI Appendix).
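To make the model’s mechanics concrete, here is a small Python simulation of a single subject under the EL model (our own sketch; the values of d, q1, and q2 are illustrative rather than the paper’s fitted estimates, and opponent numbers are drawn uniformly instead of from the experimental database).

import random

def play_trial(choice, opponent):
    """The smaller number is always closer to 0.9 times the average."""
    if choice < opponent:
        return "win"
    if choice == opponent:
        return "tie"
    return "lose"

def simulate_el_subject(d=0.25, q1=0.1, q2=0.9, n_trials=30):
    """Simulate one subject under the EL model.

    Evidence ev moves up or down by d each trial; an epiphany occurs the first
    time ev >= 1, after which 0 is chosen with probability q2 instead of q1.
    """
    ev, had_epiphany, history = 0.0, False, []
    for t in range(1, n_trials + 1):
        q = q2 if had_epiphany else q1
        # choose 0 with probability q; otherwise 1..10 with probability (1 - q)/10 each
        choice = 0 if random.random() < q else random.randint(1, 10)
        result = play_trial(choice, random.randint(0, 10))
        # positive evidence: 0 won/tied, or a nonzero number lost
        positive = (choice == 0 and result in ("win", "tie")) or (choice > 0 and result == "lose")
        ev += d if positive else -d
        if not had_epiphany and ev >= 1.0:
            had_epiphany = True
        history.append((t, choice, ev, had_epiphany))
    return history

for t, choice, ev, epiphany in simulate_el_subject():
    print(f"trial {t:2d}: chose {choice:2d}, ev = {ev:+.2f}, epiphany = {epiphany}")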

Identifying Epiphanies from Behavior.

In prior work, epiphanies have been characterized as “the point in the data where the (choice) distributions are statistically different before and after a change point” (10). Specifically, in ref. 10, a Kolmogorov–Smirnov (K-S) test was used to identify this change point. As a first pass, we applied the same analysis to identify epiphany learners in our experiment, using a P-value cutoff of 5% for the significance of the change point. Additionally, we compared model fits for our EL model and an alternative RL model (Materials and Methods). Subjects who were better fit by the EL model were classified as epiphany learners (SI Appendix, Table S2).
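As a rough illustration of this classification step, the sketch below (ours; it simplifies the paper’s analysis to a single scan over candidate change points, and it does not show the model-comparison classification) uses scipy’s two-sample K-S test to ask whether a subject’s choice distributions before and after some trial differ at the 5% level.

from scipy.stats import ks_2samp

def find_change_point(choices, alpha=0.05, min_segment=3):
    """Return (trial index, p-value) of the most significant split, or None.

    For each candidate split we compare the empirical choice distributions
    before and after with a two-sample Kolmogorov-Smirnov test and keep the
    split with the smallest p-value, provided it clears the alpha cutoff.
    """
    best = None
    for t in range(min_segment, len(choices) - min_segment + 1):
        result = ks_2samp(choices[:t], choices[t:])
        if best is None or result.pvalue < best[1]:
            best = (t, result.pvalue)
    return best if best is not None and best[1] < alpha else None

# Hypothetical subject: suboptimal play for 12 trials, then a sudden switch to 0.
choices = [7, 5, 6, 8, 4, 5, 6, 3, 7, 5, 4, 6] + [0] * 18
print(find_change_point(choices))  # -> (12, p) with p far below 0.05: a clear change point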

Depending on the method used, we estimate the proportion of epiphany learners as either 53% (K-S) or 66% (model comparison). Conditional on being identified as an epiphany learner by the K-S (model comparison) method, the probability that the same subject was also identified as an epiphany learner by the other method was 94% (74%). Of course, not all subjects discovered the optimal strategy during the experiment and so we would not expect to see definitive evidence for EL in those cases. Conditioning on the subjects whose choices did eventually converge to 0, we find that the K-S and model comparison tests both classify 94% of these subjects as epiphany learners. Here, conditional on being identified as an epiphany learner by either method, the probability that the same subject was also identified as an epiphany learner by the other method was 94% as well.

We included the commitment decision as an additional way to detect epiphanies. We reasoned that if a subject has an epiphany and discovers the optimal strategy she should be willing to commit to that strategy for a small cash bonus [one experimental currency unit (ECU) per trial, regardless of the outcome]. Although a vast majority of subjects who eventually converged to 0 in their choices did commit to 0 (42% overall, 76% of the 56% whose choices converged to 0), there were also many subjects who committed to other numbers (37% overall). The rest of the subjects never committed (20%) (Fig. 2A). Although we were not expecting subjects to commit to numbers above 0, these subjects turned out to be useful in our later analyses.

Fig. 2.

Commitment behavior. (A) Histogram of subject types based on their commitment and choice behavior. Here, the “Choices do not Converge” and “Choices Converge to 0” groups include only subjects who never committed. (B) Histogram of the numbers that subjects committed to. (C) Histogram of the trial in which subjects committed, conditional on commitment type.

Subjects who committed to numbers greater than 0 were surprisingly uniform in their commit numbers (P = 0.13, K-S test against a uniform distribution) (Fig. 2B). Also, subjects who committed to numbers greater than 0 did so significantly earlier than subjects who committed to 0 (P = 0.004, Mann–Whitney test) (Fig. 2C), and they won with numbers greater than 0 significantly more often than subjects who committed to 0 did (P < 0.001, t test).

Conditioning on the commit-to-0 group (n = 25), the K-S and model-comparison tests classified 24 and 23 subjects as epiphany learners, respectively. However, conditioning on the commit >0 group (n = 22), the K-S and model-comparison tests classified 0 and 5 subjects as epiphany learners, respectively (for these subjects neither model fit well). In other words, we found strong evidence for EL in the commit-to-0 group but not in the other subjects who chose to commit.

As described above, what distinguishes EL from standard RL is a sudden shift in behavior to a steady strategy (Fig. 3). This behavior is typically overlooked when examining aggregate behavior because different subjects have epiphanies at different points in time, and so averaging over them leaves a picture of a gradual increase in optimal behavior. This phenomenon underscores the importance of examining individual-level behavior (39–41).

Fig. 3.

Comparing EL and RL. Red and green lines are best-fit curves from the EL and RL model, respectively. (A) Aggregated data (n = 59). Learning seems gradual and is similarly well-fit by EL and RL. (B and C) Individual data from two representative subjects (subjects 10 and 16, one per panel). Learning happens all at once, as seen in a sudden shift in behavior, which the RL model cannot capture. The vertical black line indicates the commit trial. Note that subjects need not commit as soon as they have an epiphany; for instance, they may choose to confirm their epiphany with some feedback before commitment. See B for an example.

One might wonder whether a sudden choice of, and then commitment to, a previously unchosen alternative is sufficient to identify EL. In fact, this is not the case. Commit-to-0 subjects (the ones primarily identified as epiphany learners) showed a significantly increasing likelihood of choosing the commit number as they approached the commitment trial (P < 0.001), whereas commit >0 subjects (generally not classified as epiphany learners) displayed a trend that was significantly smaller than that of the commit-to-0 group (P = 0.023) and insignificantly different from 0 (P = 0.96) (SI Appendix, Table S3). Also, overall, commit >0 subjects were significantly less likely to choose the number that they eventually committed to (P < 0.001). These results indicate that it is possible to behaviorally anticipate the commitment decision in the epiphany learners, but not in the others. Next we examine why this might be the case, using the EL model and evidence from the choice process.

Understanding the EL Process.

After demonstrating the existence of epiphanies, it is natural to ask how they occur. In our paradigm, one complication comes from trying to distinguish processes associated with epiphanies from processes associated with deciding to commit. Thankfully, the commit >0 subjects serve as an appropriate control group, because these subjects decided to commit without having the correct epiphany.

In the model, a subject aggregates evidence (ev) for choosing 0, and once the accumulated evidence crosses a threshold the subject has an epiphany. Prior eye-tracking research has indicated that subjects tend to allocate more attention (in terms of fixations) to alternatives relevant to their decision (16, 32, 42). We thus hypothesized that commit-to-0 subjects would spend more time looking at 0 if the aggregated evidence at trial t was relatively high. This was indeed the case for commit-to-0 subjects (P < 0.001, regressions 1 and 2 in SI Appendix, Table S4), but this trend was significantly smaller in commit >0 subjects (P < 0.001) and insignificantly different from zero (P > 0.1). We observed the same effect after controlling for the total number of wins up to trial t (regressions 3 and 4) and for the action values in the alternative RL model (regressions 5 and 6).

We next sought to test whether gaze patterns could help us anticipate subjects’ epiphanies. If, as suggested by the analysis above, the eye-tracking data reflect the subjective evidence favoring the optimal choice, we should be able to use those data to predict the timing of epiphanies. To do so we focused on the first three trials of the experiment and asked in how many of those trials (zero to three) a subject refixated on at least one option. We reasoned that trials with refixations are indicative of there being evidence favoring certain options, whereas trials without refixations are more likely associated with an exploratory search process (43, 44). We found that the probability of refixating in these first three trials was indeed correlated with the timing of commitment for the commit-to-0 group (P = 0.006, SI Appendix, Fig. S2, Left) but not for the commit >0 group (P = 0.28, SI Appendix, Fig. S2, Right). This result is robust to the number of early trials we pick (from two to five).
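For concreteness, the following sketch (ours; it assumes a hypothetical data format in which each trial is simply the ordered list of options fixated, which is not necessarily how the raw eye-tracking data are stored) computes this early-refixation measure.

from collections import Counter

def refixated(fixation_sequence):
    """True if any option was fixated more than once within a single trial."""
    counts = Counter(fixation_sequence)
    return any(n > 1 for n in counts.values())

def early_refixation_score(trials, n_early=3):
    """Number of the first n_early trials (0 to n_early) containing a refixation."""
    return sum(refixated(seq) for seq in trials[:n_early])

# Hypothetical fixation data: each inner list is the sequence of options fixated in one trial.
example_trials = [
    [3, 7, 0, 7, 2],  # option 7 refixated
    [5, 9, 1, 4],     # purely exploratory, no refixation
    [0, 6, 0, 0],     # option 0 refixated
]
print(early_refixation_score(example_trials))  # 2 of the first 3 trials contain a refixation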

A second important feature of the EL model is the sudden jump in both behavior and awareness of that option’s being the optimal strategy. The eye-tracking data corroborate this hypothesis. Looking at the commitment screen we observed that both types of subjects spent significantly more time looking at the “yes” button on the commit trial (relative to earlier trials, P<0.001, Fig. 4 and SI Appendix, Table S5), but this effect was significantly stronger for the commit-to-0 subjects (by 14%, P = 0.04).

Fig. 4.

Commitment screen results. Subjects’ relative dwell times on the yes and no buttons as they approached the commit trial. Bars indicate SEM, clustered by subject. The left-shifted bars belong to the commit >0 group.

Importantly, for the commit >0 subjects, the dwell time on the yes button increased in the trials leading up to the commit trial (P = 0.04) and was correlated with choosing the committed-to number in earlier trials (P = 0.06); neither trend was present for commit-to-0 subjects (P = 0.63 and P = 0.84, respectively). Thus, the gaze data indeed indicate a sudden, rather than gradual, decision to commit, particularly for the commit-to-0 subjects.

The EL model also assumes that subjects learn by accumulating evidence for winning numbers and against losing numbers based on their own choices. This assumption suggests that during the feedback screen epiphany learners should preferentially attend to the game outcome rather than to their opponents’ decision, whereas nonepiphany learners might do the opposite, which was indeed the case (regressions 1 and 3 in SI Appendix, Table S6). Commit-to-0 subjects did not look significantly less or more at their opponents’ decision, either overall (P = 0.184) or approaching the commit trial (P = 0.136). However, they did look significantly more at the results of the game as they approached the commit trial (P = 0.003). In comparison, as they approached the commit trial, commit >0 subjects spent significantly more time looking at their opponents’ decision (P = 0.026), and significantly less time at the outcome (P = 0.028), relative to commit-to-0 subjects. Moreover, only commit >0 subjects looked more at their opponents’ decisions when their opponents chose numbers that they themselves later committed to (P = 0.004, regression 2 in SI Appendix, Table S6). Thus, the two groups of subjects can be clearly differentiated by their patterns of attention during feedback in a way that is consistent with the predictions of the EL model. Finally, only commit-to-0 subjects looked more at the game result after choosing their commit number (P < 0.001), as if testing a hypothesis about 0; commit >0 subjects showed significantly less of this effect (P < 0.001).

Finally, we tested two additional model hypotheses using the pupil dilation data. First, prior work has demonstrated a link between pupil dilation and prediction errors (or surprise) in learning tasks (29, 45, 46). If subjects are indeed learning, we might expect their pupil dilation to reflect prediction errors, defined here as the absolute difference between the subject’s choice and the target number. The commit-to-0 subjects’ pupil dilation during the feedback screen (before the commit trial) was positively correlated with this measure of prediction error (P = 0.001), but that correlation was significantly reversed (P < 0.001) for trials after commitment (regression 1 in SI Appendix, Table S7). As for the commit >0 subjects, the same correlation was only marginally positive (P = 0.056) before the commit trial and also significantly reversed (P = 0.001) after the commit trial (regression 2 in SI Appendix, Table S7). Although this result was not an explicit prediction of the EL model, it does suggest that commit >0 subjects may not have been learning during the experiment, consistent with their suboptimal behavior. The reversal of these correlations after the commit trial further suggests that subjects stopped learning once they had committed.

Moreover, looking at the feedback screen in the trial immediately before commitment, we found that for the commit >0 group, subjects who lost showed significantly more pupil dilation than subjects who won (between 500 and 2,500 ms, P < 0.05, Fig. 5, Right), whereas the commit-to-0 subjects showed no such difference (P > 0.05, Fig. 5, Left). In addition, looking at all earlier trials, we found that only commit-to-0 subjects showed the same significant difference, again indicating that these subjects show clear signs of learning before commitment (SI Appendix, Fig. S3, Left).

Fig. 5.

Pupillary responses. Change in pupil diameter on the feedback screen for wins and losses immediately before the commit trial, for subjects who (Left) committed to 0 and (Right) committed to other numbers. Red horizontal bars indicate significant differences between wins and losses at P<0.05.

Discussion

In this paper we proposed and tested a model we call EL. Unlike RL, which is characterized by gradual behavioral shifts, EL features sudden and permanent shifts in behavior. Although sudden shifts in behavior have been previously documented, here we proposed a mechanism for such epiphanies and tested it using process data. In particular, we have proposed that epiphanies are the result of an evidence accumulation process that approaches an epiphany threshold as optimal choices are rewarded or suboptimal choices are not.

To test this model, we used a variant of a well-known task from behavioral game theory: the beauty contest. In particular, the two-person version of the beauty contest has a dominant strategy of choosing 0 [although a recent paper has argued that subjects may be confusing this game with an alternative “distance payoff” game, where 0 is not a dominant strategy (37)]. Although many of our subjects discovered this dominant strategy, nearly half of them did not, even after many repetitions of the same game (up to 30). Furthermore, nearly as many subjects chose to commit to suboptimal strategies (n = 22) as chose to commit to 0 (n = 25). Comparing the behavioral fits of the EL and RL models, we found that commit-to-0 subjects were far better fit by the EL model, whereas the rest were better fit by the RL model.

The advantage of a process model is that it can be tested with process data. We used eye-tracking data to confirm various assumptions of the model, namely that epiphany learners increasingly focus on the optimal strategy during the decision stage, pay more attention to outcomes than to their opponents’ decisions during feedback, and exhibit signs of learning in their pupillary response to the game outcomes. Our results suggest several ways in which it may be possible to distinguish legitimate epiphanies from “false” epiphanies. It will be interesting in future work to explore applications of these findings. For instance, in economics there are “principal-agent problems” where the principal has to rely on an agent of unknown expertise (imagine a clueless subject in our experiment hiring someone to make the decision for him). By assumption the principal does not know the optimal strategy and so he cannot tell whether the agent is informed or not. Our findings indicate that the principal may be able to use patterns in the agent’s choices, eye movements, and pupil dilation to detect the true experts, similar to how an inability to make eye contact can be used to detect autism (47). These results may also be of interest in education, where it would be useful to be able to quickly detect when students have truly understood, as opposed to simply nodding their heads and saying that they have understood.

Of course, it remains to be seen how well the model will generalize to other settings. Although this is an empirical question that we cannot answer directly here, previous papers (9, 10) have demonstrated epiphany-like learning in the game of 21 and in Nim. Thus, it is reasonable to believe that our model will be applicable in other contexts.

Moreover, it is also clear that not all learning can be classified as EL. It remains to be seen what makes different problems more or less amenable to EL, although one obvious criterion for EL is the presence of an optimal strategy.

Finally, although the RL model we have presented is a simple version of this type of model, there are certainly more sophisticated RL models in the literature (48–51). However, these RL models share a fundamental feature, which is that the predicted choice probabilities gradually change over time. Although learning rates can be adjusted to capture arbitrarily large jumps in behavior, those same learning rates incorrectly predict equally large shifts in behavior before the epiphany. Thus, RL models seem unable to capture EL without additional assumptions.

Materials and Methods

Subjects.

Fifty-nine undergraduates from The Ohio State University participated in the experiment. Subjects gave informed written consent before receiving the experimental instructions. Subjects were paid a $5 show-up fee, in addition to receiving 40 cents (10 ECUs) for each win in the experiment. The Ohio State University’s Human Subjects Internal Review Board approved the experiment.

Task.

Choices for the database.

We recruited an initial group of subjects (n = 28) to play the 2BC game against each other for 30 trials, with random rematching. This experiment was run in one session in a behavioral lab using zTree (52), and no eye-tracking was involved (SI Appendix, Fig. S5). Otherwise, the game itself was the same as in the eye-tracking experiment (discussed below). Briefly, in each trial of the experiment subjects chose an integer from 0 to 10 and were then asked whether they would like to commit to that number for the remainder of the experiment. They were then paired up, learned their opponent’s decision, the target number, and the game outcome, and then proceeded to the next trial. These choices were entered into a database that subjects in the eye-tracking experiment later played against.

The eye-tracking experiment.

Subjects went through two training phases (no payment) before the main experiment. In all three cases, the choice screen (Fig. 1A) and the choice mechanism were the same. What varied across the three phases were the task, feedback, and whether there was a commitment screen after they made the choice.

The choice mechanism used a combination of eye movements and the keyboard. Subjects had as long as they wanted to make a choice. Once they were ready to make a choice, they had to first fixate on the center of the screen and press the space bar, then fixate on their choice (the corresponding circle diameter increased by 10%), and then press the space bar again within 2.5 s. We used this procedure to ensure that subjects’ eye movements during the choice process were separate from the process of physically selecting the chosen number.

In the first training phase, subjects received a random number from 0 to 10 and their task was to then select that number on the choice screen. This training phase ensured that subjects understood how to select their desired numbers. After five consecutive successes, subjects moved on to the second training phase.

In the second training phase, subjects played one trial of the N-person beauty contest game (n = 29) against the database (which also included one round of the N-person beauty contest). The purpose of this training phase was to ensure that subjects understood the calculation of the target number and how the winner was decided. The feedback included the target number and whether the subject had won.

In the main experiment, subjects played the 2BC game against the same database for 30 trials. In each trial, the opponent’s decision was a random subject’s decision from a random trial in the first 20 trials of the database. We opted to use this database method for practical reasons (only one eye tracker), to avoid any possibility that subjects might try to influence their opponents, and to reduce the likelihood that subjects might learn to play 0 from simply observing their opponents’ learning process. Starting in the second trial, after the choice screen, subjects were taken to the commitment screen (Fig. 1B), which contained the choice they had just made, “yes” and “no” buttons, and a written warning that a “yes” decision would mean that they would no longer be able to make decisions in the experiment. Subjects had as long as they wanted to make a choice on this screen. When ready, subjects had to fixate on the yes or no button (the corresponding button became 10% larger) and press the space bar to make their choice. In the trials following a commitment, the choice and commitment screens were replaced by a 10-s countdown screen.

At the end of every trial, subjects saw feedback for that trial. The feedback included the subject’s own decision, the opponent’s decision, the target number, and the game result (Fig. 1C). Subjects received 10 ECUs for a win, 0 for a loss, and in the case of a tie a digital coin-flip determined the result. Subjects also received 1 ECU each trial postcommitment, regardless of the game result.

RL Model.

As an alternative to our EL model, we also use an RL model from previous research (53). As in the EL model, we group choices into X0 and X1 for comparison.

At the beginning of the experiment (t = 0), the subject has prior attractions Aj(0) for Xj, j ∈ {0, 1}, where A0(0) is normalized to 0, and Aj(t) is updated according to

Aj(t) = ϕAj(t−1) + 1(x(t) ∈ Xj)π(t),

where ϕ ∈ (0, 1] is the discount factor for the attraction, π(t) is the payoff received in trial t, and 1(⋅) is the indicator function. The predicted probability of choosing 0 in trial t + 1 is

p̂0(t + 1) = e^(λA0(t)) / (e^(λA0(t)) + e^(λA1(t))).

As with the EL model, all non-0 numbers are chosen with equal probability. Note that in this model three parameters (A1(0), ϕ, λ) need to be estimated.
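A minimal implementation of this RL benchmark could look like the sketch below (ours; the parameter values are made up for illustration, whereas the paper estimates A1(0), ϕ, and λ from the choice data).

import math

def rl_choice_prob_zero(A0, A1, lam):
    """Softmax probability of choosing 0 given the two attractions."""
    e0, e1 = math.exp(lam * A0), math.exp(lam * A1)
    return e0 / (e0 + e1)

def update_attractions(A0, A1, chose_zero, payoff, phi):
    """Discount both attractions and reinforce the chosen set with the trial payoff."""
    A0 = phi * A0 + (payoff if chose_zero else 0.0)
    A1 = phi * A1 + (0.0 if chose_zero else payoff)
    return A0, A1

# Illustrative run: A0(0) is normalized to 0; A1(0), phi, and lambda are arbitrary here.
A0, A1, phi, lam = 0.0, 5.0, 0.9, 0.3
for chose_zero, payoff in [(False, 0), (False, 10), (True, 10), (True, 10)]:
    A0, A1 = update_attractions(A0, A1, chose_zero, payoff, phi)
    print(f"A0 = {A0:5.2f}, A1 = {A1:5.2f}, P(choose 0 next) = {rl_choice_prob_zero(A0, A1, lam):.2f}")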

Acknowledgments

This work was supported by NSF Career Grant 1554837 (to I.K.).

Footnotes

  • W.J.C. and I.K. contributed equally to this work.

  • To whom correspondence should be addressed. Email: krajbich.1@osu.edu.
  • Author contributions: W.J.C. and I.K. designed research; W.J.C. and I.K. performed research; W.J.C. contributed new reagents/analytic tools; W.J.C. analyzed data; and W.J.C. and I.K. wrote the paper.

  • The authors declare no conflict of interest.

  • This article is a PNAS Direct Submission.

  • This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1618161114/-/DCSupplemental.

References

  1. Erev I, Roth AE (1998) Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. Am Econ Rev 88:848–881.
  2. Sutton RS, Barto AG (1998) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA), Vol 1.
  3. Dayan P, Balleine BW (2002) Reward, motivation, and reinforcement learning. Neuron 36:285–298.
  4. Wallas G (1926) The Art of Thought (Solis, London).
  5. Bower GH (1961) Application of a model to paired-associate learning. Psychometrika 26:255–280.
  6. Metcalfe J, Wiebe D (1987) Intuition in insight and noninsight problem solving. Mem Cognit 15:238–246.
  7. Dodds R, Ward T, Smith S (2003) A Review of the Experimental Literature on Incubation in Problem Solving and Creativity (Hampton, Cresskill, NJ), Vol 3.
  8. Hélie S, Sun R (2010) Incubation, insight, and creative problem solving: A unified theory and a connectionist model. Psychol Rev 117:994–1024.
  9. Dufwenberg M, Sundaram R, Butler DJ (2010) Epiphany in the game of 21. J Econ Behav Organ 75:132–143.
  10. McKinney CN, Van Huyck JB (2013) Eureka learning: Heuristics and response time in perfect information games. Games Econ Behav 79:223–232.
  11. Ratcliff R, Smith PL (2004) A comparison of sequential sampling models for two-choice reaction time. Psychol Rev 111:333–367.
  12. Bogacz R, Brown E, Moehlis J, Holmes P, Cohen JD (2006) The physics of optimal decision making: A formal analysis of models of performance in two-alternative forced-choice tasks. Psychol Rev 113:700–765.
  13. Camerer CF, Johnson EJ (2004) Thinking about attention in games: Backward and forward induction. Psychol Econ Decis 2:111–129.
  14. Krajbich I, Armel C, Rangel A (2010) Visual fixations and the computation and comparison of value in simple choice. Nat Neurosci 13:1292–1298.
  15. Wang JTy, Spezio M, Camerer CF (2010) Pinocchio’s pupil: Using eyetracking and pupil dilation to understand truth telling and deception in sender-receiver games. Am Econ Rev 100:984–1007.
  16. Krajbich I, Rangel A (2011) Multialternative drift-diffusion model predicts the relationship between visual fixations and choice in value-based decisions. Proc Natl Acad Sci USA 108:13852–13857.
  17. Ashby NJ, Dickert S, Glöckner A (2012) Focusing on what you own: Biased information uptake due to ownership. Judgm Decis Mak 7:254–267.
  18. Fiedler S, Glöckner A, Nicklisch A, Dickert S (2013) Social value orientation and information search in social dilemmas: An eye-tracking analysis. Organ Behav Hum Decis Process 120:272–284.
  19. Polonio L, Di Guida S, Coricelli G (2015) Strategic sophistication and attention in games: An eye-tracking study. Games Econ Behav 94:80–96.
  20. Stewart N, Gächter S, Noguchi T, Mullett TL (2016) Eye movements in strategic choice. J Behav Decis Mak 29:137–156.
  21. Stone M (1960) Models for choice-reaction time. Psychometrika 25:251–260.
  22. Ratcliff R (1978) A theory of memory retrieval. Psychol Rev 85:59–108.
  23. Luce RD (1986) Response Times: Their Role in Inferring Elementary Mental Organization. Oxford Psychology Series (Oxford Univ Press, Oxford), No. 8.
  24. Gold JI, Shadlen MN (2007) The neural basis of decision making. Annu Rev Neurosci 30:535–574.
  25. Ratcliff R, McKoon G (2008) The diffusion decision model: Theory and data for two-choice decision tasks. Neural Comput 20:873–922.
  26. De Martino B, Fleming SM, Garrett N, Dolan RJ (2013) Confidence in value-based choice. Nat Neurosci 16:105–110.
  27. Woodford M (2014) Stochastic choice: An optimizing neuroeconomic model. Am Econ Rev 104:495–500.
  28. Krajbich I, Dean M (2015) How can neuroscience inform economics? Curr Opin Behav Sci 5:51–57.
  29. Nassar MR, et al. (2012) Rational regulation of learning dynamics by pupil-linked arousal systems. Nat Neurosci 15:1040–1046.
  30. Glimcher PW, Fehr E (2013) Neuroeconomics: Decision Making and the Brain (Academic, New York).
  31. Eldar E, Cohen JD, Niv Y (2013) The effects of neural gain on attention and learning. Nat Neurosci 16:1146–1153.
  32. Towal RB, Mormann M, Koch C (2013) Simultaneous modeling of visual saliency and value computation improves predictions of economic choice. Proc Natl Acad Sci USA 110:E3858–E3867.
  33. Cheadle S, et al. (2014) Adaptive gain control during human perceptual choice. Neuron 81:1429–1441.
  34. Cavanagh JF, Wiecki TV, Kochar A, Frank MJ (2014) Eye tracking and pupillometry are indicators of dissociable latent decision processes. J Exp Psychol Gen 143:1476–1488.
  35. Konovalov A, Krajbich I (2016) Gaze data reveal distinct choice processes underlying model-based and model-free reinforcement learning. Nat Commun 7:12438.
  36. Grosskopf B, Nagel R (2008) The two-person beauty contest. Games Econ Behav 62:93–99.
  37. Nagel R, Bühren C, Frank B (2016) Inspired and inspiring: Hervé Moulin and the discovery of the beauty contest game. Math Soc Sci, in press.
  38. Camerer C, Ho TH (1999) Experience-weighted attraction learning in normal form games. Econometrica 67:827–874.
  39. Estes WK (1956) The problem of inference from curves based on group data. Psychol Bull 53:134–140.
  40. Glautier S (2013) Revisiting the learning curve (once again). Front Psychol 4:982.
  41. Murre JM (2014) S-shaped learning curves. Psychon Bull Rev 21(2):344–356.
  42. Reutskaja E, Nagel R, Camerer CF, Rangel A (2011) Search dynamics in consumer choice under time pressure: An eye-tracking study. Am Econ Rev 101(2):900–926.
  43. Janiszewski C (1998) The influence of display characteristics on visual exploratory search behavior. J Consum Res 25(3):290–301.
  44. Pfeiffer J, et al. (2014) On the influence of context-based complexity on information search patterns: An individual perspective. J Neurosci Psychol Econ 7(2):103–124.
  45. Preuschoff K, Marius’t Hart B, Einhäuser W (2011) Pupil dilation signals surprise: Evidence for noradrenaline’s role in decision making. Front Neurosci 5:115.
  46. O’Reilly JX, et al. (2013) Dissociable effects of surprise and model update in parietal and anterior cingulate cortex. Proc Natl Acad Sci USA 110:E3660–E3669.
  47. Spezio ML, Adolphs R, Hurley RS, Piven J (2007) Analysis of face gaze in autism using “bubbles”. Neuropsychologia 45:144–151.
  48. Börgers T, Sarin R (2000) Naive reinforcement learning with endogenous aspirations. Int Econ Rev 41:921–950.
  49. Arifovic J, McKelvey RD, Pevnitskaya S (2006) An initial implementation of the Turing tournament to learning in repeated two-person games. Games Econ Behav 57:93–122.
  50. Biele G, Erev I, Ert E (2009) Learning, risk attitude and hot stoves in restless bandit problems. J Math Psychol 53:155–167.
  51. Chen W, Liu SY, Chen CH, Lee YS (2011) Bounded memory, inertia, sampling and weighting model for market entry games. Games 2:187–199.
  52. Fischbacher U (2007) z-Tree: Zurich toolbox for ready-made economic experiments. Exp Econ 10:171–178.
  53. Salmon TC (2001) An evaluation of econometric models of adaptive learning. Econometrica 69:1597–1628.