Computational modeling of epiphany learning
Edited by Alvin E. Roth, Stanford University, Palo Alto, CA, and approved February 23, 2017 (received for review November 1, 2016)

Significance
Epiphany learning (EL) occurs when organisms seem to suddenly and dramatically alter their behavior. Although EL has been documented before, this paper proposes a model of EL and reports behavioral and eye-tracking data from subjects who repeatedly played a game that has an optimal strategy. We find clear behavioral evidence for EL in this game and additionally find patterns of eye movements and pupil dilation that correspond to features of the model. In the process, we also identify distinct patterns for subjects who did not discover the optimal strategy. Our model and findings provide a framework for future research on EL.
Abstract
Models of reinforcement learning (RL) are prevalent in the decision-making literature, but not all behavior seems to conform to the gradual convergence that is a central feature of RL. In some cases learning seems to happen all at once. Limited prior research on these “epiphanies” has shown evidence of sudden changes in behavior, but it remains unclear how such epiphanies occur. We propose a sequential-sampling model of epiphany learning (EL) and test it using an eye-tracking experiment. In the experiment, subjects repeatedly play a strategic game that has an optimal strategy. Subjects can learn over time from feedback but are also allowed to commit to a strategy at any time, eliminating all other options and opportunities to learn. We find that the EL model is consistent with the choices, eye movements, and pupillary responses of subjects who commit to the optimal strategy (correct epiphany) but not always of those who commit to a suboptimal strategy or who do not commit at all. Our findings suggest that EL is driven by a latent evidence accumulation process that can be revealed with eye-tracking data.
How organisms learn is a central question in the behavioral sciences. Much of the literature has focused on reinforcement learning (RL) (1–3), where organisms gradually adjust their behavior in response to the outcomes of prior actions. However, there are other situations where organisms seem to suddenly and dramatically alter their behavior (4–7), often without any external input (8). This “epiphany” learning (EL) is characterized by an unexpected moment of insight, often portrayed in cartoons by a light bulb appearing over a person’s head, and was first introduced to economics in ref. 9.
The sudden, unexpected, and irreversible nature of EL makes it inherently difficult to predict or to study. Unlike with standard RL, where the decision maker’s choices provide insight into the underlying mechanism, here decisions provide little to no insight into the mechanism underlying the generation of epiphanies, other than to establish that they have occurred ex post (9, 10). An understanding of the EL process thus remains elusive.
Here, we sought to tackle this challenging problem with a combination of computational modeling and measures of the choice process. In particular, we use a model from the class of sequential-sampling models (SSMs) (11, 12) to capture the unconscious accumulation of evidence toward the epiphany and eye-tracking technology to study this process (13–20).
SSMs have received much attention in the decision sciences for their ability to capture choice outcomes, response times (21–23), and, more recently, eye-tracking and neural activity in perceptual and value-based decision making (24–28). These models assume that at the time of choice the decision maker evaluates the available options and accumulates evidence for each one until there is sufficient relative evidence for one option over the others. Once the relative evidence reaches this predetermined threshold, the decision maker implements the corresponding decision. SSMs are a natural way to capture EL because they share the feature of a sudden change in behavior due to latent processes.
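The accumulate-to-threshold mechanism described above can be illustrated with a minimal two-option random-walk (drift-diffusion-style) simulation. This is a generic sketch, not the model fitted in this paper; `simulate_ssm` and its parameters (`drift`, `threshold`, `noise_sd`, `max_steps`) are hypothetical names chosen for illustration.

```python
import random
from typing import Optional, Tuple

def simulate_ssm(drift: float, threshold: float, noise_sd: float = 1.0,
                 max_steps: int = 10_000,
                 rng: Optional[random.Random] = None) -> Tuple[Optional[int], int]:
    """Accumulate noisy relative evidence until it reaches +threshold or -threshold.

    Returns (choice, steps): choice is 1 for the upper barrier, 0 for the lower
    barrier, or None if neither barrier is reached within max_steps.
    """
    rng = rng or random.Random()
    evidence = 0.0  # relative evidence for option 1 over option 0
    for step in range(1, max_steps + 1):
        evidence += drift + rng.gauss(0.0, noise_sd)  # one noisy sample per step
        if evidence >= threshold:
            return 1, step
        if evidence <= -threshold:
            return 0, step
    return None, max_steps
```

With `noise_sd=0.0` the walk is deterministic, so `simulate_ssm(0.5, 2.0, noise_sd=0.0)` reaches the upper barrier on the fourth step and returns `(1, 4)`. With noise, the same drift yields variable choices and step counts, which is how SSMs account for both choice proportions and response times.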
Research in decision science has highlighted the usefulness of studying such latent processes using techniques such as brain imaging and eye tracking (29–31). Here, we use eye tracking because its superior signal-to-noise ratio (relative to brain imaging) makes it more suitable for studying isolated events such as epiphanies. Moreover, recent work has linked eye movements and pupil dilation to key features of SSM processes (32–35).
To use these methods to study EL, we sought a decision setting where subjects make repeated attempts to solve the same problem while receiving feedback. This explicit feedback serves as evidence in the SSM and is important for testing model predictions. We settled on the two-person beauty contest game (2BC), in which each of two players chooses an integer from 0 to 10 with the goal of getting closer to 0.9 multiplied by the average of the two numbers (36). An important feature of this game is that there is an optimal number regardless of what the other player chooses (i.e., a “dominant strategy”). That number is 0. Prior research has shown that most people initially fail to realize that 0 is the optimal strategy (36, 37) but do eventually figure it out. Thus, in most cases there is a switch from suboptimal to optimal play, which is characteristic of EL.
Using a specially designed visual presentation of the 2BC game, we were able to study, through eye movements and pupil dilation, how subjects compare and react to the different alternatives and to the feedback in each trial of the game. To better establish whether subjects truly undergo EL in this setting, we introduced an option for subjects to “commit” to a number for the remainder of the study in exchange for a small monetary bonus. We reasoned that an epiphany should be accompanied by the certainty that the problem has been solved and requires no further thought. We find distinct patterns of eye movements and pupil dilation for subjects who eventually committed to 0 (the epiphany learners) compared with subjects who never committed or committed to other numbers (Table 1).
Table 1. Summary of results
Results
The Task.
In the experiment, subjects played 30 trials of the 2BC game. In each trial they were matched with a different opponent, taken from a database of subjects from a prior experiment. Each player picked an integer from 0 to 10 with the goal of picking the number closer to 0.9 times the average of the two players’ numbers (own and opponent’s). Ties were broken with a digital coin flip. Because the average of two numbers lies halfway between them and the target (0.9 times the average) lies below that midpoint, the smaller of two different numbers is always closer to the target. Therefore, picking 0 is the optimal strategy regardless of what the other player chooses, and doing so guarantees at least a tie.
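The dominance argument can also be checked by brute force over all 121 pairs of choices. The sketch below is our own illustration (the helper name `outcome` is hypothetical); it reports a tie rather than simulating the coin flip.

```python
from itertools import product

def outcome(own: int, other: int) -> str:
    """Result for the player choosing `own`: 'win', 'lose', or 'tie' (coin flip in the task)."""
    target = 0.9 * (own + other) / 2  # 0.9 times the average of the two numbers
    d_own, d_other = abs(own - target), abs(other - target)
    if d_own < d_other:
        return "win"
    if d_own > d_other:
        return "lose"
    return "tie"

numbers = range(11)  # the integers 0 through 10

# Choosing 0 never loses: it beats every positive number and ties when the opponent also picks 0.
assert all(outcome(0, other) != "lose" for other in numbers)

# More generally, whenever the two numbers differ, the smaller one wins.
assert all(outcome(min(a, b), max(a, b)) == "win"
           for a, b in product(numbers, numbers) if a != b)
```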
The integers 0 through 10 were displayed in a circular arrangement on the screen (Fig. 1A). Subjects had unlimited time to look around at the numbers and make their decision, which they made using a key press followed by a saccade to the desired number. After their decision, subjects were given the option to commit to that number for the remainder of the experiment (Fig. 1B), in which case no more decisions could be made. On the following screen, subjects received feedback about their opponent’s choice, the target number, and the outcome (win, lose, tie-win, or tie-lose) (Fig. 1C). In each subsequent trial, subjects were matched with a new, randomly drawn opponent and trial from the database. This prevented subjects from noticing that their opponents’ choices tended to decrease over time, which might have promoted a mimicking strategy (37).
Eye-tracking experiment. Text is enlarged and color altered for display purposes. Refer to SI Appendix, Fig. S1 for actual screenshots. (A) Choice screen. Subjects chose an integer from 0 to 10. (B) Commitment screen. Subjects chose whether to commit to their chosen number for the remainder of the study. (C) Feedback screen. Subjects saw their chosen number, their opponent’s chosen number, the target number, the game outcome, and the resulting earnings.
The EL Model.
In general, our EL model states that decision makers aggregate positive and negative evidence for the optimal strategy over time until the relative evidence supporting that strategy is substantially higher than the evidence supporting other strategies. At that point the decision maker has an epiphany and begins to use that strategy. Note that the model allows only one epiphany: the one for the optimal strategy. The accumulating evidence shifts the subject’s focus toward the optimal choice and, with enough focus, the subject has an epiphany and realizes that this choice is the optimal strategy.
Positive evidence supports the optimal strategy, whereas negative evidence does the opposite. Thus, the model is equivalent to a random-walk model (11), and the process can be reduced to a single variable that evolves over time toward one decision barrier. Whether the evidence is positive or negative at a given point in time depends on the subject’s choice and the outcome.
We divide the subjects’ potential actions into two choice sets
Mathematically, the net evidence (
Next, we describe how a decision maker chooses her actions. A decision maker chooses 0 with probability
Intuitively, our model states that the decision maker receives evidence for a strategy when she chooses that strategy and is successful, or when she chooses a different strategy and is unsuccessful. This evidence accumulates while the decision maker explores different strategies. This process is similar to the experience-weighted attraction (EWA) model (38), in which unchosen strategies can still be reinforced. Once the decision maker has accumulated sufficient evidence, she has an epiphany and alters her strategy (for alternative model specifications see SI Appendix).
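The evidence rule described above can be sketched as a one-dimensional random walk toward a single barrier. This sketch does not reproduce the paper’s exact specification; it assumes unit increments of ±1 per trial, treats a win or a won coin flip (“tie-win”) as a success, leaves out the choice rule, and uses a hypothetical `threshold` parameter. The names `evidence_step` and `epiphany_trial` are illustrative only.

```python
from typing import Iterable, Optional, Tuple

# Assumption: a win or a won coin flip counts as a success.
SUCCESS = {"win", "tie-win"}

def evidence_step(choice: int, outcome: str) -> int:
    """+1 if the trial supports choosing 0, otherwise -1 (assumed unit increments)."""
    chose_zero = choice == 0
    success = outcome in SUCCESS
    # Evidence for 0: chose 0 and succeeded, or chose another number and failed.
    return 1 if chose_zero == success else -1

def epiphany_trial(trials: Iterable[Tuple[int, str]], threshold: int) -> Optional[int]:
    """1-based trial on which net evidence for 0 first reaches `threshold`, else None."""
    net = 0
    for t, (choice, outcome) in enumerate(trials, start=1):
        net += evidence_step(choice, outcome)
        if net >= threshold:
            return t
    return None
```

For example, with `threshold=3`, a subject who loses twice with nonzero numbers, wins once with a nonzero number, and then loses twice more with nonzero numbers accumulates net evidence 1, 2, 1, 2, 3 and reaches the barrier on trial 5.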
Identifying Epiphanies from Behavior.
In prior work, epiphanies have been characterized as “the point in the data where the (choice) distributions are statistically different before and after a change point” (10). Specifically, in ref. 10, a Kolmogorov–Smirnov (K-S) test was used to identify this change point. As a first pass, we applied the same analysis to identify epiphany learners in our experiment, using a
Depending on the method used, we estimate the proportion of epiphany learners as either 53% (K-S) or 66% (model comparison). Conditional on being identified as an epiphany learner by the K-S (model comparison) method, the probability that the same subject was also identified as an epiphany learner by the other method was 94% (74%). Of course, not all subjects discovered the optimal strategy during the experiment and so we would not expect to see definitive evidence for EL in those cases. Conditioning on the subjects whose choices did eventually converge to 0, we find that the K-S and model comparison tests both classify 94% of these subjects as epiphany learners. Here, conditional on being identified as an epiphany learner by either method, the probability that the same subject was also identified as an epiphany learner by the other method was 94% as well.
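The K-S change-point idea can be sketched as a scan over candidate split points, computing the two-sample K-S statistic between the choices before and after each split. This is a simplified illustration, not the exact procedure of ref. 10 or of this paper: it only locates the split with the largest K-S distance and does not test significance (for p-values, `scipy.stats.ks_2samp` implements the two-sample test). The `min_segment` parameter is a hypothetical guard against very short segments.

```python
from bisect import bisect_right
from typing import Sequence, Tuple

def ks_statistic(a: Sequence[int], b: Sequence[int]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    # Empirical CDFs are step functions, so the maximum gap occurs at an observed value.
    return max(abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
               for x in set(sa) | set(sb))

def best_change_point(choices: Sequence[int], min_segment: int = 3) -> Tuple[int, float]:
    """Return (k, D) for the split choices[:k] | choices[k:] with the largest K-S distance."""
    if len(choices) < 2 * min_segment:
        raise ValueError("need at least 2 * min_segment choices")
    splits = range(min_segment, len(choices) - min_segment + 1)
    return max(((k, ks_statistic(choices[:k], choices[k:])) for k in splits),
               key=lambda split: split[1])
```

For a subject who picks 5 four times and then 0 four times, `best_change_point([5, 5, 5, 5, 0, 0, 0, 0])` returns `(4, 1.0)`: the two segments share no values, so the empirical CDFs differ by the maximum possible amount.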
We included the commitment decision as an additional way to detect epiphanies. We reasoned that if a subject has an epiphany and discovers the optimal strategy she should be willing to commit to that strategy for a small cash bonus [one experimental currency unit (ECU) per trial, regardless of the outcome]. Although a vast majority of subjects who eventually converged to 0 in their choices did commit to 0 (42% overall, 76% of the 56% whose choices converged to 0), there were also many subjects who committed to other numbers (37% overall). The rest of the subjects never committed (20%) (Fig. 2A). Although we were not expecting subjects to commit to numbers above 0, these subjects turned out to be useful in our later analyses.
Commitment behavior. (A) Histogram of subject types based on their commitment and choice behavior. Here, the “Choices do not Converge” and “Choices Converge to 0” groups include only subjects who never committed. (B) Histogram of the numbers that subjects committed to. (C) Histogram of the trial in which subjects committed, conditional on commitment type.
Subjects who committed to numbers greater than 0 were surprisingly uniform in their commit numbers (
Conditioning on the commit-to-0 group (
As described above, what distinguishes EL from standard RL is a sudden shift in behavior to a steady strategy (Fig. 3). This behavior is typically overlooked when examining aggregate behavior because different subjects have epiphanies at different points in time, so averaging over them leaves a picture of a gradual increase in optimal behavior. This phenomenon underscores the importance of examining individual-level behavior (39–41).
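The aggregation artifact described above is easy to reproduce. In the sketch below, each hypothetical subject switches abruptly from never choosing 0 to always choosing 0 at their own epiphany trial; the epiphany trials are made-up values, not data from the experiment, and the function names are illustrative.

```python
from typing import List

def step_curve(epiphany_trial: int, n_trials: int) -> List[float]:
    """Probability of choosing 0: 0 before the epiphany trial, 1 from it onward."""
    return [1.0 if t >= epiphany_trial else 0.0 for t in range(1, n_trials + 1)]

def average_curve(epiphany_trials: List[int], n_trials: int) -> List[float]:
    """Average the individual step functions trial by trial."""
    curves = [step_curve(e, n_trials) for e in epiphany_trials]
    return [sum(column) / len(curves) for column in zip(*curves)]
```

For instance, `average_curve([2, 4, 6, 8], 8)` rises in steps of 0.25 from 0.0 to 1.0, even though every individual curve jumps straight from 0 to 1.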
Comparing EL and RL. Red and green lines are best-fit curves from the EL and RL model, respectively. (A) Aggregated data (
One might wonder whether a sudden choice of, and then commitment to, a previously unchosen alternative is sufficient to identify EL. In fact, this is not the case. Commit-to-0 subjects (the ones primarily identified as epiphany learners) were significantly increasing in their likelihood of choosing the commit number as they approached the commitment trial (
Understanding the EL Process.
After demonstrating the existence of epiphanies, it is natural to ask how they occur. In our paradigm, one complication comes from trying to distinguish processes associated with epiphanies from processes associated with deciding to commit. Thankfully, the commit
In the model, a subject aggregates evidence (
We next sought to test whether gaze patterns could help us anticipate subjects’ epiphanies. If, as suggested by the analysis above, the eye-tracking data reflect the subjective evidence favoring the optimal choice, we should be able to use those data to predict the timing of epiphanies. To do so, we focused on the first three trials of the experiment and counted in how many of them (zero to three) a subject refixated on an option at least once. We reasoned that trials with refixations indicate evidence favoring certain options, whereas trials without refixations are more likely associated with an exploratory search process (43, 44). We found that, indeed, the probability of refixating in these first three trials was correlated with the timing of commitment, for the commit-to-0 group (
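A per-trial refixation flag and the resulting zero-to-three count can be sketched as follows. The definition used here is an assumption for illustration: an option counts as refixated if the gaze returns to it after visiting a different option, and consecutive fixations on the same option are treated as a single dwell. The function names are hypothetical.

```python
from typing import Hashable, Sequence

def has_refixation(fixations: Sequence[Hashable]) -> bool:
    """True if any option is revisited after the gaze has moved to a different option."""
    seen = set()
    previous = None
    for item in fixations:
        if item == previous:
            continue  # still dwelling on the same option (assumed to be one dwell)
        if item in seen:
            return True
        seen.add(item)
        previous = item
    return False

def refixation_count(trials: Sequence[Sequence[Hashable]], n_first: int = 3) -> int:
    """Number of the first `n_first` trials containing at least one refixation."""
    return sum(has_refixation(trial) for trial in trials[:n_first])
```

Here each trial is a sequence of fixated option labels, so `refixation_count([[3, 5, 3], [1, 2], [0, 0, 7, 0], [9, 8, 9]])` counts the first and third trials and ignores the fourth, giving 2.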
A second important feature of the EL model is the sudden jump in both behavior and awareness of that option’s being the optimal strategy. The eye-tracking data corroborate this hypothesis. Looking at the commitment screen we observed that both types of subjects spent significantly more time looking at the “yes” button on the commit trial (relative to earlier trials,
Commitment screen results. Subjects’ relative dwell times on the yes and no buttons as they approached the commit trial. Bars indicate SEM, clustered by subject. The left-shifted bars belong to the commit
Importantly, the dwell time on the yes button, for the commit
The EL model also assumes that subjects learn by accumulating evidence for winning numbers and against losing numbers based on their own choices. This assumption suggests that during the feedback screen epiphany learners should preferentially attend to the game outcome rather than to their opponents’ decision, whereas nonepiphany learners might do the opposite, which was indeed the case (regressions 1 and 3 in SI Appendix, Table S6). Commit-to-0 subjects did not look significantly less or more at their opponents’ decision, either overall (
Finally, we tested two additional model hypotheses using the pupil dilation data. First, prior work has demonstrated a link between pupil dilation and prediction errors (or surprise) in learning tasks (29, 45, 46). If subjects are indeed learning, we might expect their pupil dilation to reflect prediction errors, as defined by the absolute difference between the subject’s choice and the target number. The commit-to-0 subjects’ pupil dilation during the feedback screen (before the commit trial) was positively correlated with this measure of prediction error (
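The prediction-error measure defined above (the absolute difference between the subject’s choice and the target number) is straightforward to compute. The sketch below pairs it with a plain Pearson correlation as a stand-in for relating prediction errors to pupil dilation; this simple correlation ignores the subject-level clustering that an analysis of repeated trials would need, and the function names are illustrative.

```python
from math import sqrt
from typing import Sequence

def prediction_error(own: int, opponent: int) -> float:
    """|own choice - target|, where the target is 0.9 times the mean of the two choices."""
    target = 0.9 * (own + opponent) / 2
    return abs(own - target)

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

For example, a subject who picks 10 against an opponent who picks 0 faces a target of 4.5, so `prediction_error(10, 0)` is 5.5, whereas picking 0 against 0 gives an error of 0.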
Moreover, looking at the feedback screen in the trial immediately before commitment, we found that for the commit
Pupillary responses. Change in pupil diameter on the feedback screen for wins and losses immediately before the commit trial, for subjects who (Left) committed to 0 and (Right) committed to other numbers. Red horizontal bars indicate significant differences between wins and losses at
Discussion
In this paper we proposed and tested a model we call EL. Unlike RL, which is characterized by gradual behavioral shifts, EL features sudden and permanent shifts in behavior. Although sudden shifts in behavior have been previously documented, here we proposed a mechanism for such epiphanies and tested it using process data. In particular, we have proposed that epiphanies are the result of an evidence accumulation process that approaches an epiphany threshold as optimal choices are rewarded or suboptimal choices are not.
To test this model, we used a variant of a well-known task from behavioral game theory: the beauty contest. In particular, the two-person version of the beauty contest has a dominant strategy of choosing 0 [although a recent paper has argued that subjects may be confusing this game with an alternative “distance payoff” game, where 0 is not a dominant strategy (37)]. Although many of our subjects discovered this dominant strategy, nearly half of them did not, even after many repetitions of the same game (up to 30). Furthermore, nearly as many subjects chose to commit to suboptimal strategies (
The advantage of a process model is that it can be tested with process data. We used eye-tracking data to confirm various assumptions of the model, namely that epiphany learners increasingly focus on the optimal strategy during the decision stage, pay more attention to outcomes than to their opponents’ decisions during feedback, and exhibit signs of learning in their pupillary response to the game outcomes. Our results suggest several ways in which it may be possible to distinguish legitimate epiphanies from “false” epiphanies. It will be interesting in future work to explore applications of these findings. For instance, in economics there are “principal-agent problems” where the principal has to rely on an agent of unknown expertise (imagine a clueless subject in our experiment hiring someone to make the decision for him). By assumption the principal does not know the optimal strategy and so he cannot tell whether the agent is informed or not. Our findings indicate that the principal may be able to use patterns in the agent’s choices, eye movements, and pupil dilation to detect the true experts, similar to how an inability to make eye contact can be used to detect autism (47). These results may also be of interest in education, where it would be useful to be able to quickly detect when students have truly understood, as opposed to simply nodding their heads and saying that they have understood.
Of course, it remains to be seen how well the model will generalize to other settings. Although this is an empirical question that we cannot answer directly here, previous papers (9, 10) have demonstrated epiphany-like learning in the game of 21 and in Nim. Thus, it is reasonable to expect that our model will apply in other contexts as well.
Moreover, it is also clear that not all learning can be classified as EL. It remains to be seen what makes different problems more or less amenable to EL, although one obvious criterion for EL is the presence of an optimal strategy.
Finally, although the RL model we have presented is a simple version of this type of model, there are certainly more sophisticated RL models in the literature (48–51). However, these RL models share a fundamental feature, which is that the predicted choice probabilities gradually change over time. Although learning rates can be adjusted to capture arbitrarily large jumps in behavior, those same learning rates incorrectly predict equally large shifts in behavior before the epiphany. Thus, RL models cannot seem to capture EL without additional assumptions.
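The learning-rate tradeoff described above can be illustrated with a generic delta-rule update, V ← V + α(r − V). This is a textbook Rescorla–Wagner-style sketch, not the RL model of ref. 53 used in the paper; the reward sequence and the names `delta_rule` and `largest_swing` are hypothetical.

```python
from typing import List, Sequence

def delta_rule(rewards: Sequence[float], alpha: float, v0: float = 0.5) -> List[float]:
    """Value trajectory under V <- V + alpha * (reward - V), starting from v0."""
    values = [v0]
    for r in rewards:
        v = values[-1]
        values.append(v + alpha * (r - v))
    return values

def largest_swing(values: Sequence[float]) -> float:
    """Largest absolute trial-to-trial change in the value trajectory."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))
```

With an alternating win/loss sequence such as `[1, 0, 1, 0]` (as might occur before any epiphany), a learning rate of 0.9 produces trial-to-trial swings above 0.8, whereas a learning rate of 0.1 keeps every swing below 0.1. A learning rate large enough to reproduce a sudden epiphany therefore also predicts large pre-epiphany fluctuations.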
Materials and Methods
Subjects.
Fifty-nine undergraduates from The Ohio State University participated in the experiment. Subjects gave informed written consent before receiving the experimental instructions. Subjects were paid a $5 show-up fee, in addition to receiving 40 cents (10 ECUs) for each win in the experiment. The Ohio State University’s Human Subjects Internal Review Board approved the experiment.
Task.
Choices for the database.
We recruited an initial group of subjects (
The eye-tracking experiment.
Subjects went through two training phases (no payment) before the main experiment. In all three phases, the choice screen (Fig. 1A) and the choice mechanism were the same. What varied across the phases were the task, the feedback, and whether a commitment screen followed the choice.
The choice mechanism used a combination of eye movements and the keyboard. Subjects had as long as they wanted to make a choice. Once they were ready to make a choice, they had to first fixate on the center of the screen and press the space bar, then fixate on their choice (the corresponding circle diameter increased by 10%), and then press the space bar again within 2.5 s. We used this procedure to ensure that subjects’ eye movements during the choice process were separate from the process of physically selecting the chosen number.
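The two-press selection procedure can be sketched as a small event-driven state machine. This is an illustrative reconstruction under assumptions not stated in the text: the 2.5-s window is measured from the first space-bar press, a missed window simply resets the procedure, and the 10% enlargement of the fixated circle is display feedback that is not modeled. The `Event` class and `select_number` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

CONFIRM_WINDOW_S = 2.5  # second space-bar press must follow within 2.5 s (assumed: of the first)

@dataclass
class Event:
    t: float                      # seconds since the choice screen appeared
    kind: str                     # "fixate" or "space"
    target: Optional[str] = None  # for fixations: "center" or a number "0".."10"

def select_number(events: Iterable[Event]) -> Optional[int]:
    """Return the selected number, or None if no valid selection occurred."""
    fixated = None   # item currently fixated
    armed_at = None  # time of the first press made while fixating the center
    for ev in events:
        if ev.kind == "fixate":
            fixated = ev.target
        elif ev.kind == "space":
            if (armed_at is not None
                    and fixated not in (None, "center")
                    and ev.t - armed_at <= CONFIRM_WINDOW_S):
                return int(fixated)
            # Otherwise (re)start: a press only arms the selection from the center.
            armed_at = ev.t if fixated == "center" else None
    return None
```

Free viewing before the first press is simply ignored, which mirrors the stated goal of separating the eye movements of the choice process from the act of physically selecting the number.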
In the first training phase, subjects received a random number from 0 to 10 and their task was to then select that number on the choice screen. This training phase ensured that subjects understood how to select their desired numbers. After five consecutive successes, subjects moved on to the second training phase.
In the second training phase, subjects played the N-person beauty contest (
In the main experiment, subjects played the 2BC game against the same database for 30 trials. In each trial, the opponent’s decision was a random subject’s decision from a random trial in the first 20 trials of the database. We opted to use this database method for practical reasons (only one eye tracker), to avoid any possibility that subjects might try to influence their opponents, and to reduce the likelihood that subjects might learn to play 0 from simply observing their opponents’ learning process. Starting in the second trial, after the choice screen, subjects were taken to the commitment screen (Fig. 1B), which contained the choice they had just made, “yes” and “no” buttons, and a written warning that a “yes” decision would mean that they would no longer be able to make decisions in the experiment. Subjects had as long as they wanted to make a choice on this screen. When ready, subjects had to fixate on the yes or no button (the corresponding button became 10% larger) and press the space bar to make their choice. In the trials following a commitment, the choice and commitment screens were replaced by a 10-s countdown screen.
At the end of every trial, subjects saw feedback for that trial. The feedback included the subject’s own decision, the opponent’s decision, the target number, and the game result (Fig. 1C). Subjects received 10 ECUs for a win, 0 for a loss, and in the case of a tie a digital coin-flip determined the result. Subjects also received 1 ECU each trial postcommitment, regardless of the game result.
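The per-trial payoff rule can be written out directly. In this sketch the constants come from the text (10 ECUs per win, 0 per loss, a 1-ECU bonus per postcommitment trial, and 10 ECUs = 40 cents from the Subjects section); the function name `trial_payoff` and the optional `rng` argument, used to make the coin flip reproducible, are our own illustrative choices.

```python
import random
from typing import Optional

WIN_ECU, LOSS_ECU, COMMIT_BONUS_ECU = 10, 0, 1
CENTS_PER_ECU = 4  # 10 ECUs = 40 cents

def trial_payoff(own: int, opponent: int, committed: bool,
                 rng: Optional[random.Random] = None) -> int:
    """ECUs earned in one trial of the 2BC game."""
    rng = rng or random.Random()
    target = 0.9 * (own + opponent) / 2
    d_own, d_opp = abs(own - target), abs(opponent - target)
    if d_own == d_opp:
        won = rng.random() < 0.5  # digital coin flip for ties
    else:
        won = d_own < d_opp
    return (WIN_ECU if won else LOSS_ECU) + (COMMIT_BONUS_ECU if committed else 0)
```

For example, a committed subject who plays 0 against an opponent’s 5 earns 10 + 1 = 11 ECUs, or 44 cents, for that trial.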
RL Model.
As an alternative to our EL model, we also use an RL model from previous research (53). As in the EL model, we group choices into
At the beginning of the experiment (
Acknowledgments
This work was supported by NSF Career Grant 1554837 (to I.K.).
Footnotes
1 W.J.C. and I.K. contributed equally to this work.
2 To whom correspondence should be addressed. Email: krajbich.1@osu.edu.
Author contributions: W.J.C. and I.K. designed research; W.J.C. and I.K. performed research; W.J.C. contributed new reagents/analytic tools; W.J.C. analyzed data; and W.J.C. and I.K. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1618161114/-/DCSupplemental.
References
1. Erev I, Roth AE
2. Sutton RS, Barto AG
4. Wallas G
7. Dodds R, Ward T, Smith S
9. Dufwenberg M, Sundaram R, Butler DJ
10. McKinney CN, Van Huyck JB
13. Camerer CF, Johnson EJ
16. Krajbich I, Rangel A
17. Ashby NJ, Dickert S, Glöckner A
19. Polonio L, Di Guida S, Coricelli G
20. Stewart N, Gächter S, Noguchi T, Mullett TL
23. Luce RD
27. Woodford M
28. Krajbich I, Dean M
30. Glimcher PW, Fehr E
32. Towal RB, Mormann M, Koch C
35. Konovalov A, Krajbich I
36. Grosskopf B, Nagel R
37. Nagel R, Bühren C, Frank B
40. Glautier S
41. Murre JM
43. Janiszewski C
44. Pfeiffer J, et al.
46. O’Reilly JX, et al.
48. Börgers T, Sarin R
49. Arifovic J, McKelvey RD, Pevnitskaya S