Skip to main content
  • Submit
  • About
    • Editorial Board
    • PNAS Staff
    • FAQ
    • Accessibility Statement
    • Rights and Permissions
    • Site Map
  • Contact
  • Journal Club
  • Subscribe
    • Subscription Rates
    • Subscriptions FAQ
    • Open Access
    • Recommend PNAS to Your Librarian
  • Log in
  • My Cart

Main menu

  • Home
  • Articles
    • Current
    • Latest Articles
    • Special Features
    • Colloquia
    • Collected Articles
    • PNAS Classics
    • Archive
  • Front Matter
  • News
    • For the Press
    • Highlights from Latest Articles
    • PNAS in the News
  • Podcasts
  • Authors
    • Information for Authors
    • Purpose and Scope
    • Editorial and Journal Policies
    • Submission Procedures
    • For Reviewers
    • Author FAQ
  • Submit
  • About
    • Editorial Board
    • PNAS Staff
    • FAQ
    • Accessibility Statement
    • Rights and Permissions
    • Site Map
  • Contact
  • Journal Club
  • Subscribe
    • Subscription Rates
    • Subscriptions FAQ
    • Open Access
    • Recommend PNAS to Your Librarian

User menu

  • Log in
  • My Cart

Search

  • Advanced search
Home
Home

Advanced Search

  • Home
  • Articles
    • Current
    • Latest Articles
    • Special Features
    • Colloquia
    • Collected Articles
    • PNAS Classics
    • Archive
  • Front Matter
  • News
    • For the Press
    • Highlights from Latest Articles
    • PNAS in the News
  • Podcasts
  • Authors
    • Information for Authors
    • Purpose and Scope
    • Editorial and Journal Policies
    • Submission Procedures
    • For Reviewers
    • Author FAQ

New Research In

Physical Sciences

Featured Portals

  • Physics
  • Chemistry
  • Sustainability Science

Articles by Topic

  • Applied Mathematics
  • Applied Physical Sciences
  • Astronomy
  • Computer Sciences
  • Earth, Atmospheric, and Planetary Sciences
  • Engineering
  • Environmental Sciences
  • Mathematics
  • Statistics

Social Sciences

Featured Portals

  • Anthropology
  • Sustainability Science

Articles by Topic

  • Economic Sciences
  • Environmental Sciences
  • Political Sciences
  • Psychological and Cognitive Sciences
  • Social Sciences

Biological Sciences

Featured Portals

  • Sustainability Science

Articles by Topic

  • Agricultural Sciences
  • Anthropology
  • Applied Biological Sciences
  • Biochemistry
  • Biophysics and Computational Biology
  • Cell Biology
  • Developmental Biology
  • Ecology
  • Environmental Sciences
  • Evolution
  • Genetics
  • Immunology and Inflammation
  • Medical Sciences
  • Microbiology
  • Neuroscience
  • Pharmacology
  • Physiology
  • Plant Biology
  • Population Biology
  • Psychological and Cognitive Sciences
  • Sustainability Science
  • Systems Biology
Research Article

Enhancing human learning via spaced repetition optimization

Behzad Tabibian, Utkarsh Upadhyay, Abir De, View ORCID ProfileAli Zarezade, Bernhard Schölkopf, and Manuel Gomez-Rodriguez
PNAS March 5, 2019 116 (10) 3988-3993; first published January 22, 2019 https://doi.org/10.1073/pnas.1815156116
Behzad Tabibian
aNetworks Learning Group, Max Planck Institute for Software Systems, 67663 Kaiserslautern, Germany;
bEmpirical Inference Department, Max Planck Institute for Intelligent Systems, 72076 Tübingen, Germany
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • For correspondence: me@btabibian.com
Utkarsh Upadhyay
aNetworks Learning Group, Max Planck Institute for Software Systems, 67663 Kaiserslautern, Germany;
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Abir De
aNetworks Learning Group, Max Planck Institute for Software Systems, 67663 Kaiserslautern, Germany;
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Ali Zarezade
aNetworks Learning Group, Max Planck Institute for Software Systems, 67663 Kaiserslautern, Germany;
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Ali Zarezade
Bernhard Schölkopf
bEmpirical Inference Department, Max Planck Institute for Intelligent Systems, 72076 Tübingen, Germany
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Manuel Gomez-Rodriguez
aNetworks Learning Group, Max Planck Institute for Software Systems, 67663 Kaiserslautern, Germany;
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  1. Edited by Richard M. Shiffrin, Indiana University, Bloomington, IN, and approved December 14, 2018 (received for review September 3, 2018)

See related content:

  • Artificial intelligence to support human instruction
    - Mar 05, 2019
  • Article
  • Figures & SI
  • Info & Metrics
  • PDF
Loading

Significance

Understanding human memory has been a long-standing problem in various scientific disciplines. Early works focused on characterizing human memory using small-scale controlled experiments and these empirical studies later motivated the design of spaced repetition algorithms for efficient memorization. However, current spaced repetition algorithms are rule-based heuristics with hard-coded parameters, which do not leverage the automated fine-grained monitoring and greater degree of control offered by modern online learning platforms. In this work, we develop a computational framework to derive optimal spaced repetition algorithms, specially designed to adapt to the learners’ performance. A large-scale natural experiment using data from a popular language-learning online platform provides empirical evidence that the spaced repetition algorithms derived using our framework are significantly superior to alternatives.

Abstract

Spaced repetition is a technique for efficient memorization which uses repeated review of content following a schedule determined by a spaced repetition algorithm to improve long-term retention. However, current spaced repetition algorithms are simple rule-based heuristics with a few hard-coded parameters. Here, we introduce a flexible representation of spaced repetition using the framework of marked temporal point processes and then address the design of spaced repetition algorithms with provable guarantees as an optimal control problem for stochastic differential equations with jumps. For two well-known human memory models, we show that, if the learner aims to maximize recall probability of the content to be learned subject to a cost on the reviewing frequency, the optimal reviewing schedule is given by the recall probability itself. As a result, we can then develop a simple, scalable online spaced repetition algorithm, MEMORIZE, to determine the optimal reviewing times. We perform a large-scale natural experiment using data from Duolingo, a popular language-learning online platform, and show that learners who follow a reviewing schedule determined by our algorithm memorize more effectively than learners who follow alternative schedules determined by several heuristics.

  • memorization
  • spaced repetition
  • human learning
  • marked temporal point processes
  • stochastic optimal control

Our ability to remember a piece of information depends critically on the number of times we have reviewed it, the temporal distribution of the reviews, and the time elapsed since the last review, as first shown by a seminal study by Ebbinghaus (1). The effect of these two factors has been extensively investigated in the experimental psychology literature (2, 3), particularly in second language acquisition research (4⇓⇓–7). Moreover, these empirical studies have motivated the use of flashcards, small pieces of information a learner repeatedly reviews following a schedule determined by a spaced repetition algorithm (8), whose goal is to ensure that learners spend more (less) time working on forgotten (recalled) information.

The task of designing spaced repetition algorithms has a rich history, starting with the Leitner system (9). More recently, several works (10, 11) have proposed heuristic algorithms that schedule reviews just as the learner is about to forget an item, i.e., when the probability of recall, as given by a memory model of choice (1, 12), falls below a threshold. An orthogonal line of research (7, 13) has pursued locally optimal scheduling by identifying which item would benefit the most from a review given a fixed reviewing time. In doing so, the researchers have also proposed heuristic algorithms that decide which item to review by greedily selecting the item which is closest to its maximum learning rate.

In recent years, spaced repetition software and online platforms such as Mnemosyne (mnemosyne-proj.org), Synap (www.synap.ac), and Duolingo (www.duolingo.com) have become increasingly popular, often replacing the use of physical flashcards. The promise of these pieces of software and online platforms is that automated fine-grained monitoring and greater degree of control will result in more effective spaced repetition algorithms. However, most of the above spaced repetition algorithms are simple rule-based heuristics with a few hard-coded parameters (8), which are unable to fulfill this promise—adaptive, data-driven algorithms with provable guarantees have been largely missing until very recently (14, 15). Among these recent notable exceptions, the work most closely related to ours is by Reddy et al. (15), who proposed a queueing network model for a particular spaced repetition method—the Leitner system (9) for reviewing flashcards—and then developed a heuristic approximation algorithm for scheduling reviews. However, their heuristic does not have provable guarantees, it does not adapt to the learner’s performance over time, and it is specifically designed for the Leitner systems.

In this work, we develop a computational framework to derive optimal spaced repetition algorithms, specially designed to adapt to the learner’s performance, as continuously monitored by modern spaced repetition software and online learning platforms. More specifically, we first introduce a flexible representation of spaced repetition using the framework of marked temporal point processes (16). For several well-known human memory models (1, 12, 17⇓–19), we use this presentation to express the dynamics of a learner’s forgetting rates and recall probabilities for the content to be learned by means of a set of stochastic differential equations (SDEs) with jumps. Then, we can find the optimal reviewing schedule for spaced repetition by solving a stochastic optimal control problem for SDEs with jumps (20⇓⇓–23). In doing so, we need to introduce a proof technique to find a solution to the so-called Hamilton–Jacobi–Bellman (HJB) equation (SI Appendix, sections 3 and 4), which is of independent interest.

For two well-known memory models, we show that, if the learner aims to maximize recall probability of the content to be learned subject to a cost on the reviewing frequency, the solution uncovers a linear relationship with a negative slope between the optimal rate of reviewing, or reviewing intensity, and the recall probability of the content to be learned. As a consequence, we can develop a simple, scalable online spaced repetition algorithm, which we name MEMORIZE, to determine the optimal reviewing times. Finally, we perform a large-scale natural experiment using data from Duolingo, a popular language-learning online platform, and show that learners who follow a reviewing schedule determined by our algorithm memorize more effectively than learners who follow alternative schedules determined by several heuristics. To facilitate research in this area, we are releasing an open-source implementation of our algorithm (24).

Modeling Framework of Spaced Repetition.

Our framework is agnostic to the particular choice of memory model—it provides a set of techniques to find reviewing schedules that are optimal under a memory model. Here, for ease of exposition, we showcase our framework for one well-known memory model from the psychology literature, the exponential forgetting curve model with binary recalls (1, 17), and use (a variant of) a recent machine-learning method, half-life regression (25), to estimate the effect of the reviews on the parameters of such model. [In SI Appendix, sections 6 and 7, we apply our framework to other two popular memory models, the power-law forgetting curve model (18, 19) and the multiscale context model (MCM) (12).]

More specifically, given a learner who wants to memorize a set of items I using spaced repetition, i.e., repeated, spaced review of the items, we represent each reviewing event as a triplete ≔ (iitem↑, ttime↓, rrecall↑),which means that the learner reviewed item i∈I at time t and either recalled it (r=1) or forgot it (r=0). Here, note that each reviewing event includes the outcome of a test (i.e., a recall) since, in most spaced repetition software and online platforms such as Mnemosyne, Synap, and Duolingo, the learner is tested in each review, following the seminal work of Reidiger and Karpicke (26).

Given the above representation, we model the probability that the learner recalls (forgets) item i at time t using the exponential forgetting curve model; i.e.,mi(t)≔P(r)=exp−ni(t)(t−tr),[1]where tr is the time of the last review and ni(t)∈R+ is the forgetting rate at time t, which may depend on many factors, e.g., item difficulty and/or number of previous (un)successful recalls of the item. [Previous work often uses the inverse of the forgetting rate, referred to as memory strength or half-life, s(t)=n−1(t) (15, 25). However, it is more tractable for us to work in terms of forgetting rates.] Then, we keep track of the reviewing times using a multidimensional counting process N(t), in which the ith entry, Ni(t), counts the number of times the learner has reviewed item i up to time t. Following the literature on temporal point processes (16), we characterize these counting processes using their corresponding intensities u(t), i.e., E[dN(t)]=u(t)dt, and think of the recall r as their binary marks. Moreover, every time that a learner reviews an item, the recall r has been experimentally shown to have an effect on the forgetting rate of the item (3, 15, 25). Here, we estimate such an effect using half-life regression (25), which implicitly assumes that recalls of an item i during a review have a multiplicative effect on the forgetting rate ni(t)—a successful recall at time tr changes the forgetting rate by (1−αi), i.e., ni(t)=(1−αi)ni(tr), αi≤1, while an unsuccessful recall changes the forgetting rate by (1+βi), i.e., ni(t)=(1+βi)ni(tr), βi≥0. In this context, the initial forgetting rate, ni(0), captures the difficulty of the item, with more difficult items having higher initial forgetting rates compared with easier items, and the parameters αi, βi, and ni(0) are estimated using real data (refer to SI Appendix, section 8 for more details).

Before we proceed farther, we acknowledge that several laboratory studies (6, 27) have provided empirical evidence that the retention rate follows an inverted U shape, i.e., mass practice does not improve the forgetting rate—if an item is in a learner’s short-term memory when the review happens, the long-term retention does not improve. Thus, one could argue for time-varying parameters αi(t) and βi(t) in our framework. However, there are several reasons that prevent us from that: (i) The derivation of an optimal reviewing schedule under time-varying parameters becomes very challenging; (ii) for the reviewing sequences in our Duolingo dataset, allowing for time-varying αi and βi in our modeling framework does not lead to more accurate recall predictions (SI Appendix, section 9); and (iii) several popular spaced repetition heuristics, such as the Leitner system with exponential spacing and SuperMemo, have achieved reasonable success in practice despite implicitly assuming constant αi and βi. [The Leitner system with exponential spacing can be explicitly cast using our formulation with particular choices of αi and βi and the same initial forgetting rate, ni(0)=n(0), for all items (SI Appendix, section 11).] That being said, it would be an interesting venue for future work to derive optimal reviewing schedules under time-varying parameters.

Next, we express the dynamics of the forgetting rate ni(t) and the recall probability mi(t) for each item i∈I using SDEs with jumps. This is very useful for the design of our spaced repetition algorithm using stochastic optimal control. More specifically, the dynamics of the forgetting rate ni(t) are readily given bydni(t)=−αini(t)ri(t)dNi(t)+βini(t)(1−ri(t))dNi(t),[2]where Ni(t) is the corresponding counting process and ri(t)∈{0,1} indicates whether item i has been successfully recalled at time t. Similarly, the dynamics of the recall probability mi(t) are given by Proposition 1 (proved in SI Appendix, section 1):

Proposition 1.

Given an item i∈I with reviewing intensity ui(t), the recall probability mi(t), defined by Eq. 1, is a Markov process whose dynamics can be defined by the following SDE with jumps,dmi(t)=−ni(t)mi(t)dt+(1−mi(t))dNi(t),[3]where Ni(t) is the counting process associated to the reviewing intensity ui(t)†.

Finally, given a set of items I, we cast the design of a spaced repetition algorithm as the search of the optimal item reviewing intensities u(t)=[ui(t)]i∈I that minimize the expected value of a particular (convex) loss function ℓ(m(t),n(t),u(t)) of the recall probability of the items, m(t)=[mi(t)]i∈I; the forgetting rates, n(t)=[ni(t)]i∈I; and the intensities themselves, u(t); over a time window (t0,tf]; i.e.,minimizeu(t0,tf] Eϕ(m(tf),n(tf))+∫t0tfℓ(m(τ),n(τ),u(τ))dτsubject to u(t)≥0∀t∈(t0,tf),[4]where u(t0,tf] denotes the item reviewing intensities from t0 to tf, the expectation is taken over all possible realizations of the associated counting processes and (item) recalls, the loss function is nonincreasing (nondecreasing) with respect to the recall probabilities (forgetting rates and intensities) so that it rewards long-lasting learning while limiting the number of item reviews, and ϕ(m(tf),n(tf)) is an arbitrary penalty function. [The penalty function ϕ(m(tf),n(tf)) is necessary to derive the optimal reviewing intensities u*(t).] Here, note that the forgetting rates n(t) and recall probabilities m(t), as defined by Eqs. 2 and 3, depend on the reviewing intensities u(t) we aim to optimize since E[dN(t)]=u(t)dt.

The MEMORIZE Algorithm.

The spaced repetition problem, as defined by Eq. 4, can be tackled from the perspective of stochastic optimal control of jump SDEs (20). Here, we first derive a solution to the problem considering only one item, provide an efficient practical implementation of the solution, and then generalize it to the case of multiple items.

Given an item i, we can write the spaced repetition problem, i.e., Eq. 4, for it with reviewing intensity ui(t)=u(t) and associated counting process Ni(t)=N(t), recall outcome ri(t)=r(t), recall probability mi(t)=m(t), and forgetting rate ni(t)=n(t). Further, using Eqs. 2 and 3, we can define the forgetting rate n(t) and recall probability m(t) by the following two coupled jump SDEs,dn(t)=−αn(t)r(t)dN(t)+βn(t)(1−r(t))dN(t)dm(t)=−n(t)m(t)dt+(1−m(t))dN(t)with initial conditions n(t0)=n0 and m(t0)=m0.

Next, we define an optimal cost-to-go function J for the above problem, use Bellman’s principle of optimality to derive the corresponding HJB equation (28), and exploit the unique structure of the HJB equation to find the optimal solution to the problem.

Definition 2:

The optimal cost-to-go J(m(t),n(t),t) is defined as the minimum of the expected value of the cost of going from state (m(t),n(t)) at time t to the final state at time tf:J=minu(t,tf]E(N(s),r(s))s=ts=tfϕ(m(tf),n(tf))+∫ttfℓ(m(τ),u(τ))dτ.[5]Now, we use Bellman’s principle of optimality, which the above definition allows, to break the problem into smaller subproblems. [Bellman’s principle of optimality readily follows using the Markov property of the recall probability m(t) and forgetting rate n(t).] With dJ(m(t),n(t),t)=J(m(t+dt),n(t+dt),t+dt)−J(m(t),n(t),t), we can, hence, rewrite Eq. 5 asJ(m(t),n(t),t)=minu(t,t+dt]E[J(m(t+dt),n(t+dt),t+dt)]+ℓ(m(t),n(t),u(t))dt0=minu(t,t+dt]E[dJ(m(t),n(t),t)]+ℓ(m(t),n(t),u(t))dt.[6]

Then, to derive the HJB equation, we differentiate J with respect to time t, m(t), and n(t) using SI Appendix, section 2, Lemma 1:0=Jt(m,n,t)−nmJm(m,n,t) +minut,t+dtℓ(m,n,u)J(1,(1−α)n,t)m +J(1,(1+β)n,t)(1−m)−J(m,n,t)u(t).[7]To solve the above equation, we need to define the loss ℓ. Following the literature on stochastic optimal control (28), we consider the following quadratic form, which is nonincreasing (nondecreasing) with respect to the recall probabilities (intensities) so that it rewards learning while limiting the number of item reviews,ℓ(m(t),n(t),u(t))=12(1−m(t))2+12qu2(t),[8]where q is a given parameter, which trades off recall probability and number of item reviews—the higher its value, the lower the number of reviews. Note that this particular choice of loss function does not directly place a hard constraint on number of reviews; instead, it limits the number of reviews by penalizing high reviewing intensities. (Given a desired level of practice, the value of the parameter q can be easily found by simulation since the average number of reviews decreases monotonically with respect to q.)

Under these definitions, we can find the relationship between the optimal intensity and the optimal cost by taking the derivative with respect to u(t) in Eq. 7:u*(t)=q−1[J(m(t),n(t),t)−J(1,(1−α)n(t),t)m(t)−J(1,(1+β)n(t),t)(1−m(t))]+.Finally, we plug the above equation into Eq. 7 and find that the optimal cost-to-go J needs to satisfy the following nonlinear differential equation:0=Jt(m(t),n(t),t)−n(t)m(t)Jm(m(t),n(t),t)+12(1−m(t))2−12q−1J(m(t),n(t),t)−J1,(1−α)n(t),tm(t)−J(1,(1+β)n(t),t)(1−m(t))+2.

To continue farther, we rely on a technical lemma (SI Appendix, section 3, Lemma 2), which derives the optimal cost-to-go J for a parameterized family of losses ℓ. Using SI Appendix, section 3, Lemma 2, the optimal reviewing intensity is readily given by Theorem 3 (proved in SI Appendix, section 4):

Theorem 3.

Given a single item, the optimal reviewing intensity for the spaced repetition problem, defined by Eq. 4, under quadratic loss, defined by Eq. 8, is given byu*(t)=q−1/2(1−m(t)).[9]Note that the optimal intensity depends only on the recall probability, whose dynamics are given by Eqs. 2 and 3, and thus allows for a very efficient procedure to sample reviewing times, which we name MEMORIZE. Algorithm 1 provides a pseudocode implementation of MEMORIZE. Within the algorithm, Sample(u(t)) samples from an inhomogeneous Poisson process with intensity u(t) and it returns the sampled time and ReviewItem(t′) returns the recall outcome r of an item reviewed at time t′, where r=1 indicates the item was recalled successfully and r=0 indicates it was not recalled. Moreover, note that t denotes a (time) parameter, s and t′ denote specific (time) values, and we sample from an inhomogeneous Poisson process using a standard thinning algorithm (29). [In some practical deployments, one may want to discretize the optimal intensity u(t) and, e.g., “at top of each hour, decide whether to do a review or not.”]

View this table:
  • View inline
  • View popup

Given a set of multiple items I with reviewing intensities u(t) and associated counting processes N(t), recall outcomes r(t), recall probabilities m(t), and forgetting rates n(t), we can solve the spaced repetition problem defined by Eq. 4 similarly as in the case of a single item. More specifically, consider the following quadratic form for the loss ℓ,ℓ(m(t),n(t),u(t))=12∑i∈I(1−mi(t))2+12∑i∈Iqiui2(t),where {qi}i∈I are given parameters, which trade off recall probability and number of item reviews and may favor the learning of one item over another. Then, one can exploit the independence among items assumption to derive the optimal reviewing intensity for each item, proceeding similarly as in the case of a single item:

Theorem 4.

Given a set of items I, the optimal reviewing intensity for each item i∈I in the spaced repetition problem, defined by Eq. 4, under quadratic loss is given byui*(t)=qi−1/2(1−mi(t)).[10]Finally, note that we can easily sample item reviewing times simply by running |I| instances of MEMORIZE, one per item.

Experimental Design.

We use data gathered from Duolingo, a popular language-learning online platform (the dataset is available at https://github.com/duolingo/halflife-regression), to validate our algorithm MEMORIZE. (Refer to SI Appendix, section 5 for an experimental validation of our algorithm using synthetic data, whose goal is analyzing it under a controlled setting using metrics and baselines that we cannot compute in the real data we have access to.) This dataset consists of ∼12 million sessions of study, involving ∼5.3 million unique (user, word) pairs, which we denote by D, collected over the period of 2 wk. In a single session, a user answers multiple questions, each of which contains multiple words. (Refer to SI Appendix, section 12 for additional details on the Duolingo dataset.) Each word maps to an item i and the fraction of correct recalls of sentences containing a word i in the session is used as an empirical estimate of its recall probability m^(t) at the time of the session t, as in previous work (25). If a word is recalled perfectly during a session, then it is considered a successful recall, i.e., ri(t)=1, and otherwise it is considered an unsuccessful recall, i.e., ri(t)=0. Since we can expect the estimation of the model parameters to be accurate only for users and items with enough numbers of reviewing events, we consider only users with at least 30 reviewing events and words that were reviewed at least 30 times. After this preprocessing step, our dataset consists of ∼5.2 million unique (user, word) pairs.

We compare the performance of our method with two baselines: (i) a uniform reviewing schedule, which sends item(s) for review at a constant rate μ, and (ii) a threshold-based reviewing schedule, which increases the reviewing intensity of an item by c⁡exp(t−s)/ζ at time s, when its recall probability reaches a threshold mth. The threshold baseline is similar to the heuristics proposed by previous work (10, 11, 30), which schedule reviews just as the learner is about to forget an item. We do not compare with the algorithm proposed by Reddy et al. (15) because, as it is specially designed for the Leitner system, it assumes a discrete set of forgetting rate values and, as a consequence, is not applicable to our (more general) setting.

Although we cannot make actual interventions to evaluate the performance of each method, the following insight allows for a large-scale natural experiment: Duolingo uses hand-tuned spaced repetition algorithms, which propose reviewing times to the users; however, users often do not perform reviews exactly at the recommended times, and thus schedules for some (user, item) pairs will be closer to uniform than threshold or MEMORIZE and vice versa, as shown in Fig. 1. As a consequence, we are able to assign each (user, item) pair to a treatment group (i.e., MEMORIZE) or a control group (i.e., uniform or threshold). More in detail, we leverage this insight to design a robust evaluation procedure which relies on (i) likelihood comparisons to determine how closely a user followed a particular reviewing schedule during all reviews but the last in a reviewing sequence, i.e., e1,…,en−1 in a sequence with n reviews, and (ii) a quality metric, empirical forgetting rate n^, which can be estimated using only the last review en (and the retention interval tn−tn−1) of each reviewing sequence and does not depend on the particular choice of memory model. Refer to Materials and Methods for more details on our evaluation procedure. [Note that our goal is to evaluate how well different reviewing schedule spaces the reviews—our objective is not to evaluate the predictive power of the underlying memory models; we are relying on previous work for that (18, 25). However, for completeness, we provide a series of benchmarks and evaluations for the memory models we used in this work in SI Appendix, section 8.]

Fig. 1.
  • Download figure
  • Open in new tab
  • Download powerpoint
Fig. 1.

Examples of (user, item) pairs whose corresponding reviewing times have high likelihood under MEMORIZE (Top), threshold-based reviewing schedule (Middle), and uniform reviewing schedule (Bottom). In every plot, each candlestick corresponds to a reviewing event with a green circle (red cross) if the recall was successful (unsuccessful), and time t=0 corresponds to the first time the user is exposed to the item in our dataset, which may or may not correspond with the first reviewing event. The pairs whose reviewing times follow more closely MEMORIZE or the threshold-based schedule tend to increase the time interval between reviews every time a recall is successful while, in contrast, the uniform reviewing schedule does not. MEMORIZE tends to space the reviews more than the threshold-based schedule, achieving the same recall pattern with less effort.

Results

We first group (user, item) pairs by their number of reviews n and their training period, i.e., tn−1−t1. Then, for each recall pattern, we create the treatment (MEMORIZE) and control (uniform and threshold) groups and, for every reviewing sequence in each group, compute its empirical forgetting rate. Fig. 2 summarizes the results for sequences with up to seven reviews since the beginning of the observation window for three distinct training periods. The results show that MEMORIZE offers a competitive advantage with respect to the uniform- and threshold-based baselines and, as the training period increases, the number of reviews under which MEMORIZE achieves the greatest competitive advantage increases. Here, we can rule out that this advantage is a consequence of selection bias due to the item difficulty (SI Appendix, section 13).

Fig. 2.
  • Download figure
  • Open in new tab
  • Download powerpoint
Fig. 2.

(A–C) Average empirical forgetting rate for the top 25% of pairs in terms of likelihood for MEMORIZE, the uniform reviewing schedule, and the threshold-based reviewing schedule for sequences with different numbers of reviews n and different training periods T=tn−1−t1. Boxes indicate 25% and 75% quantiles and solid lines indicate median values, where lower values indicate better performance. MEMORIZE offers a competitive advantage with respect to the uniform and the threshold-based baselines and, as the training period increases, the number of reviews under which MEMORIZE achieves the greatest competitive advantage increases. For each distinct number of reviews and training periods, * indicates a statistically significant difference (Mann–Whitney U test; P-value < 0.05) between MEMORIZE vs. threshold and MEMORIZE vs. uniform scheduling.

Next, we go a step farther and verify that, whenever a specific learner follows MEMORIZE more closely, her performance is superior. More specifically, for each learner with at least 70 reviewing sequences with a training period T=8±3.2 d, we select the top and bottom 50% of reviewing sequences in terms of log-likelihood under MEMORIZE and compute the Pearson correlation coefficient between the empirical forgetting rate and log-likelihood values. Fig. 3 summarizes the results, which show that users, on average, achieve lower empirical forgetting rates whenever they follow MEMORIZE more closely.

Fig. 3.
  • Download figure
  • Open in new tab
  • Download powerpoint
Fig. 3.

Pearson correlation coefficient between the log-likelihood of the top and bottom 50% of reviewing sequences of a learner under MEMORIZE and its associated empirical forgetting rate. The circles indicate median values and the bars indicate standard error. Lower correlation values correspond to greater gains due to MEMORIZE. To ensure reliable estimation, we considered learners with at least 70 reviewing sequences with a training period T=8±3.2 d. There were 322 of such learners.

Since the Leitner system (9), there have been a wealth of spaced repetition algorithms (7, 8, 10, 11, 13). However, there has been a paucity of work on designing adaptive data-driven spaced repetition algorithms with provable guarantees. In this work, we have introduced a principled modeling framework to design online spaced repetition algorithms with provable guarantees, which are specially designed to adapt to the learners’ performance, as monitored by modern spaced repetition software and online platforms. Our modeling framework represents spaced repetition using the framework of marked temporal point processes and SDEs with jumps and, exploiting this representation, it casts the design of spaced repetition algorithms as a stochastic optimal control problem of such jump SDEs. Since our framework is agnostic to the particular modeling choices, i.e., the memory model and the quadratic loss function, we believe it provides a powerful tool to find spaced repetition algorithms that are provably optimal under a given choice of memory model and loss.

There are many interesting directions for future work. For example, it would be interesting to perform large-scale interventional experiments to assess the performance of our algorithm in comparison with existing spaced repetition algorithms deployed by, e.g., Duolingo. Moreover, in our work, we consider a particular quadratic loss and soft constraints on the number of reviewing events; however, it would be useful to derive optimal reviewing intensities for other losses capturing particular learning goals and hard constraints on the number of events. We assumed that, by reviewing an item, one can influence only its recall probability and forgetting rate. However, items may be dependent and, by reviewing an item, one can influence the recall probabilities and forgetting rates of several items. The dataset we used spans only 2 wk and that places a limitation on the range of time intervals between reviews and retention intervals we can study. It would be very interesting to evaluate our framework in datasets spanning longer periods of time. Finally, we believe that the mathematical techniques underpinning our algorithm, i.e., stochastic optimal control of SDEs with jumps, have the potential to drive the design of control algorithms in a wide range of applications.

Materials and Methods

Evaluation Procedure.

To evaluate performance of our proposed algorithm, we rely on the following evaluation procedure. For each (user, item) reviewing sequence, we first perform a likelihood-based comparison and determine how closely it follows a specific reviewing schedule (be it MEMORIZE, uniform, or threshold) during the first n−1 reviews, the training reviews, where n is the number of reviews in the reviewing sequence. Second, we compute a quality metric, empirical forgetting rate n^(tn), using the last review, the nth review or test review, and the retention interval tn−tn−1. Third, for each reviewing sequence, we record the value of the quality metric, the training period (i.e., T=tn−1−t1), and the likelihood under each reviewing schedule. Finally, we control for the training period and the number of reviewing events and create the treatment and control groups by picking the top 25% of pairs in terms of likelihood for each method, where we skip any sequence lying in the top 25% for more than one method. Refer to SI Appendix, section 13 for an additional analysis showing that our evaluation procedure satisfies the random assignment assumption for the item difficulties between treatment and control groups (31).

In the above procedure, to do the likelihood-based comparison, we first estimate the parameters α and β and the initial forgetting rate ni(0) using half-life regression on the Duolingo dataset. Here, note that we fit a single set of parameters α and β for all items and a different initial forgetting rate ni(0) per item i, and we use the power-law forgetting curve model due to its better performance (in terms of MAE) in our experiments (refer to SI Appendix, section 8 for more details). Then, for each user, we use maximum-likelihood estimation to fit the parameter q in MEMORIZE and the parameter μ in the uniform reviewing schedule. For the threshold-based schedule, we fit one set of parameters c and ζ for each sequence of review events, using maximum-likelihood estimation for the parameter c and grid search for the parameter ζ, and we fit one parameter mth for each user using grid search. Finally, we compute the likelihood of the times of the n−1 reviewing events for each (user, item) pair under the intensity given by MEMORIZE, i.e., u(t)=q−1/2(1−m(t)); the intensity given by the uniform schedule, i.e., u(t)=μ; and the intensity given by the threshold-based schedule, i.e., u(t)=c⁡exp((t−s)/ζ). The likelihood LL(ti) of a set of reviewing events ti given an intensity function u(t) can be computed as follows (16):LL({ti})=∑ilog⁡u(ti)−∫0Tu(t) dt.More details on the empirical distribution of likelihood values under each reviewing schedule are provided in SI Appendix, section 10.

Quality Metric: Empirical Forgetting Rate.

For each (user, item), the empirical forgetting rate is an empirical estimate of the forgetting rate by the time tn of the last reviewing event; i.e.,n^=−log(m^(tn))/(tn−tn−1),where m^(tn) is the empirical recall probability, which consists of the fraction of correct recalls of sentences containing word (item) i in the session at time tn. Note that this empirical estimate does not depend on the particular choice of memory model and, given a sequence of reviews, the lower the empirical forgetting rate is, the more effective the reviewing schedule.

Moreover, for a more fair comparison across items, we normalize each empirical forgetting rate using the average empirical initial forgetting rate of the corresponding item at the beginning of the observation window n^0; i.e., for an item i,n^0=1|Di|∑(u,i)∈Din^0,(u,i),where Di⊆D is the subset of (user, item) pairs in which item i was reviewed. Furthermore, n^0,(u,i)=−log(m^(t(u,i),1))/(t(u,i),1−t(u,i),0), where t(u,i),k is the kth review in the reviewing sequence associated to the (u,i) pair. However, our results are not sensitive to this normalization step, as shown in SI Appendix, section 14.

Footnotes

  • ↵1To whom correspondence should be addressed. Email: me{at}btabibian.com.
  • Author contributions: B.T., U.U., B.S., and M.G.-R. designed research; B.T., U.U., A.D., A.Z., and M.G.-R. performed research; B.T. and U.U. analyzed data; and B.T., U.U., and M.G.-R. wrote the paper.

  • The authors declare no conflict of interest.

  • This article is a PNAS Direct Submission.

  • Data deposition: The MEMORIZE algorithm has been deposited on GitHub (https://github.com/Networks-Learning/memorize).

  • ↵†To derive Eq. 3, we assume that the recall probability mi(t) is set to 1 every time item i is reviewed. Here, one may also account for item difficulty by considering that, for more difficult items, the recall probability is set to oi∈[0,1) every time item i is reviewed.

  • See Commentary on page 3953.

  • This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1815156116/-/DCSupplemental.

  • Copyright © 2019 the Author(s). Published by PNAS.

This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY).

References

  1. ↵
    Ebbinghaus H (1885); trans Ruger HA, Bussenius CE (1913) [Memory: A Contribution to Experimental Psychology] (Columbia Univ Teachers College, New York).
  2. ↵
    1. Melton AW
    (1970) The situation with respect to the spacing of repetitions and memory. J Verbal Learn Verbal Behav 9:596–606.
    OpenUrlCrossRef
  3. ↵
    1. Dempster FN
    (1989) Spacing effects and their implications for theory and practice. Educ Psychol Rev 1:309–330.
    OpenUrlCrossRef
  4. ↵
    1. Atkinson RC
    (1972) Optimizing the learning of a second-language vocabulary. J Exp Psychol 96:124–129.
    OpenUrlCrossRef
  5. ↵
    1. Bloom KC,
    2. Shuell TJ
    (1981) Effects of massed and distributed practice on the learning and retention of second-language vocabulary. J Educ Res 74:245–248.
    OpenUrlCrossRef
  6. ↵
    1. Cepeda NJ,
    2. Pashler H,
    3. Vul E,
    4. Wixted JT,
    5. Rohrer D
    (2006) Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychol Bull 132:354–380.
    OpenUrlCrossRefPubMed
  7. ↵
    1. Pavlik PI,
    2. Anderson JR
    (2008) Using a model to compute the optimal schedule of practice. J Exp Psychol Appl 14:101–117.
    OpenUrlCrossRefPubMed
  8. ↵
    1. Branwen G
    (2016) Spaced repetition. Available at https://www.gwern.net/Spaced%20repetition. Accessed October 20, 2018.
  9. ↵
    1. Leitner S
    (1974) So Lernt Man Lernen (Herder, Freiburg, Germany).
  10. ↵
    1. Metzler-Baddeley C,
    2. Baddeley RJ
    (2009) Does adaptive training work? Appl Cogn Psychol 23:254–266.
    OpenUrlCrossRef
  11. ↵
    1. Lindsey RV,
    2. Shroyer JD,
    3. Pashler H,
    4. Mozer MC
    (2014) Improving students’ long-term knowledge retention through personalized review. Psychol Sci 25:639–647.
    OpenUrlCrossRefPubMed
  12. ↵
    1. Pashler H,
    2. Cepeda N,
    3. Lindsey RV,
    4. Vul E,
    5. Mozer MC
    (2009) Predicting the optimal spacing of study: A multiscale context model of memory. Advances in Neural Information Processing Systems, eds Bengio Y, Schuurmans D, Lafferty J, Williams C, Culotta A (Neural Information Processing Systems Foundation, New York), pp 1321–1329.
  13. ↵
    1. Mettler E,
    2. Massey CM,
    3. Kellman PJ
    (2016) A comparison of adaptive and fixed schedules of practice. J Exp Psychol Gen 145:897–917.
    OpenUrl
  14. ↵
    1. Novikoff TP,
    2. Kleinberg JM,
    3. Strogatz SH
    (2012) Education of a model student. Proc Natl Acad Sci USA 109:1868–1873.
    OpenUrlAbstract/FREE Full Text
  15. ↵
    1. Reddy S,
    2. Labutov I,
    3. Banerjee S,
    4. Joachims T
    (2016) Unbounded human learning: Optimal scheduling for spaced repetition. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, eds Smola A, Aggrawal C, Shen D, Rastogi R (ACM, San Francisco), pp 1815–1824.
  16. ↵
    1. Aalen O,
    2. Borgan O,
    3. Gjessing HK
    (2008) Survival and Event History Analysis: A Process Point of View (Springer, New York).
  17. ↵
    1. Loftus GR
    (1985) Evaluating forgetting curves. J Exp Psychol Learn Mem Cogn 11:397–406.
    OpenUrlCrossRef
  18. ↵
    1. Wixted JT,
    2. Carpenter SK
    (2007) The Wickelgren power law and the Ebbinghaus savings function. Psychol Sci 18:133–134.
    OpenUrlCrossRefPubMed
  19. ↵
    1. Averell L,
    2. Heathcote A
    (2011) The form of the forgetting curve and the fate of memories. J Math Psychol 55:25–35.
    OpenUrl
  20. ↵
    1. Hanson FB
    (2007) Applied Stochastic Processes and Control for Jump-Diffusions: Modeling, Analysis, and Computation (SIAM, Philadelphia).
  21. ↵
    1. Zarezade A,
    2. De A,
    3. Upadhyay U,
    4. Rabiee HR,
    5. Gomez-Rodriguez M
    (2018) Steering social activity: A stochastic optimal control point of view. J Mach Learn Res 18:205.
    OpenUrl
  22. ↵
    1. Zarezade A,
    2. Upadhyay U,
    3. Rabiee H,
    4. Gomez-Rodriguez M
    (2017) Redqueen. An online algorithm for smart broadcasting in social networks. Proceedings of the 10th ACM International Conference on Web Search and Data Mining, eds Tomkins A, Zhang M (ACM, New York), pp 51–60.
  23. ↵
    1. Kim J,
    2. Tabibian B,
    3. Oh A,
    4. Schoelkopf B,
    5. Gomez-Rodriguez M
    (2018) Leveraging the crowd to detect and reduce the spread of fake news and misinformation. Proceedings of the 11th ACM International Conference on Web Search and Data Mining, eds Liu Y, Maarek Y (ACM, New York), pp 324–332.
  24. ↵
    1. Tabibian B, et al.
    (2019) Data from “Enhancing human learning via spaced repetition optimization.” Github. Available at https://github.com/Networks-Learning/memorize. Deposited January 8, 2019.
  25. ↵
    1. Settles B,
    2. Meeder B
    (2016) A Trainable Spaced Repetition Model for Language and Learning (ACL, Berlin).
  26. ↵
    1. Roediger HL III,
    2. Karpicke JD
    (2006) Test-enhanced learning: Taking memory tests improves long-term retention. Psychol Sci 17:249–255.
    OpenUrlCrossRefPubMed
  27. ↵
    1. Cepeda NJ,
    2. Vul E,
    3. Rohrer D,
    4. Wixted JT,
    5. Pashler H
    (2008) Spacing effects in learning: A temporal ridgeline of optimal retention. Psychol Sci 19:1095–1102.
    OpenUrlCrossRefPubMed
  28. ↵
    1. Bertsekas DP
    (1995) Dynamic Programming and Optimal Control (Athena Scientific, MA).
  29. ↵
    1. Lewis PA,
    2. Shedler GS
    (1979) Simulation of nonhomogeneous Poisson processes by thinning. Naval Res Logistics Q 26:403–413.
    OpenUrlCrossRef
  30. ↵
    1. Bjork EL,
    2. Bjork R
    (2011) Making things hard on yourself, but in a good way. Psychology in the Real World, eds Gernsbacher M, Pomerantz J (Worth Publishers, New York), pp 56–64.
  31. ↵
    1. Stuart EA
    (2010) Matching methods for causal inference: A review and a look forward. Stat Sci Rev J Inst Math Stat 25:1–21.
    OpenUrl
View Abstract
PreviousNext
Back to top
Article Alerts
Email Article

Thank you for your interest in spreading the word on PNAS.

NOTE: We only request your email address so that the person you are recommending the page to knows that you wanted them to see it, and that it is not junk mail. We do not capture any email address.

Enter multiple addresses on separate lines or separate them with commas.
Enhancing human learning via spaced repetition optimization
(Your Name) has sent you a message from PNAS
(Your Name) thought you would like to see the PNAS web site.
Citation Tools
Enhancing human learning via spaced repetition optimization
Behzad Tabibian, Utkarsh Upadhyay, Abir De, Ali Zarezade, Bernhard Schölkopf, Manuel Gomez-Rodriguez
Proceedings of the National Academy of Sciences Mar 2019, 116 (10) 3988-3993; DOI: 10.1073/pnas.1815156116

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
Request Permissions
Share
Enhancing human learning via spaced repetition optimization
Behzad Tabibian, Utkarsh Upadhyay, Abir De, Ali Zarezade, Bernhard Schölkopf, Manuel Gomez-Rodriguez
Proceedings of the National Academy of Sciences Mar 2019, 116 (10) 3988-3993; DOI: 10.1073/pnas.1815156116
del.icio.us logo Digg logo Reddit logo Twitter logo CiteULike logo Facebook logo Google logo Mendeley logo
  • Tweet Widget
  • Facebook Like
  • Mendeley logo Mendeley
Proceedings of the National Academy of Sciences: 116 (10)
Table of Contents

Submit

Sign up for Article Alerts

Article Classifications

  • Physical Sciences
  • Applied Mathematics

Jump to section

  • Article
    • Abstract
    • Results
    • Materials and Methods
    • Footnotes
    • References
  • Figures & SI
  • Info & Metrics
  • PDF

You May Also be Interested in

 Coral reef bleaching frequently makes headlines, but researchers are still trying to sort out the cellular mechanisms at work.  Image credit: The Ocean Agency/XL Catlin Seaview Survey.
Inner Workings: A microscopic mystery at the heart of mass-coral bleaching
Coral reef bleaching frequently makes headlines, but researchers are still trying to sort out the cellular mechanisms at work.
Image credit: The Ocean Agency/XL Catlin Seaview Survey.
A deep-learning algorithm could potentially improve diagnosis and classification of neurological abnormalities. Image courtesy of Weicheng Kuo, Christian Hӓne, Pratik Mukherjee, Jitendra Malik, and Esther Lim Yuh
Brain hemorrhage detection by artificial neural network
A deep-learning algorithm could potentially improve diagnosis and classification of neurological abnormalities.
Image courtesy of Weicheng Kuo, Christian Hӓne, Pratik Mukherjee, Jitendra Malik, and Esther L. Yuh.
A study finds a shift in onset of El Niño events from eastern to western Pacific and increased frequency of extreme El Niño events since the late 1970s. Image courtesy of NOAA National Environmental Satellite, Data, and Information Service (NESDIS).
Changing El Niño properties
A study finds a shift in onset of El Niño events from eastern to western Pacific and increased frequency of extreme El Niño events since the late 1970s.
Image courtesy of NOAA National Environmental Satellite, Data, and Information Service (NESDIS).
A study explores how various types of food affect both human health and the environment. Image courtesy of Pixabay/esigie.
Environmental and health impacts of food
A study explores how various types of food affect both human health and the environment.
Image courtesy of Pixabay/esigie.
Profile of NAS member and molecular biologist Mary Lou Guerinot. Image courtesy of Olga Zhaxybayeva (Dartmouth College, Hanover, NH).
Featured Profile
Profile of NAS member and molecular biologist Mary Lou Guerinot
Image courtesy of Olga Zhaxybayeva (Dartmouth College, Hanover, NH).

Similar Articles

Site Logo
Powered by HighWire
  • Submit Manuscript
  • Twitter
  • Facebook
  • RSS Feeds
  • Email Alerts

Articles

  • Current Issue
  • Latest Articles
  • Archive

PNAS Portals

  • Classics
  • Front Matter
  • Teaching Resources
  • Anthropology
  • Chemistry
  • Physics
  • Sustainability Science

Information

  • Authors
  • Editorial Board
  • Reviewers
  • Press
  • Site Map
  • PNAS Updates

Feedback    Privacy/Legal

Copyright © 2020 National Academy of Sciences. Online ISSN 1091-6490