SM-2 spaced repetition, explained for exam-prep

Dr. P. Iyer

ML lead

11 min read May 14, 2026

If you have ever used Anki's classic scheduler, you have used a variant of SM-2. It is the spaced-repetition algorithm behind every flashcard you ever marked "again", "hard", "good", or "easy". Despite dating from 1987, SM-2 or a close derivative still runs underneath many serious spaced-repetition apps, including Prep's. This post explains why, what it actually does, and where it falls short.

1 · The forgetting curve

In 1885, Hermann Ebbinghaus published experiments showing that memory decays along a predictable curve. After learning a fact, you forget roughly half of it within the first hour, then the decay slows. By day 2 you retain roughly 30%; by day 7, around 25%. The exact numbers vary with the person and the material, but the shape (steep early loss, then a long flat tail) is remarkably consistent.

The fix is spaced review. Every time you successfully recall the fact just before forgetting it, you reset the curve — and crucially, the slope of the next decay is gentler. After 3-4 well-timed reviews, a fact can stick for months.
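One common way to make this concrete is an exponential retention model, R(t) = exp(-t / S), where S is a "stability" measured in days. The sketch below is a textbook simplification rather than Ebbinghaus's own fit, and the starting stability, the 80% recall target, and the growth factor are all made-up illustrative values.

```python
import math

def retention(days_since_review: float, stability_days: float) -> float:
    """Estimated recall probability under a simple exponential forgetting model."""
    return math.exp(-days_since_review / stability_days)

def days_until(target_retention: float, stability_days: float) -> float:
    """Days until retention decays to target_retention (inverse of retention())."""
    return -stability_days * math.log(target_retention)

# Assumed values for illustration only: stability starts at 4 days and grows
# by 2.5x after each successful review made when recall has dropped to 80%.
stability = 4.0
gaps = []
for _ in range(4):
    gaps.append(days_until(0.8, stability))
    stability *= 2.5  # hypothetical growth factor, not a measured constant

print([round(g, 1) for g in gaps])  # gaps between reviews, in days
```

Under these assumptions each gap is 2.5 times the previous one, which is the "gentler slope" effect in numbers: the same recall target takes longer and longer to reach.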

2 · What SM-2 actually computes

SM-2 maintains three numbers per (user, question) pair:

· An ease factor (EF), starting at 2.5. Higher means "this user finds this card easy".
· An interval (I) in days: how long to wait before showing the card again.
· A repetition count (n): how many times in a row the user has answered correctly.

After each attempt, the user grades themselves 0-5 ("complete blackout" to "perfect, no hesitation"). SM-2 then updates:

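def sm2_update(grade, n, I, EF):
    # grade: self-assessed recall quality, 0-5
    if grade >= 3:
        if n == 0:
            I = 1              # first success: show tomorrow
        elif n == 1:
            I = 6              # second success: show in six days
        else:
            I = round(I * EF)  # later successes: stretch the previous interval
        n += 1
    else:
        n = 0                  # a lapse restarts the streak
        I = 1                  # reset; show tomorrow
    # Ease rises on a 5, holds on a 4, falls on 3 or lower; never below 1.3
    EF = max(1.3, EF + (0.1 - (5 - grade) * (0.08 + (5 - grade) * 0.02)))
    return n, I, EF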

That is it. A bit of arithmetic, a stable per-card state, and you get a remarkably effective scheduler.
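To see what that arithmetic does in practice, here is a small sketch that tabulates the ease-factor change for each grade and the intervals produced by a streak of identical passing grades. The helper names `ef_delta` and `intervals_for_streak` are ours, introduced only for this illustration.

```python
def ef_delta(grade: int) -> float:
    """Change applied to the ease factor for a 0-5 grade (before the 1.3 floor)."""
    miss = 5 - grade
    return 0.1 - miss * (0.08 + miss * 0.02)

def intervals_for_streak(grade: int, reviews: int, ef: float = 2.5) -> list[int]:
    """Intervals in days when every review gets the same passing grade."""
    assert grade >= 3, "only passing grades keep a streak alive"
    interval, out = 0, []
    for n in range(reviews):
        if n == 0:
            interval = 1
        elif n == 1:
            interval = 6
        else:
            interval = round(interval * ef)
        ef = max(1.3, ef + ef_delta(grade))  # ease is updated after the interval
        out.append(interval)
    return out

deltas = {g: round(ef_delta(g), 2) for g in range(6)}
print(deltas)                      # 5: +0.1, 4: 0.0, 3: -0.14, 2: -0.32, 1: -0.54, 0: -0.8
print(intervals_for_streak(4, 5))  # [1, 6, 15, 38, 95]
```

Five consecutive grade-4 answers push the card out to a 95-day interval, while a single grade of 2 costs 0.32 of ease on top of resetting the interval to one day.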

3 · Why we chose SM-2 over FSRS, Leitner, or "AI-native" schedulers

FSRS (Free Spaced Repetition Scheduler) is newer (2022+), uses optimisation rather than fixed formulas, and benchmarks better in lab conditions. Leitner is older and simpler. There are also several "ML-native" experiments where a neural net learns review timing per user.

We picked SM-2 because:

1. It is explainable. We can show the student "this card is due tomorrow because your last 3 attempts went well, ease factor 2.7". Try doing that with a neural net.
2. It is deterministic. The same attempt history always produces the same next-due date. This matters for analytics + reproducible testing.
3. It is cheap. Updating SM-2 state is 10 lines of SQL inside the same transaction as the answer insert. No model serving required.
4. The marginal gain from FSRS is small for exam-prep. FSRS shines when you have years of data per card; we are usually shipping a 12-week prep cycle.
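The explainability point is easy to illustrate: because the whole state is three numbers, a human-readable reason can be generated directly from it. This is a minimal sketch, not Prep's actual code; the `CardState` class, the `explain_due` function, and the message wording are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class CardState:
    """Hypothetical container for the three SM-2 numbers."""
    ease_factor: float
    interval_days: int
    reps: int

def explain_due(state: CardState, reviewed_on: date) -> str:
    """Build a plain-language reason for the next due date from SM-2 state alone."""
    due = reviewed_on + timedelta(days=state.interval_days)
    return (
        f"Due {due.isoformat()}: {state.reps} correct in a row, "
        f"ease factor {state.ease_factor:.1f}, interval {state.interval_days} days."
    )

print(explain_due(CardState(2.7, 16, 3), date(2026, 5, 14)))
# Due 2026-05-30: 3 correct in a row, ease factor 2.7, interval 16 days.
```

The function is pure, so the same state and review date always yield the same message, which is also what makes the scheduler straightforward to test.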

4 · How it lives in the schema

See content.user_question_memory in db/migrations/0003_content_resources.sql. The columns are exactly the SM-2 state: ease_factor, interval_days, reps, last_grade, due_at. The update happens inside a single transaction with the attempt insert (see internal/server/sessions.go) so we never have a half-state on a crashed write.
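The single-transaction idea can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not the production code, which lives in Go: the `attempts` table, its columns, the CHECK constraint, and the `record_attempt` helper are hypothetical, and only the memory-table column names come from the list above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attempts (                 -- hypothetical table name and columns
    user_id INTEGER, question_id INTEGER, grade INTEGER
);
CREATE TABLE user_question_memory (     -- column names taken from the post
    user_id INTEGER, question_id INTEGER,
    ease_factor REAL, interval_days INTEGER, reps INTEGER,
    last_grade INTEGER CHECK (last_grade BETWEEN 0 AND 5),
    due_at TEXT,
    PRIMARY KEY (user_id, question_id)
);
""")

def record_attempt(conn, user_id, question_id, grade,
                   ease_factor, interval_days, reps, due_at):
    """Insert the attempt and upsert the SM-2 state as one atomic unit."""
    # `with conn` commits if the block finishes and rolls back if it raises,
    # so a failure in the second statement also undoes the first.
    with conn:
        conn.execute("INSERT INTO attempts VALUES (?, ?, ?)",
                     (user_id, question_id, grade))
        conn.execute(
            "INSERT OR REPLACE INTO user_question_memory VALUES (?, ?, ?, ?, ?, ?, ?)",
            (user_id, question_id, ease_factor, interval_days, reps, grade, due_at),
        )

record_attempt(conn, 1, 42, 4, 2.5, 6, 2, "2026-05-20")
```

If the state write fails (for example, an out-of-range grade trips the CHECK constraint), the attempt row is rolled back too, so the two tables cannot drift apart.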

5 · Where SM-2 falls short — and what we layer on top

SM-2 treats every card as independent. In reality, a student who is strong on transpiration is probably also strong on stomatal physiology — the topics share concepts. SM-2 has no way to know this.

So we layer Bayesian Knowledge Tracing + topic-level Elo on top (see AIPI-01 in docs/features/41_ai_prep_intelligence.md). SM-2 schedules individual cards; BKT+Elo decide which TOPIC to drill tonight. The two layers compose well: SM-2 keeps your existing learning fresh, while BKT+Elo decide what to learn next.
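For readers who have not met these two models, here is a minimal sketch of the standard textbook updates. The parameter values (slip, guess, learn rate, K-factor), the function names, and the "drill the weakest topic" rule are assumptions for illustration; the post does not describe how Prep actually tunes or combines them.

```python
def bkt_update(p_known: float, correct: bool, p_slip: float = 0.1,
               p_guess: float = 0.2, p_learn: float = 0.15) -> float:
    """Standard BKT step: Bayesian posterior on mastery, then a learning transition."""
    if correct:
        evidence = p_known * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_known) * p_guess)
    else:
        evidence = p_known * p_slip
        posterior = evidence / (evidence + (1 - p_known) * (1 - p_guess))
    return posterior + (1 - posterior) * p_learn

def elo_update(student: float, topic: float, correct: bool, k: float = 24.0):
    """Standard Elo step treating each answer as a match between student and topic."""
    expected = 1 / (1 + 10 ** ((topic - student) / 400))
    shift = k * ((1.0 if correct else 0.0) - expected)
    return student + shift, topic - shift

# Hypothetical selection rule: drill the topic with the lowest estimated mastery.
mastery = {"transpiration": 0.5, "stomatal physiology": 0.5}
mastery["transpiration"] = bkt_update(mastery["transpiration"], correct=True)
mastery["stomatal physiology"] = bkt_update(mastery["stomatal physiology"], correct=False)
weakest = min(mastery, key=mastery.get)
print(weakest)  # stomatal physiology
```

Both updates work at the topic level, which is exactly the information SM-2's per-card state cannot hold.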

6 · One practical tip

When you grade yourself in spaced-rep apps, be honest about the "again" and "hard" buttons. The algorithm only works if your self-grading reflects reality. The temptation is to mark cards "good" to keep streaks intact — but that just delays the inevitable re-learning when the exam approaches and you discover you actually forgot the card three weeks ago.

In Prep, we additionally watch hint usage and response time. If you mark "good" but took 28 seconds on a 12-second card, the system silently down-weights the grade. The scheduler is harder to fool than you might expect.
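A down-weighting rule of this kind could look like the sketch below. To be clear, this is a guess at the general shape, not Prep's actual logic: the function name, the "more than twice the expected time" threshold, and the one-point penalty per hint are all invented for illustration.

```python
def adjusted_grade(self_grade: int, response_s: float, expected_s: float,
                   hints_used: int = 0) -> int:
    """Hypothetical rule: lower the 0-5 self-grade for slow answers and for hints."""
    penalty = hints_used              # assumed: one point per hint
    if response_s > 2 * expected_s:   # assumed threshold for "too slow"
        penalty += 1
    return max(0, self_grade - penalty)

print(adjusted_grade(4, response_s=28, expected_s=12))  # 3
```

With these assumed thresholds, the 28-second answer on a 12-second card drops from 4 to 3, which still counts as a pass but grows the ease factor less than the self-reported grade would.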

"The best spaced-repetition schedule is the one you actually follow. The second best is whatever your app picked when you opened it."
