Bayesian Knowledge Tracing, in 800 words

Dr. P. Iyer

ML lead

8 min read · May 16, 2026

Bayesian Knowledge Tracing — BKT — is the workhorse model behind the "mastery" number you see next to every topic in Prep. This post explains it in plain English, with one piece of math.

1 · What we want to estimate

For each (student, topic) pair, we want a number in [0, 1] that means "probability this student knows this topic well enough to get a fresh question right". Call it θ.

θ should go up when the student answers correctly, go down when they answer wrong, and grow more confident (lower σ) the more data we have.

2 · Two states + four parameters

Classical BKT models the student as being in one of two latent states: "knows the skill" (K=1) or "does not know" (K=0). We never observe K directly — we only observe attempts.

Four parameters connect attempts to K:

· p(L0): prior probability the student starts in K=1
· p(T): probability of transitioning K=0 → K=1 after a learning event
· p(G): probability of guessing the right answer when K=0
· p(S): probability of slipping (wrong answer) when K=1

These are calibrated per topic from global cohort data. Typical values: p(L0)=0.2, p(T)=0.15, p(G)=0.25 (for 4-option MCQs, where a blind guess is right one time in four), p(S)=0.1.
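As a minimal sketch, the four parameters can live in a small container, and the chance of a correct answer is a mix over the two latent states. The names `BKTParams` and `p_correct` are hypothetical (they are not taken from the Prep codebase), and the defaults are the typical values quoted above rather than any topic's fitted values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BKTParams:
    """Illustrative container for the four BKT parameters (names are assumptions)."""
    p_L0: float = 0.20  # prior probability of starting in K=1
    p_T: float = 0.15   # probability of moving K=0 -> K=1 after a learning event
    p_G: float = 0.25   # probability of guessing correctly when K=0
    p_S: float = 0.10   # probability of slipping (wrong answer) when K=1


def p_correct(p_know: float, params: BKTParams) -> float:
    """Probability of a correct answer, given the current P(K=1)."""
    return (1 - params.p_S) * p_know + params.p_G * (1 - p_know)


params = BKTParams()
print(round(p_correct(params.p_L0, params), 3))  # 0.38
```

With the typical values, a brand-new student is expected to get a fresh question right about 38% of the time, most of it from guessing.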

3 · The Bayesian update

After observing an attempt, we update p(K=1 | observation) via Bayes:

# Observation = correct
P(K=1 | correct) = P(correct | K=1) * P(K=1) / P(correct)
                 = (1 - p_S) * P(K=1) /
                   ((1 - p_S) * P(K=1) + p_G * (1 - P(K=1)))

# Then apply the learning-event transition:
P(K=1 after) = P(K=1 | obs) + (1 - P(K=1 | obs)) * p_T

# Observation = wrong
P(K=1 | wrong) = p_S * P(K=1) /
                 (p_S * P(K=1) + (1 - p_G) * (1 - P(K=1)))
P(K=1 after) = P(K=1 | obs) + (1 - P(K=1 | obs)) * p_T

Read those slowly. The first formula is just "what is the chance you know it, given that you got it right?". The second bumps it up a notch to account for the chance that attempting the question taught you something. The "wrong" case has the same shape, but the likelihoods swap in both numerator and denominator: p_S replaces (1 - p_S), and (1 - p_G) replaces p_G.
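The two steps translate almost line for line into code. The sketch below assumes the typical parameter values quoted earlier as defaults; the function name `bkt_update` is illustrative, not the name used in production.

```python
def bkt_update(p_know: float, correct: bool,
               p_T: float = 0.15, p_G: float = 0.25, p_S: float = 0.10) -> float:
    """One BKT step: Bayes update on the observation, then the learning transition."""
    if correct:
        # P(correct | K=1) = 1 - p_S, P(correct | K=0) = p_G
        num = (1 - p_S) * p_know
        den = num + p_G * (1 - p_know)
    else:
        # P(wrong | K=1) = p_S, P(wrong | K=0) = 1 - p_G
        num = p_S * p_know
        den = num + (1 - p_G) * (1 - p_know)
    # Guard against a zero denominator (only possible with degenerate parameters).
    posterior = num / den if den > 0 else p_know
    # Learning-event transition: some of the remaining "does not know" mass moves to K=1.
    return posterior + (1 - posterior) * p_T


p = 0.20  # start from the typical p(L0)
for outcome in (True, True, False):
    p = bkt_update(p, outcome)
    print(outcome, round(p, 3))
```

Starting from 0.20, a single correct answer moves the estimate to about 0.55, while a single wrong answer moves it to about 0.18.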

4 · What we report as θ

θ = P(K=1) after the update. It naturally lives in [0, 1]. We also track σ — the standard deviation of θ across the posterior. New users have high σ; users with many attempts have low σ. We display σ as the width of the band on each topic's mastery bar.

5 · Why we blend in Elo on top

Pure BKT gives a per-(student, topic) skill. It does not know that a given QUESTION is harder than another. Elo handles that: each question has an elo_rating, each student has a per-topic elo, and the expected-correct probability for an (s, q) pair is the standard logistic.
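For reference, here is a sketch of the standard Elo logistic. It assumes the conventional base-10 curve with a 400-point scale; this post does not say which scale Prep uses, so treat `scale` as an assumption. The argument names `student_elo` and `question_elo` are illustrative (the latter standing in for the question's elo_rating).

```python
def elo_expected_correct(student_elo: float, question_elo: float,
                         scale: float = 400.0) -> float:
    """Expected probability that the student answers the question correctly."""
    # Equal ratings give 0.5; a harder question (higher rating) lowers the probability.
    return 1.0 / (1.0 + 10 ** ((question_elo - student_elo) / scale))


print(elo_expected_correct(1500, 1500))            # 0.5
print(round(elo_expected_correct(1500, 1700), 3))  # 0.24
```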

We use BKT for the "do you know this topic" question and Elo for the "is this question at the right difficulty for this student" question. Both feed the next-question chooser (AIPI-02 / DSC-02). They are not competing; they answer different questions.

6 · Calibration

Every week we re-fit the four BKT parameters per topic, using the previous week's actual outcomes. If the model is overconfident (e.g., it predicts θ=0.8 but cohort accuracy is 60%), the parameters drift. Auto-alerts fire when calibration error exceeds 10%: the model is flagged for review and falls back to the prior week's parameters until a human approves the new ones.
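The fallback rule can be sketched as below. This assumes "calibration error" means the absolute gap between the mean predicted probability and the observed accuracy; the post does not define the metric, so that definition, along with the names `calibration_error` and `choose_params`, is an assumption for illustration.

```python
from typing import Sequence, Tuple, TypeVar

P = TypeVar("P")


def calibration_error(predicted: Sequence[float], outcomes: Sequence[int]) -> float:
    """Absolute gap between mean predicted probability and observed accuracy."""
    if len(predicted) == 0 or len(predicted) != len(outcomes):
        raise ValueError("predicted and outcomes must be non-empty and equal length")
    mean_pred = sum(predicted) / len(predicted)
    accuracy = sum(outcomes) / len(outcomes)
    return abs(mean_pred - accuracy)


def choose_params(new_params: P, prior_params: P,
                  predicted: Sequence[float], outcomes: Sequence[int],
                  threshold: float = 0.10) -> Tuple[P, bool]:
    """Return (parameters to serve, whether the model is flagged for review)."""
    if calibration_error(predicted, outcomes) > threshold:
        return prior_params, True   # fall back until a human approves
    return new_params, False


# The overconfident case from the text: predicts 0.8, cohort gets 60% right.
predicted = [0.8] * 10
outcomes = [1] * 6 + [0] * 4
print(choose_params("new", "prior", predicted, outcomes))  # ('prior', True)
```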

Read more in docs/features/41_ai_prep_intelligence.md (AIPI-01) and the SQL schema in db/migrations/0008_ai_analysis.sql (table analytics.topic_mastery).

"The goal of a learning model isn't to be the most sophisticated. It's to be wrong in known, fixable ways."
