Personal project
Swotta
A revision system for GCSE and A-Level students that takes spaced repetition, mastery tracking, and AI tutoring seriously.
Type: Revision and AI tutoring platform
Stack: Next.js 15, PostgreSQL + pgvector, Claude API
Tests: 1,605 across 84 files
Code: Open source (Polyform Noncommercial)
Why this project
Why the context problem matters more than the chat interface.
I'm interested in how memory works — the biological kind and the computational kind. How spaced repetition exploits the forgetting curve. How retrieval practice strengthens recall more than re-reading ever does. How confidence calibration (knowing what you don't know) is arguably the skill most students lack and most revision tools ignore. Swotta is where those interests meet an agentic AI problem: how to give an AI enough structured context about a student — what they know, what they've forgotten, where they're miscalibrated — that it can do something genuinely useful in the moment rather than just responding to a prompt.
What existing tools get wrong
- Most revision tools track time spent, not mastery gained.
- Students don't know what they don't know — confidence miscalibration is the core problem most tools ignore.
- AI tutoring without structured context is just a chatbot with a study-themed prompt.
- Spaced repetition research is well-established but poorly applied in real products.
Design constraints
- The system had to model a full UK exam specification as a relational topic graph with prerequisite edges.
- AI sessions needed structured context — mastery state, misconceptions, learning preferences, student materials — not just a blank prompt.
- The scheduling engine had to factor in exam proximity, topic weights, and behavioural signals, not just track overdue items.
- Multi-tenant identity needed to support both families and schools from the same schema.
What I built
Structured context assembly, not a chatbot with a study prompt.
Scheduling engine
Modified SM-2 spaced repetition factoring in exam proximity, topic weights from the actual specification, avoidance patterns, and confidence miscalibration. Picks both what to study and how — across 10 distinct session types.
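To make the idea concrete, here is a minimal sketch of an exam-aware SM-2 variant. The ease-factor and interval formulas are standard SM-2; the names (`reviewSm2`, `examAwareInterval`) and the proximity/weight modifiers are illustrative assumptions, not Swotta's actual code.

```typescript
// Sketch only: standard SM-2 plus a hypothetical exam-proximity modifier.

interface CardState {
  repetitions: number;   // consecutive successful reviews
  easeFactor: number;    // SM-2 ease factor, floored at 1.3
  intervalDays: number;  // current gap between reviews
}

function reviewSm2(state: CardState, quality: number): CardState {
  // Standard SM-2: quality is 0-5; below 3 resets the repetition streak.
  if (quality < 3) {
    return { ...state, repetitions: 0, intervalDays: 1 };
  }
  const ease = Math.max(
    1.3,
    state.easeFactor + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)),
  );
  const reps = state.repetitions + 1;
  const interval =
    reps === 1 ? 1 : reps === 2 ? 6 : Math.round(state.intervalDays * ease);
  return { repetitions: reps, easeFactor: ease, intervalDays: interval };
}

// Hypothetical exam-aware adjustment: never schedule a review past the
// exam, and pull heavily weighted topics forward.
function examAwareInterval(
  baseIntervalDays: number,
  daysUntilExam: number,
  topicWeight: number, // 0-1, share of marks the topic carries
): number {
  const cap = Math.max(1, daysUntilExam - 1);
  const urgency = 1 - 0.5 * topicWeight; // heavier topics reviewed sooner
  return Math.min(cap, Math.max(1, Math.round(baseIntervalDays * urgency)));
}
```

The key design point the sketch shows: the base algorithm stays untouched, and exam-specific pressure is applied as a post-hoc clamp on its output.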
AI study sessions
Conversational sessions powered by Claude. Each session receives the student's mastery level, known misconceptions, learning preferences, relevant chunks from their own materials (pgvector similarity search), and the qualification's mark scheme structure.
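As an illustration of what "structured context" can mean here, the sketch below renders typed student state into labelled prompt sections, skipping empty ones. Every field and section name is an assumption for the example; Swotta's real schema and its 15 prompt templates are not shown.

```typescript
// Illustrative context assembly: typed state in, labelled sections out.

interface StudentContext {
  masteryByTopic: Record<string, number>; // 0-1 mastery estimates
  misconceptions: string[];               // known misconception notes
  learningPreferences: string[];          // e.g. "worked examples first"
  retrievedChunks: string[];              // similarity hits from their materials
  markSchemeNotes: string[];              // qualification mark-scheme structure
}

// Render only non-empty sections so the model never sees blank headings.
function assembleSessionContext(ctx: StudentContext): string {
  const sections: [string, string[]][] = [
    [
      "Mastery",
      Object.entries(ctx.masteryByTopic).map(
        ([topic, m]) => `${topic}: ${Math.round(m * 100)}%`,
      ),
    ],
    ["Known misconceptions", ctx.misconceptions],
    ["Learning preferences", ctx.learningPreferences],
    ["Relevant material excerpts", ctx.retrievedChunks],
    ["Mark scheme notes", ctx.markSchemeNotes],
  ];
  return sections
    .filter(([, items]) => items.length > 0)
    .map(([title, items]) => `## ${title}\n${items.map((i) => `- ${i}`).join("\n")}`)
    .join("\n\n");
}
```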
Source ingestion
Upload PDFs of class notes or past papers. Pipeline extracts text, chunks at semantic boundaries, generates embeddings, and classifies each chunk against the curriculum topic graph.
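The chunking step can be sketched as boundary-aware greedy packing: split on paragraph breaks, then fill chunks up to a size budget without cutting a paragraph in half. This is a simplified stand-in; the pipeline's actual boundary detection and budgets are not shown.

```typescript
// Minimal boundary-aware chunker (assumed behaviour, not Swotta's code):
// split on blank lines, then pack whole paragraphs up to maxChars.
function chunkAtBoundaries(text: string, maxChars = 1200): string[] {
  const paragraphs = text
    .split(/\n{2,}/)
    .map((p) => p.trim())
    .filter(Boolean);
  const chunks: string[] = [];
  let current = "";
  for (const p of paragraphs) {
    // Start a new chunk when adding this paragraph would bust the budget.
    if (current && current.length + p.length + 2 > maxChars) {
      chunks.push(current);
      current = p;
    } else {
      current = current ? current + "\n\n" + p : p;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

A paragraph larger than the budget still becomes its own chunk, which keeps the invariant that no embedding ever spans a semantic boundary.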
Parent reporting
Weekly reports with mastery changes, misconception narratives, confidence calibration insights, and behavioural patterns. Reports that tell parents something useful, not just "studied for 3 hours."
Multi-tenant identity
Household-as-organisation model supporting B2C families and B2B schools from the same schema, with policies resolving through five layers.
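One plausible shape for layered policy resolution is most-specific-wins across an ordered list of scopes. The five layer names below are illustrative guesses, not Swotta's actual hierarchy.

```typescript
// Hypothetical five-layer resolution: the most specific layer that
// defines a setting wins; platform defaults are the fallback.
type Policy = Record<string, unknown>;

const LAYER_ORDER = [
  "student",
  "household",
  "school",
  "organisation",
  "platformDefault",
] as const;

type LayerName = (typeof LAYER_ORDER)[number];

function resolvePolicy(
  layers: Partial<Record<LayerName, Policy>>,
  key: string,
): unknown {
  for (const layer of LAYER_ORDER) {
    const policy = layers[layer];
    if (policy && key in policy) return policy[key];
  }
  return undefined; // no layer defines this setting
}
```

First-defined-wins keeps resolution a single ordered scan, so both a B2C household and a B2B school resolve through the same code path.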
Systems involved
More than a chatbot with a curriculum prompt.
Core systems
- Curriculum topic graph with prerequisite edges (40+ tables across 5 schema layers)
- Modified SM-2 scheduling engine with exam-aware prioritisation
- AI context assembly and 15 Markdown prompt templates
- pgvector similarity search for student materials
- Inngest background jobs for ingestion, reporting, and scheduling
- Terraform-managed GCP infrastructure (Cloud Run, Cloud SQL, europe-west2)
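The pgvector retrieval item above comes down to ranking chunk embeddings by cosine distance, which is what pgvector's `<=>` operator computes server-side. Here is an in-memory illustration of that ranking; the names are hypothetical and the real system does this in SQL, not application code.

```typescript
// In-memory mirror of a pgvector cosine-distance top-k query.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored chunks by distance to the query embedding; return the k
// closest ids (equivalent to ORDER BY embedding <=> $1 LIMIT k).
function topK(
  query: number[],
  chunks: { id: string; embedding: number[] }[],
  k: number,
): string[] {
  return [...chunks]
    .sort(
      (x, y) =>
        cosineDistance(query, x.embedding) - cosineDistance(query, y.embedding),
    )
    .slice(0, k)
    .map((c) => c.id);
}
```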
Why it was non-trivial
- The hard part wasn't the chat interface — it was context assembly: making each AI session genuinely useful by giving it structured information about the student rather than a generic prompt.
- The scheduling engine doesn't just track what's overdue. It models exam proximity, topic weights, avoidance patterns, and confidence miscalibration to decide both what to study and how.
- The system ingests a student's actual materials, classifies them against the curriculum, and retrieves relevant chunks during sessions — no generic content.