Benchmarking AI Tutors: From Answer Correctness to Pedagogical Guidance
Why solver-style benchmarks fail for education, what recent tutoring benchmarks show, and how we should evaluate AI Tutors as guidance systems rather than solution engines.
Research blog
Notes on AI agents, evaluation, and project-based research formation at London AI School.