✨ TINY CRITICS: Lightweight Reasoning Checks for Large AI Systems
🥹 0. TL;DR
Large language models write fluent explanations even when they’re wrong. Verifying their reasoning usually requires another LLM, which is slow, expensive, and circular.
We needed something different:
A miniature reasoning critic, under 50 KB, trained on synthetic reasoning mistakes and able to instantly detect broken reasoning in much larger models.
The Tiny Critic:
- trains on GSM8K-style reasoning traces generated by DeepSeek or Mistral
- uses FrontierLens and Visual Policy Maps (VPMs) to convert reasoning into canonical numerical features
- is just a logistic regression with ~30 parameters
- runs in microseconds
- plugs into any agent
- dramatically improves InitAgent, R1-Loops, and research-planning stability
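To make the "logistic regression with ~30 parameters" concrete, here is a minimal sketch of what such a critic could look like. The feature functions and weights below are illustrative assumptions, not the post's actual FrontierLens/VPM features; the real critic learns its weights from labeled reasoning traces.

```python
import math

def extract_features(trace):
    """Toy numeric features from a step-by-step reasoning trace.
    (Illustrative stand-ins for the FrontierLens/VPM features in the post.)"""
    n = len(trace)
    avg_len = sum(len(step) for step in trace) / max(n, 1)
    frac_numeric = sum(any(c.isdigit() for c in step) for step in trace) / max(n, 1)
    return [float(n), avg_len / 100.0, frac_numeric]

def critic_score(features, weights, bias):
    """Logistic regression: estimated probability that the reasoning is sound."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical hand-set parameters; a trained critic would learn ~30 of these.
weights, bias = [0.1, 0.5, 2.0], -1.0

trace = [
    "Step 1: 3 apples + 4 apples = 7 apples.",
    "Step 2: The answer is 7.",
]
p = critic_score(extract_features(trace), weights, bias)
```

Because scoring is a handful of multiplies and adds, it runs in microseconds and can sit inline in any agent loop as a cheap gate before accepting a model's answer.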
This post tells the full story: how we built it, why it works, and what we learned about the shape of reasoning.