Chain of Thought

✨ TINY CRITICS: Lightweight Reasoning Checks for Large AI Systems

🥹 0. TL;DR

Large language models write fluent explanations even when they're wrong. Verifying their reasoning usually requires another LLM, which is slow, expensive, and circular.

We needed something different:

A miniature reasoning critic, under 50 KB, trained on synthetic reasoning mistakes and able to instantly detect broken reasoning in much larger models.

The Tiny Critic:

  • trains on GSM8K-style reasoning traces generated by DeepSeek or Mistral
  • uses FrontierLens and Visual Policy Maps (VPMs) to convert reasoning into canonical numerical features
  • is just a logistic regression with ~30 parameters
  • runs in microseconds
  • plugs into any agent
  • dramatically improves InitAgent, R1-Loops, and research-planning stability
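To make the shape of this concrete, here is a minimal sketch of what a ~30-parameter logistic-regression critic looks like. The feature vector, weights, and function names are illustrative assumptions, not the trained model: in practice the features would come from the VPM canonicalization step described later.

```python
import numpy as np

N_FEATURES = 30  # ~30 parameters, as described above

# Stand-in weights; a real Tiny Critic would load trained values.
rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=N_FEATURES)
bias = 0.0

def critic_score(features: np.ndarray) -> float:
    """Probability that a reasoning trace is broken: sigmoid of a dot product."""
    z = features @ weights + bias
    return 1.0 / (1.0 + np.exp(-z))

def is_broken(features: np.ndarray, threshold: float = 0.5) -> bool:
    """Flag a trace when the critic's score crosses the threshold."""
    return critic_score(features) >= threshold

# Usage: in an agent loop, score each trace before acting on it.
trace_features = np.zeros(N_FEATURES)  # placeholder canonical features
p = critic_score(trace_features)
```

A single dot product plus a sigmoid is why inference runs in microseconds: there is no model to load, no tokens to generate, just 30 multiply-adds.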

This post tells the full story: how we built it, why it works, and what we learned about the shape of reasoning.