Chain of Thought

✨ TINY CRITICS: Lightweight Reasoning Checks for Large AI Systems

🥹 0. TL;DR

Large language models write fluent explanations even when they're wrong. Verifying their reasoning usually requires another LLM, which is slow, expensive, and circular.

We needed something different:

A miniature reasoning critic, under 50 KB, trained on synthetic reasoning mistakes and able to instantly detect broken reasoning in much larger models.

The Tiny Critic:

  • trains on GSM8K-style reasoning traces generated by DeepSeek or Mistral
  • uses FrontierLens and Visual Policy Maps (VPMs) to convert reasoning into canonical numerical features
  • is just a logistic regression with ~30 parameters
  • runs in microseconds
  • plugs into any agent
  • dramatically improves InitAgent, R1-Loops, and research-planning stability
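To make the shape of this concrete, here is a minimal sketch of what a ~30-parameter logistic-regression critic looks like. The feature vector, weights, and function names are illustrative assumptions, not the trained model: in practice the features would come from the VPM canonicalization step described later.

```python
import numpy as np

N_FEATURES = 30  # ~30 parameters, as described above

# Stand-in weights; a real Tiny Critic would load trained values.
rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=N_FEATURES)
bias = 0.0

def critic_score(features: np.ndarray) -> float:
    """Probability that a reasoning trace is broken: sigmoid of a dot product."""
    z = features @ weights + bias
    return 1.0 / (1.0 + np.exp(-z))

def is_broken(features: np.ndarray, threshold: float = 0.5) -> bool:
    """Flag a trace when the critic's score crosses the threshold."""
    return critic_score(features) >= threshold

# Usage: in an agent loop, score each trace before acting on it.
trace_features = np.zeros(N_FEATURES)  # placeholder canonical features
p = critic_score(trace_features)
```

A single dot product plus a sigmoid is why inference runs in microseconds: there is no model to load, no tokens to generate, just 30 multiply-adds.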

This post tells the full story: how we built it, why it works, and what we learned about the shape of reasoning.