Review: What We’ve Learned So Far

😶‍🌫️ Summary

This post is a quick review of the journey so far.

We’re one third of the way through the Self-Learning Systems (100-part) series, and this checkpoint pulls together the first 33 posts and the research papers that shaped them. The table below lists each post, its place in the series, and the key references it builds on, so you can see how the system and the ideas behind it have evolved since May.

✨ TINY CRITICS: Lightweight Reasoning Checks for Large AI Systems

🥹 0. TL;DR

Large language models write fluent explanations even when they’re wrong. Verifying their reasoning usually requires another LLM, which is slow, expensive, and circular.

We needed something different:

A miniature reasoning critic (<50 KB), trained on synthetic reasoning mistakes, able to instantly detect broken reasoning in much larger models.

The Tiny Critic:

  • trains on GSM8K-style reasoning traces generated by DeepSeek or Mistral
  • uses FrontierLens and Visual Policy Maps (VPMs) to convert reasoning into canonical numerical features
  • is just a logistic regression with ~30 parameters
  • runs in microseconds
  • plugs into any agent
  • dramatically improves InitAgent, R1-Loops, and research-planning stability
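A critic this small is essentially a handful of weights and a sigmoid. As a rough sketch (the feature names and weight values below are illustrative stand-ins, not the actual FrontierLens/VPM features or trained parameters):

```python
import math

# Illustrative weights for a tiny logistic critic. The feature names
# are hypothetical stand-ins for canonical numerical features a VPM
# pipeline might emit; the real critic has ~30 such parameters.
WEIGHTS = {
    "step_count": -0.12,
    "avg_step_length": 0.03,
    "numeric_consistency": -2.1,   # consistent arithmetic lowers suspicion
    "contradiction_score": 3.4,    # contradictions raise it sharply
}
BIAS = -0.5

def critic_score(features: dict) -> float:
    """Probability that a reasoning trace is broken (1.0 = broken)."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

# A trace whose numbers contradict each other should score high;
# a consistent trace should score low.
bad = critic_score({"numeric_consistency": 0.2, "contradiction_score": 0.9})
good = critic_score({"numeric_consistency": 0.95, "contradiction_score": 0.05})
```

Because inference is a single dot product and a sigmoid, the microsecond latency claim follows directly: there is no model serving, no token generation, just arithmetic over a short feature vector.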

This post tells the full story: how we built it, why it works, and what we learned about the shape of reasoning.

Search–Solve–Prove: building a place for thoughts to develop

🌌 Summary

What if you could see an AI think? Not just the final answer, but the whole stream of reasoning: every search, every dead end, every moment of insight. We’re building exactly that: a visible, measurable thought process we call the Jitter. This post, the first in a series, shows how we’re creating the habitat where that digital thought stream can live and grow.

We’ll draw on ideas from:

The Space Between Models Has Holes: Mapping the AI Gap

🌌 Summary

What if the most valuable insights in AI evaluation aren’t in model agreements, but in systematic disagreements?

This post reveals that the “gap” between large and small reasoning models contains structured, measurable intelligence about how different architectures reason. We demonstrate how to transform model disagreements from a problem into a solution, using the space between models to make tiny networks behave more like their heavyweight counterparts.

We start by assembling a high-quality corpus (10k–50k conversation turns), score it with a local LLM to create targets, and train both HRM and Tiny models under identical conditions. Then we run fresh documents through both models, collecting not just final scores but rich auxiliary signals (uncertainty, consistency, OOD detection, etc.) and visualize what these signals reveal.
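The key step above is collecting the disagreement itself, not just the two scores. A minimal sketch of that comparison, assuming hypothetical `hrm_score` and `tiny_score` functions (stand-ins for the two trained models, each returning a score plus auxiliary signals):

```python
# Hypothetical scoring stubs standing in for the trained HRM and Tiny
# models; each returns (score, auxiliary signals) for a document.
def hrm_score(doc: str):
    return 0.82, {"uncertainty": 0.05, "ood": 0.10}

def tiny_score(doc: str):
    return 0.61, {"uncertainty": 0.22, "ood": 0.35}

def gap_record(doc: str) -> dict:
    """Run both models on a fresh document and record the gap,
    including the auxiliary-signal disagreements."""
    h, h_aux = hrm_score(doc)
    t, t_aux = tiny_score(doc)
    return {
        "doc": doc,
        "gap": h - t,  # signed score disagreement
        "uncertainty_gap": t_aux["uncertainty"] - h_aux["uncertainty"],
        "ood_gap": t_aux["ood"] - h_aux["ood"],
    }

record = gap_record("fresh document text")
```

Records like this, aggregated over many documents, are what get visualized: structure in the gaps, rather than noise, is the post’s central claim.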

A Complete Visual Reasoning Stack: From Conversations to Epistemic Fields

📝 Summary

We asked a blunt question: Can we see reasoning?
The answer surprised us: Yes, and you can click on it.

This post shows the complete stack that turns AI reasoning from a black box into an editable canvas. Watch as:

  • Your single insight becomes 10,000 reasoning variations
  • Abstract “understanding” becomes visible epistemic fields
  • Manual prompt engineering becomes automated evolution
  • Blind trust becomes visual verification

This isn’t just code; it’s a visual way of interacting with AI, where reasoning becomes something you can see, explore, and refine.

🔦 Phōs: Visualizing How AI Learns and How to Build It Yourself

“The eye sees only what the mind is prepared to comprehend.” (Henri Bergson)

🔍 We Finally See Learning

For decades, we’ve measured artificial intelligence with numbers: loss curves, accuracy scores, reward signals.
We’ve plotted progress, tuned hyperparameters, celebrated benchmarks.

But we’ve never actually seen learning happen.

Not really.

Sure, we’ve visualized attention maps or gradient flows, but those are snapshots, proxies, not processes.

What if we could watch understanding emerge not as a number going up, but as a pattern stabilizing across time?
What if reasoning itself left a visible trace?

Episteme: Distilling Knowledge into AI

🚀 Summary

“When you can measure what you are speaking about… you know something about it; but when you cannot measure it… your knowledge is of a meagre and unsatisfactory kind.” (Lord Kelvin)

Remember that time you spent an hour with an AI, and in one perfect response, it solved a problem you’d been stuck on for weeks? Where is that answer now? Lost in a scroll of chat history, a fleeting moment of brilliance that vanished as quickly as it appeared. This post is about how to make that moment permanent, and turn it into an intelligence that amplifies everything you do.

🔄 Learning from Learning: Stephanie’s Breakthrough

📖 Summary

AI has always been about absorption: first data, then feedback. But even at its best, it hit a ceiling. What if, instead of absorbing inputs, it absorbed the act of learning itself?

In our last post, we reached a breakthrough: Stephanie isn’t just learning from data or feedback, but from the process of learning itself. That realization changed our direction from building “just another AI” to building a system that absorbs knowledge, reflects on its own improvement, and evolves from the act of learning.