Review: What We’ve Learned So Far


😶‍🌫️ Summary

This post is a quick review of the journey so far.

We’re one third of the way through the Self-Learning Systems (100-part) series, and this checkpoint pulls together the first 33 posts and the research papers that shaped them. The list below gives each post, its place in the series, and the key references it builds on, so you can see how the system and the ideas behind it have evolved since May.

33. ✨ TINY CRITICS: Lightweight Reasoning Checks for Large AI Systems
    Finishes this third of the series with Tiny Critics: sub-50KB models trained on frontier features that sit beside big LLMs and flag broken reasoning in real time (a toy sketch of the idea follows this list).
    References: Self-Evolving Vision-Language Models from Images; Chain-of-Thought Prompting Elicits Reasoning in Large Language Models; Training Verifiers to Solve Math Word Problems; Tree of Thoughts: Deliberate Problem Solving with Large Language Models; Solving math word problems with process- and outcome-based feedback; Let’s Verify Step by Step; NaturalProofs: Mathematical Theorem Proving in Natural Language; Support-vector networks

32. The Nexus Blossom: How AI Thoughts Turn into Habits
    Introduces the Nexus graph, where thoughts become nodes, relationships become edges, and repeated patterns harden into habits that guide future reasoning and action.

31. Search–Solve–Prove: building a place for thoughts to develop
    Designs a Search–Solve–Prove habitat where reasoning steps can be generated, checked, and iteratively improved, giving Stephanie a structured space to grow and test ideas.
    References: Search Self-play: Pushing the Frontier of Agent Capability without Supervision

30. The Space Between Models Has Holes: Mapping the AI Gap
    Defines the “gap” between models where they systematically disagree and builds tools to map, visualize, and eventually exploit those gaps instead of pretending they don’t exist.

29. A Complete Visual Reasoning Stack: From Conversations to Epistemic Fields
    Pulls ZeroModel, Phōs, SIS, and Learning-from-Learning into a single visual reasoning stack where conversations, metrics, and epistemic fields all share one coherent, inspectable space.
    References: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

28. 🔦 Phōs: Visualizing How AI Learns and How to Build It Yourself
    Uses Phōs as a visual lens on learning itself, showing how gradients, scores, and traces can be turned into intuitive pictures of what Stephanie is doing and why.

27. Episteme: Distilling Knowledge into AI
    Takes all the epistemic machinery (cartridges, scores, traces) and focuses it on knowledge distillation, turning raw research and usage data into durable, inspectable understanding.
    References: Deep Reinforcement Learning from Human Preferences; Learning to summarize from human feedback; A General Language Assistant as a Laboratory for Alignment; Training language models to follow instructions with human feedback; Scaling Laws for Neural Language Models

26. 🔄 Learning from Learning: Stephanie’s Breakthrough
    Closes the loop on Learning-from-Learning: Stephanie scores conversations for knowledge gain, builds training pairs, and fine-tunes tiny models that specialize in recognizing useful insight.

25. From Photo Albums to Movies: Teaching AI to See Its Own Progress
    Extends the visual stack from static snapshots to sequences, letting Stephanie track progress over time like flipping from scattered photos to a coherent movie of its learning.
    References: Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR; NER Retriever: Zero-Shot Named Entity Retrieval with Type-Aware Embeddings

24. Case Based Reasoning: Teaching AI to Learn From itself
    Builds a Case-Based Reasoning layer where Stephanie stores past reasoning episodes as cases and reuses them, so each solved problem becomes a worked example for the next one.
    References: Memento: Fine-tuning LLM Agents without Fine-tuning LLMs; Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches

23. SIS: The Visual Dashboard That Makes Stephanie
    Adds SIS, a visual dashboard that shows pipelines, scores, and traces in one place, turning Stephanie from a black box into a system you can actually see and debug.
    References: Retrieval-Augmented Generation (RAG) Papers

22. ZeroModel: Visual AI you can scrutinize
    Introduces ZeroModel and Visual Policy Maps, encoding policy data into images so decisions can be inspected, compared, and even executed using simple visual logic instead of big models.

21. Everything is a Trace: Stephanie Enters Full Reflective Mode
    Redesigns Stephanie so every action becomes a traceable plan, making it possible to replay, compare, and learn from complete reasoning trajectories instead of isolated calls.
    References: Hierarchical Reasoning Model; Towards General-Purpose Model-Free Reinforcement Learning; Recurrent Independent Mechanisms; AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation; Human-level control through deep reinforcement learning; Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning; Root Mean Square Layer Normalization; Super-efficiency of automatic differentiation for functions defined as a minimum; Introduction to Latent Variable Energy-Based Models: A Path Towards Autonomous Machine Intelligence

20. Layers of thought: smarter reasoning with the Hierarchical Reasoning Model
    Builds a Hierarchical Reasoning Model that looks at full traces, not just final answers, giving Stephanie a way to score the quality of entire chains of thought.
    References: Recurrent Independent Mechanisms; AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation; Nature Machine Intelligence, 2022; Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning; Root Mean Square Layer Normalization; Super-efficiency of automatic differentiation for functions defined as a minimum; Mitigating Catastrophic Forgetting in Long Short-Term Memory Networks

19. Stephanie’s Secret: The Dawn of Reflective AI
    Introduces the GILD-style learning loop for Stephanie: greedy inference plus distillation, so every solved task becomes data that sharpens the system’s policies and scorers.
    References: Scalable In-Context Q-Learning; Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data; Exact phylodynamic likelihood via structured Markov genealogy processes; ClusT3: Information Invariant Test-Time Training

18. The Shape of Thought: Exploring Embedding Strategies with Ollama, HF, and H-Net
    Compares multiple embedding backends on real research data, treating embeddings as different “shapes of thought” and measuring how they affect extraction, scoring, and downstream reasoning.
    References: H-Net; AlphaEdit: Null-Space Constrained Model Editing; Self-Refine: Iterative Refinement with Self-Feedback; ReAct: Synergizing Reasoning and Acting in Language Models; Reflexion: An Automatic Framework for Iterative Strategy Refinement; Deep Reinforcement Learning from Human Preferences; Direct Preference Optimization: Your Language Model is Secretly a Reward Model; Tutorial on Energy-Based Learning; GPT-4 Technical Report; BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding; Dynamic Context Partitioning for Long Document Understanding

17. Getting Smarter at Getting Smarter: A Practical Guide to Self-Tuning AI
    Turns self-tuning into a concrete engineering pattern, wiring MRQ, EBT, SVM, and LLM judgments into a modular scoring stack that can be trained, compared, and swapped in and out.
    References: Energy-Based Transformers are Scalable Learners and Thinkers; Direct Preference Optimization: Your Language Model is Secretly a Reward Model; Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

16. Epistemic Engines: Building Reflective Minds with Belief Cartridges and In-Context Learning
    Introduces epistemic engines and belief cartridges: compact, scored summaries that let Stephanie carry forward what it knows and reason reflectively over its own beliefs.
    References: Language Models are Few-Shot Learners; Fine-tuned LLMs Know More, Hallucinate Less with Few-Shot Sequence-to-Sequence Semantic Parsing over Wikidata; SimCSE: Simple Contrastive Learning of Sentence Embeddings; Autobiographical event memory and aging: older adults get the gist; Snorkel: Rapid Training Data Creation with Weak Supervision; Concrete Problems in AI Safety

15. Self-Improving AI: A System That Learns, Validates, and Retrains Itself
    Assembles the first full self-improving loop: Stephanie scores its own outputs, trains internal scorers like MRQ, and uses those scorers to decide what to read, trust, and retrain on.
    References: RIVAL: Reinforcement Learning with Iterative and Adversarial Optimization for Machine Translation

14. Teaching Tiny Models to Think Big: Distilling Intelligence Across Devices
    Shows how to distill reasoning skills from big models into tiny local ones, letting phones or edge devices share in the same self-improving behavior as the main system.
    References: LoRD: Aligning Language Models via Embedding Space Distillation

13. Compiling Thought: Building a Prompt Compiler for Self-Improving AI
    Treats prompts as programs and introduces a prompt compiler that turns high-level intentions into structured pipelines Stephanie can run, inspect, and improve over time.

12. Thoughts of Algorithms
    Steps back to ask what counts as a “thought” for an AI, sketching how algorithms, traces, and symbolic structures become the raw material of Stephanie’s inner mental life.
    References: Self-Adapting Language Models; ReAct: Synergizing Reasoning and Acting in Language Models; Deep Reinforcement Learning from Human Preferences; Direct Preference Optimization: Your Language Model is Secretly a Reward Model; AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models; System-2 Fine-tuning for Robust Integration of New Knowledge

11. Document Intelligence: Turning Documents into Structured Knowledge
    Builds the document intelligence layer that slices papers into sections, extracts structured nuggets, and turns messy PDFs into material that Stephanie can actually reason over.
    References: Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training; Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

10. Learning to Learn: A LATS-Based Framework for Self-Aware AI Pipelines
    Applies LATS to Stephanie’s pipelines so the system can generate multiple candidate traces, score them, and gradually prefer reasoning strategies that lead to better outcomes.
    References: RM-R1: Reward Modeling as Reasoning; LATS: Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models; Mastering the game of Go with deep neural networks and tree search; GPT-4 Technical Report

9. Dimensions of Thought: A Smarter Way to Evaluate AI
    Defines “dimensions of thought” (goal-conditioned axes like usefulness, risk, and novelty) and builds the scoring infrastructure that lets Stephanie rate itself along many dimensions at once.
    References: GenSim: Generating Robotic Simulation Tasks via Large Language Models; Direct Preference Optimization: Your Language Model is Secretly a Reward Model

8. Programming Intelligence: Using Symbolic Rules to Steer and Evolve AI
    Re-introduces symbolic rules as first-class citizens, wiring a rules engine into Stephanie so we can steer, constrain, and gradually evolve behavior alongside learned models.
    References: Representation Engineering: A Top-Down Approach to AI Transparency; The Random Forest Model for Analyzing and Forecasting the US Stock Market in the Context of Smart Finance; Direct Preference Optimization: Your Language Model is Secretly a Reward Model

7. Adaptive Reasoning with ARM: Teaching AI the Right Way to Think
    Uses ARM-style adaptive reasoning to break problems into steps, evaluate each step, and teach Stephanie to adjust its thinking strategy mid-trajectory instead of only at the end.
    References: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models; Training language models to follow instructions with human feedback; Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

6. A Novel Approach to Autonomous Research: Implementing NOVELSEEK with Modular AI Agents
    Implements a NOVELSEEK-style autonomous research loop where modular agents search papers, generate ideas, and evolve research directions without manual babysitting.
    References: InternAgent: When Agent Becomes the Scientist – Building Closed-Loop System from Hypothesis to Verification; WizardLM: Empowering large pre-trained language models to follow complex instructions

5. The Self-Aware Pipeline: Empowering AI to Choose Its Own Path to the Goal
    Introduces a self-aware pipeline that treats each run as a plan, letting Stephanie choose which stages to execute, skip, or repeat based on the goal and feedback.
    References: HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages; Devil’s Advocate: Anticipatory Reflection for LLM Agents; Symbolic Learning Enables Self-Evolving Agents

4. General Reasoner: The smarter Local Agent
    Turns scattered tools into a single General Reasoner that can pick goals, choose appropriate skills, and coordinate local LLMs into a reusable, task-agnostic problem solver.
    References: General-Reasoner: Advancing LLM Reasoning Across All Domains

3. Building a Self-Improving Chain-of-Thought Agent: Local LLMs Meet the CoT Encyclopedia
    Builds a chain-of-thought “encyclopedia” agent that collects, scores, and reuses reasoning traces so local models learn not just answers, but better ways of thinking.
    References: The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think

2. Self-Improving Agents: Applying the Sharpening Framework to Local LLMs
    Adapts the Sharpening Framework to small, local LLMs so agents can repeatedly critique, refine, and re-run their own outputs instead of relying on a larger remote model.
    References: Self-Improvement in Language Models: The Sharpening Mechanism; Towards General-Purpose Model-Free Reinforcement Learning

1. Building an AI Co-Scientist
    Defines Stephanie as an AI co-scientist and builds the first end-to-end loop that can read papers, propose experiments, and help a human push a research idea forward.
    References: Towards an AI Co-Scientist
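
To make the capstone concrete before the recap: a Tiny Critic, as post 33 describes it, is a model small enough to serialize well under 50KB that watches reasoning-trace features and flags broken steps. The sketch below is only an illustration of that shape, not the post’s actual implementation; the feature names and weights are invented.

```python
import numpy as np

# Hypothetical per-step features extracted from a reasoning trace:
FEATURES = ["step_score_drop", "contradiction_flag", "novelty", "verifier_agreement"]

class TinyCritic:
    """A sub-50KB critic: logistic regression over trace features.

    In the series, weights like these would be distilled from frontier-model
    judgments; here they are illustrative placeholders.
    """

    def __init__(self, weights: np.ndarray, bias: float):
        self.w = weights.astype(np.float32)  # 4 float32 weights + 1 bias = 20 bytes
        self.b = np.float32(bias)

    def flag(self, features: np.ndarray, threshold: float = 0.5) -> bool:
        """Return True if a reasoning step looks broken."""
        p = 1.0 / (1.0 + np.exp(-(features @ self.w + self.b)))
        return bool(p > threshold)

# Usage (made-up numbers): a step with a big score drop, a contradiction,
# and low verifier agreement gets flagged for review.
critic = TinyCritic(weights=np.array([2.0, 3.0, -0.5, -2.5]), bias=-0.2)
step = np.array([0.8, 1.0, 0.1, 0.2])
print(critic.flag(step))  # True
```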

Over these first 33 posts, Stephanie has gone from a single AI co-scientist prototype to a sprawling, self-improving ecosystem. We started with one local model reading papers and helping with experiments, then layered on better reasoning (CoT, ARM, LATS), a smarter pipeline that can choose its own path, and a General Reasoner that turns scattered tools into a reusable thinking system.

From there, the series shifted toward epistemic structure and scoring. We built belief cartridges and epistemic engines, MRQ/EBT/SVM scoring stacks, and Hierarchical Reasoning Models that look at full traces instead of isolated answers. Stephanie learned to rate its own thoughts along many dimensions, train tiny specialist models, and use GILD-style loops to get smarter at getting smarter.
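
A minimal sketch of that scoring pattern may help. The names and numbers here are invented, not Stephanie’s actual interfaces: several interchangeable scorers each rate a reasoning trace along named dimensions, and the stack merges their outputs so any scorer can be trained, compared, or swapped independently.

```python
from typing import Callable, Dict, List

# A trace is just an ordered list of reasoning steps (strings, for brevity).
Trace = List[str]

# Each scorer maps a trace to a score per dimension (usefulness, risk, ...).
Scorer = Callable[[Trace], Dict[str, float]]

def mrq_style_scorer(trace: Trace) -> Dict[str, float]:
    # Placeholder: a learned scorer would derive these from embeddings.
    return {"usefulness": 0.7, "risk": 0.2}

def heuristic_scorer(trace: Trace) -> Dict[str, float]:
    # Placeholder heuristic: longer traces score a bit higher on novelty.
    return {"novelty": min(1.0, len(trace) / 10)}

def score_stack(trace: Trace, scorers: List[Scorer]) -> Dict[str, float]:
    """Average every dimension reported by any scorer in the stack."""
    totals: Dict[str, List[float]] = {}
    for scorer in scorers:
        for dim, value in scorer(trace).items():
            totals.setdefault(dim, []).append(value)
    return {dim: sum(vs) / len(vs) for dim, vs in totals.items()}

trace = ["restate the goal", "propose a plan", "check the plan against the goal"]
print(score_stack(trace, [mrq_style_scorer, heuristic_scorer]))
# {'usefulness': 0.7, 'risk': 0.2, 'novelty': 0.3}
```

The design point is the swap: replacing `heuristic_scorer` with an EBT- or SVM-backed scorer changes nothing downstream, which is what makes the stack trainable and comparable piece by piece.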

The most recent posts focused on visibility and habitats for thought. ZeroModel and Phōs turned policy data and learning signals into images. SIS, CBR, PACS, and the visual reasoning stack gave us dashboards, case libraries, progress movies, and epistemic fields. GAP, Search–Solve–Prove, Nexus, and Tiny Critics then pushed into the space between models: mapping disagreements, giving thoughts a graph to live in, and adding lightweight critics that can call out bad reasoning in real time.
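
The Nexus idea is simple enough to caricature in code. Below is a toy version, with an invented threshold and no claim about the real implementation: thoughts are nodes, observed transitions are counted edges, and an edge that recurs often enough hardens into a habit.

```python
from collections import defaultdict

class NexusGraph:
    """Toy habit graph: repeated thought transitions harden into habits."""

    HABIT_THRESHOLD = 3  # invented: recurrences before an edge becomes a habit

    def __init__(self):
        self.edges = defaultdict(int)  # (from_thought, to_thought) -> count

    def observe(self, from_thought: str, to_thought: str) -> None:
        """Record one reasoning step from one thought to the next."""
        self.edges[(from_thought, to_thought)] += 1

    def habits(self):
        """Edges seen often enough to guide future reasoning."""
        return [e for e, n in self.edges.items() if n >= self.HABIT_THRESHOLD]

g = NexusGraph()
for _ in range(3):
    g.observe("restate goal", "list constraints")
g.observe("restate goal", "guess an answer")
print(g.habits())  # [('restate goal', 'list constraints')]
```

The real post layers scoring and much richer structure on top; the point here is only that “habit” can fall out of nothing more exotic than edge counts.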


🚀 Where We’re Going Next

The first third of this series was about building the inside of Stephanie:
scorers, traces, critics, Nexus graphs, Visual Policy Maps, and the first sketches of an entelechial system — a will encoded into code and models.

From here on, the work turns outward.

The next phase will focus on practical applications of that architecture:

  • Code & System Guardians
    Tiny Critics, VPM risk badges, and ZeroModel maps that can sit over real codebases and services, spotting bad reasoning, fragile areas, and AI-generated landmines before they ship.

  • Living Research Assistants
    Always-on pipelines that watch your work, mine your past conversations, pull in new papers, and come back with “you’re looping here” or “here’s a better approach” without needing to be asked.

  • Cognitive Dashboards
    Visual, trace-driven views (VPMs, Phōs, SIS) that let humans see what the system is doing — where it’s confident, where it’s blind, and how its policies are evolving over time.

  • Shared Entelechy Graphs
    A living knowledge graph of “wills in code”: reusable goal-modules with clear attributes, quality scores, and interfaces, so other people (and other AIs) can plug Stephanie’s skills into their own work.

In other words:
the experimental lab phase is over.
The next posts will show how this new medium behaves in the real world,
not just as an idea, but as tools and systems people can actually touch.