33 · ✨ Tiny Critics: Lightweight Reasoning Checks for Large AI Systems
Wraps up this third of the series with Tiny Critics: sub-50KB models trained on frontier features that sit beside big LLMs and flag broken reasoning in real time (a sketch follows below).
Related papers: Self-Evolving Vision-Language Models from Images; Chain-of-Thought Prompting Elicits Reasoning in Large Language Models; Training Verifiers to Solve Math Word Problems; Tree of Thoughts: Deliberate Problem Solving with Large Language Models; Solving math word problems with process- and outcome-based feedback; Let’s Verify Step by Step; NaturalProofs: Mathematical Theorem Proving in Natural Language; Support-vector networks
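
To make the idea concrete, here is a minimal, dependency-free Python sketch of a tiny critic: a linear model over a few cheap features of a single reasoning step. The feature set, weights, and threshold are illustrative placeholders, not the ones trained in the post.

```python
# Tiny-critic sketch: score one reasoning step with a hand-sized linear model.
# Features and weights are placeholders; in the post they would be distilled
# from frontier-model judgments.
import math

def features(step: str) -> list[float]:
    """Cheap signals extracted from a single reasoning step."""
    tokens = step.split()
    return [
        len(tokens) / 100.0,                                      # step length
        step.count("?") / max(len(tokens), 1),                    # unresolved questions
        sum(t.isdigit() for t in tokens) / max(len(tokens), 1),   # numeric grounding
        1.0 if "therefore" in step.lower() else 0.0,              # explicit inference marker
    ]

WEIGHTS = [0.8, -1.5, 1.2, 0.9]   # placeholder weights: a few dozen bytes in total
BIAS = -0.2

def critic_score(step: str) -> float:
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features(step)))
    return 1.0 / (1.0 + math.exp(-z))   # estimated probability the step is sound

if __name__ == "__main__":
    for step in ["Maybe this works somehow?",
                 "We substitute x = 3 into 2x + 1 and get 7, therefore f(3) = 7."]:
        p = critic_score(step)
        print("FLAG" if p < 0.5 else "ok", round(p, 2), step)
```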

32 · The Nexus Blossom: How AI Thoughts Turn into Habits
Introduces the Nexus graph, where thoughts become nodes, relationships become edges, and repeated patterns harden into habits that guide future reasoning and action (a sketch follows below).
Related papers: –
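
A minimal sketch of that habit-forming graph under simple assumptions: thoughts are nodes, observed transitions are counted edges, and edges past a frequency threshold are promoted to habits. The class name and threshold are hypothetical, not the actual Nexus implementation.

```python
# Nexus-style sketch: count thought-to-thought transitions and promote the
# frequent ones to "habits" that can guide future reasoning.
from collections import defaultdict

class Nexus:
    def __init__(self, habit_threshold: int = 3):
        self.edges = defaultdict(int)            # (prev_thought, next_thought) -> count
        self.habit_threshold = habit_threshold

    def observe(self, prev_thought: str, next_thought: str) -> None:
        """Record that one thought led to another in a reasoning trace."""
        self.edges[(prev_thought, next_thought)] += 1

    def habits(self) -> list[tuple[str, str]]:
        """Transitions seen often enough to act as defaults next time."""
        return [edge for edge, count in self.edges.items()
                if count >= self.habit_threshold]

if __name__ == "__main__":
    nexus = Nexus()
    trace = ["restate goal", "list constraints", "draft plan", "check plan"]
    for _ in range(3):                           # the same pattern appears in three traces
        for a, b in zip(trace, trace[1:]):
            nexus.observe(a, b)
    print(nexus.habits())                        # all three transitions have hardened into habits
```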

31 · Search–Solve–Prove: building a place for thoughts to develop
Designs a Search–Solve–Prove habitat where reasoning steps can be generated, checked, and iteratively improved, giving Stephanie a structured space to grow and test ideas.
Related papers: Search Self-play: Pushing the Frontier of Agent Capability without Supervision

30 · The Space Between Models Has Holes: Mapping the AI Gap
Defines the “gap” between models where they systematically disagree and builds tools to map, visualize, and eventually exploit those gaps instead of pretending they don’t exist.
Related papers: –

29 · A Complete Visual Reasoning Stack: From Conversations to Epistemic Fields
Pulls ZeroModel, Phōs, SIS, and Learning-from-Learning into a single visual reasoning stack where conversations, metrics, and epistemic fields all share one coherent, inspectable space.
Related papers: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

28 · 🔦 Phōs: Visualizing How AI Learns and How to Build It Yourself
Uses Phōs as a visual lens on learning itself, showing how gradients, scores, and traces can be turned into intuitive pictures of what Stephanie is doing and why.
Related papers: –

27 · Episteme: Distilling Knowledge into AI
Takes all the epistemic machinery (cartridges, scores, traces) and focuses it on knowledge distillation, turning raw research and usage data into durable, inspectable understanding.
Related papers: Deep reinforcement learning from human preferences; Learning to summarize from human feedback; A General Language Assistant as a Laboratory for Alignment; Training language models to follow instructions with human feedback; Scaling Laws for Neural Language Models

26 · 🔄 Learning from Learning: Stephanie’s Breakthrough
Closes the loop on Learning-from-Learning: Stephanie scores conversations for knowledge gain, builds training pairs, and fine-tunes tiny models that specialize in recognizing useful insight.
Related papers: –

25 · From Photo Albums to Movies: Teaching AI to See Its Own Progress
Extends the visual stack from static snapshots to sequences, letting Stephanie track progress over time like flipping from scattered photos to a coherent movie of its learning.
Related papers: Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR; NER Retriever: Zero-Shot Named Entity Retrieval with Type-Aware Embeddings

24 · Case-Based Reasoning: Teaching AI to Learn From Itself
Builds a Case-Based Reasoning layer where Stephanie stores past reasoning episodes as cases and reuses them, so each solved problem becomes a worked example for the next one (a sketch follows below).
Related papers: Memento: Fine-tuning LLM Agents without Fine-tuning LLMs; Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches; Docs
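
A rough sketch of that case loop under simple assumptions: solved problems are stored with their solution traces, and the closest one is retrieved by word overlap, standing in for the embedding-based retrieval the post describes. The `Case` and `CaseStore` names are hypothetical.

```python
# Case-based reasoning sketch: store solved problems, retrieve the most similar
# one, and reuse its solution as a worked example for the new problem.
from dataclasses import dataclass

@dataclass
class Case:
    problem: str
    solution_trace: str
    score: float                     # how well the solution worked when first produced

class CaseStore:
    def __init__(self):
        self.cases: list[Case] = []

    def add(self, case: Case) -> None:
        self.cases.append(case)

    def retrieve(self, problem: str) -> Case | None:
        """Return the stored case whose problem overlaps most with the new one."""
        query = set(problem.lower().split())
        def similarity(case: Case) -> float:
            words = set(case.problem.lower().split())
            return len(query & words) / len(query | words) if query | words else 0.0
        return max(self.cases, key=similarity, default=None)

if __name__ == "__main__":
    store = CaseStore()
    store.add(Case("sum the first n integers", "use n*(n+1)/2", score=0.9))
    store.add(Case("sort a list of names", "use sorted() with a key function", score=0.8))
    precedent = store.retrieve("sum the first 100 integers")
    print(precedent.solution_trace if precedent else "no precedent found")
```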

23 · SIS: The Visual Dashboard That Makes Stephanie
Adds SIS, a visual dashboard that shows pipelines, scores, and traces in one place, turning Stephanie from a black box into a system you can actually see and debug.
Related papers: Retrieval-Augmented Generation (RAG) Papers

22 · ZeroModel: Visual AI you can scrutinize
Introduces ZeroModel and Visual Policy Maps, encoding policy data into images so decisions can be inspected, compared, and even executed using simple visual logic instead of big models (a sketch follows below).
Related papers: –
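
A hedged sketch of the visual-policy-map idea: a grid of per-document metric scores is treated as a tiny grayscale image, rows are sorted so the strongest candidates rise to the top, and the "decision" is read from the top-left of the map. The metric names and decision rule here are illustrative, not ZeroModel's actual encoding.

```python
# Visual-policy-map sketch: scores become pixels, ordering concentrates the
# answer in the top-left, and a decision is read off without running a model.
items = ["doc_a", "doc_b", "doc_c", "doc_d"]
metrics = ["relevance", "novelty", "evidence"]
scores = [                      # rows = items, columns = metrics, values in [0, 1]
    [0.2, 0.9, 0.1],
    [0.8, 0.7, 0.9],
    [0.4, 0.3, 0.5],
    [0.9, 0.6, 0.8],
]

# Sort rows by overall strength so the strongest candidates sit at the top.
order = sorted(range(len(items)), key=lambda i: -sum(scores[i]))
image = [[round(scores[i][j] * 255) for j in range(len(metrics))] for i in order]

for i, row in zip(order, image):        # the "map" is just pixels you can print or save
    print(items[i], row)

# Simple visual logic: accept whatever occupies the top-left of the map.
print("selected:", items[order[0]])
```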

21 · Everything is a Trace: Stephanie Enters Full Reflective Mode
Redesigns Stephanie so every action becomes a traceable plan, making it possible to replay, compare, and learn from complete reasoning trajectories instead of isolated calls.
Related papers: Hierarchical Reasoning Model; Towards General-Purpose Model-Free Reinforcement Learning; Recurrent Independent Mechanisms; AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation; Human-level control through deep reinforcement learning; Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning; Root Mean Square Layer Normalization; Super-efficiency of automatic differentiation for functions defined as a minimum; Introduction to Latent Variable Energy-Based Models: A Path Towards Autonomous Machine Intelligence

20 · Layers of thought: smarter reasoning with the Hierarchical Reasoning Model
Builds a Hierarchical Reasoning Model that looks at full traces, not just final answers, giving Stephanie a way to score the quality of entire chains of thought.
Related papers: Recurrent Independent Mechanisms; AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation; Nature Machine Intelligence, 2022; Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning; Root Mean Square Layer Normalization; Super-efficiency of automatic differentiation for functions defined as a minimum; Mitigating Catastrophic Forgetting in Long Short-Term Memory Networks

19 · Stephanie’s Secret: The Dawn of Reflective AI
Introduces the GILD-style learning loop for Stephanie: greedy inference plus distillation, so every solved task becomes data that sharpens the system’s policies and scorers.
Related papers: Scalable In-Context Q-Learning; Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data; Exact phylodynamic likelihood via structured Markov genealogy processes; ClusT3: Information Invariant Test-Time Training

18 · The Shape of Thought: Exploring Embedding Strategies with Ollama, HF, and H-Net
Compares multiple embedding backends on real research data, treating embeddings as different “shapes of thought” and measuring how they affect extraction, scoring, and downstream reasoning.
Related papers: H-Net; AlphaEdit: Null-Space Constrained Model Editing; Self-Refine: Iterative Refinement with Self-Feedback; ReAct: Synergizing Reasoning and Acting in Language Models; Reflexion: An Automatic Framework for Iterative Strategy Refinement; Deep Reinforcement Learning from Human Preferences; Direct Preference Optimization; Tutorial on Energy-Based Learning; GPT-4 Technical Report; BERT: Pre-training of Deep Bidirectional Transformers; Dynamic Context Partitioning for Long Document Understanding

17 · Getting Smarter at Getting Smarter: A Practical Guide to Self-Tuning AI
Turns self-tuning into a concrete engineering pattern, wiring MRQ, EBT, SVM, and LLM judgments into a modular scoring stack that can be trained, compared, and swapped in and out.
Related papers: Energy-Based Transformers are Scalable Learners and Thinkers; Direct Preference Optimization: Your Language Model is Secretly a Reward Model; Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

16 · Epistemic Engines: Building Reflective Minds with Belief Cartridges and In-Context Learning
Introduces epistemic engines and belief cartridges: compact, scored summaries that let Stephanie carry forward what it knows and reason reflectively over its own beliefs.
Related papers: Language Models are Few-Shot Learners; Fine-tuned LLMs Know More, Hallucinate Less with Few-Shot Sequence-to-Sequence Semantic Parsing over Wikidata; SimCSE: Simple Contrastive Learning of Sentence Embeddings; Autobiographical event memory and aging: older adults get the gist; Snorkel: Rapid Training Data Creation with Weak Supervision; Concrete Problems in AI Safety

15 · Self-Improving AI: A System That Learns, Validates, and Retrains Itself
Assembles the first full self-improving loop: Stephanie scores its own outputs, trains internal scorers like MRQ, and uses those scorers to decide what to read, trust, and retrain on.
Related papers: RIVAL: Reinforcement Learning with Iterative and Adversarial Optimization for Machine Translation

14 · Teaching Tiny Models to Think Big: Distilling Intelligence Across Devices
Shows how to distill reasoning skills from big models into tiny local ones, letting phones or edge devices share in the same self-improving behavior as the main system.
Related papers: LoRD: Aligning Language Models via Embedding Space Distillation

13 · Compiling Thought: Building a Prompt Compiler for Self-Improving AI
Treats prompts as programs and introduces a prompt compiler that turns high-level intentions into structured pipelines Stephanie can run, inspect, and improve over time.
Related papers: –

12 · Thoughts of Algorithms
Steps back to ask what counts as a “thought” for an AI, sketching how algorithms, traces, and symbolic structures become the raw material of Stephanie’s inner mental life.
Related papers: Self-Adapting Language Models; ReAct: Synergizing Reasoning and Acting in Language Models; Deep reinforcement learning from human preferences; Direct Preference Optimization: Your Language Model is Secretly a Reward Model; AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models; System-2 Fine-tuning for Robust Integration of New Knowledge

11 · Document Intelligence: Turning Documents into Structured Knowledge
Builds the document intelligence layer that slices papers into sections, extracts structured nuggets, and turns messy PDFs into material that Stephanie can actually reason over.
Related papers: Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training; Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

10 · Learning to Learn: A LATS-Based Framework for Self-Aware AI Pipelines
Applies LATS to Stephanie’s pipelines so the system can generate multiple candidate traces, score them, and gradually prefer reasoning strategies that lead to better outcomes (a sketch follows below).
Related papers: RM-R1: Reward Modeling as Reasoning; LATS: Language Agent Tree Search Unifies Reasoning, Acting and Planning in Language Models; Mastering the game of Go with deep neural networks and tree search; GPT-4 Technical Report
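
A deliberately simplified sketch of that loop: best-of-N candidate traces stand in for full LATS tree search, the scorer is a stub, and the per-strategy preference update is a plain running average. The strategy names and scorer are assumptions for illustration only.

```python
# Sketch: propose candidate traces under different strategies, score them,
# keep the best, and track which strategies tend to win.
import random
from collections import defaultdict

STRATEGIES = ["direct_answer", "decompose_then_solve", "analogy_to_past_case"]

def run_strategy(strategy: str, task: str) -> str:
    return f"[{strategy}] trace for: {task}"     # stand-in for actual trace generation

def score_trace(trace: str) -> float:
    # Stub scorer; the post uses learned scorers over full traces instead.
    strategy = trace.split("]")[0].strip("[")
    bias = 0.2 if strategy == "decompose_then_solve" else 0.0
    return min(1.0, random.random() * 0.8 + bias)

preference = defaultdict(lambda: (0.0, 0))       # strategy -> (mean best score, wins)

for task in ["task-1", "task-2", "task-3"]:
    scored = [(score_trace(run_strategy(s, task)), s) for s in STRATEGIES]
    best_score, best_strategy = max(scored)
    mean, n = preference[best_strategy]
    preference[best_strategy] = ((mean * n + best_score) / (n + 1), n + 1)
    print(task, "->", best_strategy, round(best_score, 2))

print({s: round(m, 2) for s, (m, n) in preference.items()})   # emerging strategy preferences
```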

9 · Dimensions of Thought: A Smarter Way to Evaluate AI
Defines “dimensions of thought”, goal-conditioned axes like usefulness, risk, and novelty, and builds the scoring infrastructure that lets Stephanie rate itself along many dimensions at once (a sketch follows below).
Related papers: GenSim: Generating Robotic Simulation Tasks via Large Language Models; Direct Preference Optimization: Your Language Model is Secretly a Reward Model
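
A minimal sketch of goal-conditioned, multi-dimensional scoring: each output is rated along several axes, and the weights over those axes depend on the current goal. The dimension names and weight tables below are illustrative placeholders.

```python
# Multi-dimensional scoring sketch: per-dimension scores are collapsed into a
# single value using weights chosen by the current goal.
DIMENSIONS = ["usefulness", "risk", "novelty"]

GOAL_WEIGHTS = {
    "exploratory research": {"usefulness": 0.3, "risk": -0.2, "novelty": 0.5},
    "production fix":       {"usefulness": 0.6, "risk": -0.4, "novelty": 0.0},
}

def aggregate(scores: dict[str, float], goal: str) -> float:
    """Collapse per-dimension scores into one goal-conditioned value."""
    weights = GOAL_WEIGHTS[goal]
    return sum(weights[d] * scores[d] for d in DIMENSIONS)

if __name__ == "__main__":
    candidate = {"usefulness": 0.7, "risk": 0.6, "novelty": 0.9}
    for goal in GOAL_WEIGHTS:
        print(goal, "->", round(aggregate(candidate, goal), 2))
```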

8 · Programming Intelligence: Using Symbolic Rules to Steer and Evolve AI
Re-introduces symbolic rules as first-class citizens, wiring a rules engine into Stephanie so we can steer, constrain, and gradually evolve behavior alongside learned models.
Related papers: Representation Engineering: A Top-Down Approach to AI Transparency; The Random Forest Model for Analyzing and Forecasting the US Stock Market in the Context of Smart Finance; Direct Preference Optimization: Your Language Model is Secretly a Reward Model

7 · Adaptive Reasoning with ARM: Teaching AI the Right Way to Think
Uses ARM-style adaptive reasoning to break problems into steps, evaluate each step, and teach Stephanie to adjust its thinking strategy mid-trajectory instead of only at the end.
Related papers: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models; Training language models to follow instructions with human feedback; Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

6 · A Novel Approach to Autonomous Research: Implementing NOVELSEEK with Modular AI Agents
Implements a NOVELSEEK-style autonomous research loop where modular agents search papers, generate ideas, and evolve research directions without manual babysitting.
Related papers: InternAgent: When Agent Becomes the Scientist – Building Closed-Loop System from Hypothesis to Verification; WizardLM: Empowering large pre-trained language models to follow complex instructions

5 · The Self-Aware Pipeline: Empowering AI to Choose Its Own Path to the Goal
Introduces a self-aware pipeline that treats each run as a plan, letting Stephanie choose which stages to execute, skip, or repeat based on the goal and feedback.
Related papers: HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages; Devil’s Advocate: Anticipatory Reflection for LLM Agents; Symbolic Learning Enables Self-Evolving Agents

4 · General Reasoner: The Smarter Local Agent
Turns scattered tools into a single General Reasoner that can pick goals, choose appropriate skills, and coordinate local LLMs into a reusable, task-agnostic problem solver.
Related papers: General-Reasoner: Advancing LLM Reasoning Across All Domains

3 · Building a Self-Improving Chain-of-Thought Agent: Local LLMs Meet the CoT Encyclopedia
Builds a chain-of-thought “encyclopedia” agent that collects, scores, and reuses reasoning traces so local models learn not just answers, but better ways of thinking.
Related papers: The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think

2 · Self-Improving Agents: Applying the Sharpening Framework to Local LLMs
Adapts the Sharpening Framework to small, local LLMs so agents can repeatedly critique, refine, and re-run their own outputs instead of relying on a larger remote model.
Related papers: Self-Improvement in Language Models: The Sharpening Mechanism; Towards General-Purpose Model-Free Reinforcement Learning

1 · Building an AI Co-Scientist
Defines Stephanie as an AI co-scientist and builds the first end-to-end loop that can read papers, propose experiments, and help a human push a research idea forward.
Related papers: Towards an AI Co-Scientist