Hallucination Energy: A Geometric Foundation for Policy-Bounded AI

🚀 Summary

This post presents the current research draft and implementation of a geometric framework for bounding stochastic language models through deterministic policy enforcement.

The central contribution is a scalar metric termed Hallucination Energy, defined as the projection residual between a claim embedding and the subspace spanned by its supporting evidence embeddings. This metric operationalizes grounding as a measurable geometric quantity.
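As a sketch of this definition, the projection residual can be computed directly with a thin QR factorization; the function name and interface below are illustrative, not the paper's actual implementation.

```python
import numpy as np

def hallucination_energy(claim: np.ndarray, evidence: np.ndarray) -> float:
    """Norm of the claim embedding's residual after orthogonal projection
    onto the subspace spanned by the evidence embeddings.

    claim:    shape (d,), embedding of the claim.
    evidence: shape (k, d), embeddings of the k supporting evidence spans.
    """
    # Orthonormal basis for the evidence subspace via thin QR.
    Q, _ = np.linalg.qr(evidence.T)      # (d, k), columns span the evidence
    projection = Q @ (Q.T @ claim)       # component inside the subspace
    residual = claim - projection        # component outside it
    return float(np.linalg.norm(residual))
```

A claim lying inside the evidence span scores near zero; a claim orthogonal to all evidence scores its full norm, which is what makes the quantity usable as a deterministic policy scalar.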

We proceed in three stages:

  1. Formal Definition: a draft manuscript introducing Hallucination Energy, its mathematical formulation, and its role within a policy-controlled architecture.
  2. Empirical Evaluation: structured calibration and adversarial stress testing across multiple domains to assess the robustness and limits of projection-based grounding.
  3. Applied Validation: large-scale evaluation on 10,000 samples from the HaluEval summarization benchmark, demonstrating that projection-based containment functions as a strong first-order grounding signal in a real generative setting.

This work does not claim to solve hallucination. Rather, it characterizes the boundary of projection-based grounding, establishes its suitability as a deterministic policy scalar, and documents both its strengths and its structural limitations.

✨ TINY CRITICS: Lightweight Reasoning Checks for Large AI Systems

🥹 0. TL;DR

Large language models write fluent explanations even when they’re wrong. Verifying their reasoning usually requires another LLM: slow, expensive, and circular.

We needed something different:

A miniature reasoning critic (<50 KB), trained on synthetic reasoning mistakes, able to instantly detect broken reasoning in much larger models.

The Tiny Critic:

  • trains on GSM8K-style reasoning traces generated by DeepSeek or Mistral
  • uses FrontierLens and Visual Policy Maps (VPMs) to convert reasoning into canonical numerical features
  • is just a logistic regression with ~30 parameters
  • runs in microseconds
  • plugs into any agent
  • dramatically improves InitAgent, R1-Loops, and research-planning stability
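
The "logistic regression with ~30 parameters" above can be sketched in a few lines; the feature names below are hypothetical stand-ins, since the actual FrontierLens/VPM feature set is not specified in this summary.

```python
import math

# Hypothetical canonical features extracted from a reasoning trace
# (placeholders, not the real FrontierLens / VPM feature schema).
FEATURES = ["step_count", "avg_step_len", "num_equations", "answer_restated"]

def critic_score(features: dict, weights: dict, bias: float) -> float:
    """Logistic-regression critic: probability that a reasoning trace is sound.

    features: feature name -> value for one trace (missing features count as 0).
    weights:  feature name -> learned weight (one per feature, plus the bias).
    """
    z = bias + sum(weights[name] * features.get(name, 0.0) for name in FEATURES)
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid maps the score into (0, 1)
```

Because the model is just a dot product and a sigmoid, a single call is a handful of multiplies, which is why it can run in microseconds inside an agent loop.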

This post tells the full story: how we built it, why it works, and what we learned about the shape of reasoning.