The “Negative Contrast Trap”: Why AI Writing Overuses “Not X, But Y”

Read enough AI prose and a rhythm starts to appear. Not fear. Not relief. Not strategy. Once you see it, you cannot unsee it.

🧠 Abstract

Large language models frequently produce rhetorical constructions such as “not fear, but relief” or “not intelligence, but memory.” While these patterns exist in human writing, AI systems tend to overproduce them, creating repetitive and unnatural prose. This article identifies the phenomenon as the Negative Contrast Trap, explains why it emerges from statistical language modeling, and proposes practical methods to detect and mitigate it in AI-assisted writing systems.
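One of the practical detection methods hinted at here can be sketched with a simple pattern matcher. This is an illustrative toy, not the article's actual tooling; the regex and the density threshold are assumptions:

```python
import re

# Flags "not X, but Y" negative-contrast constructions so an editing
# pass can review or thin them out. The pattern and the per-100-words
# density metric are invented for illustration.
NEG_CONTRAST = re.compile(
    r"\bnot\s+(\w+(?:\s+\w+){0,3}),?\s+but\s+(\w+(?:\s+\w+){0,3})",
    re.IGNORECASE,
)

def contrast_density(text: str) -> float:
    """Negative-contrast constructions per 100 words."""
    words = len(text.split())
    if words == 0:
        return 0.0
    hits = len(NEG_CONTRAST.findall(text))
    return 100.0 * hits / words

sample = "The feeling was not fear, but relief. It was not strategy but memory."
print(contrast_density(sample))
```

A mitigation pass could then rewrite any paragraph whose density exceeds a calibrated baseline drawn from human prose.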

Applied Policy: How to Incorporate Policy and Hallucination Handling in a Self-Improving System

Building a Self-Improving AI: Cooperative ERL and Embed-RL in a Trace-Native Architecture

1. The Problem

Most self-improving AI systems fail for one of three reasons:

First, scalar reward collapse. Traditional reinforcement learning compresses multi-dimensional quality into a single scalar. This creates catastrophic interference: improving one axis (e.g., coherence) can degrade another (e.g., hallucination safety). The system optimizes for the blended metric, not the underlying objectives.
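The collapse can be shown with a toy calculation (the weights and scores below are invented for illustration): a weighted scalar can rise while one quality axis silently degrades.

```python
# Toy illustration of scalar reward collapse: a blended metric improves
# even though one underlying objective (hallucination safety) got worse.
weights = {"coherence": 0.6, "hallucination_safety": 0.4}

before = {"coherence": 0.50, "hallucination_safety": 0.90}
after  = {"coherence": 0.90, "hallucination_safety": 0.60}

def blended(scores: dict) -> float:
    """Compress multi-dimensional quality into a single scalar."""
    return sum(weights[k] * scores[k] for k in weights)

print(blended(before))  # 0.66
print(blended(after))   # 0.78 -- the scalar went up...
# ...yet hallucination safety dropped from 0.90 to 0.60.
```

An optimizer that only sees the blended number will happily accept this trade.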

Second, representation drift. Embedding-based optimization without behavioral feedback creates geometric collapse. The embedding space becomes increasingly narrow, losing discriminative power. Similar queries map to identical regions. Diversity vanishes. The system becomes brittle.
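Geometric collapse is measurable. One simple symptom (a sketch, not the system's actual monitoring code) is mean pairwise cosine similarity creeping toward 1.0 across a batch of embeddings:

```python
import math

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mean_pairwise_similarity(embs: list) -> float:
    """Rising values over training are one symptom of geometric collapse:
    distinct inputs mapping to nearly identical regions."""
    pairs = [(i, j) for i in range(len(embs)) for j in range(i + 1, len(embs))]
    return sum(cosine(embs[i], embs[j]) for i, j in pairs) / len(pairs)

# Invented example vectors: a healthy, spread-out batch vs. a collapsed one.
healthy   = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
collapsed = [[1.0, 0.01], [1.0, 0.02], [1.0, 0.03]]
print(mean_pairwise_similarity(healthy))
print(mean_pairwise_similarity(collapsed))  # close to 1.0
```

Tracking this statistic over training gives an early warning before discriminative power is gone.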

Hallucination Energy: A Geometric Foundation for Policy-Bounded AI

🚀 Summary

This post presents the current research draft and implementation of a geometric framework for bounding stochastic language models through deterministic policy enforcement.

The central contribution is a scalar metric termed Hallucination Energy, defined as the projection residual between a claim embedding and the subspace spanned by its supporting evidence embeddings. This metric operationalizes grounding as a measurable geometric quantity.
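As a minimal sketch of that definition (not the project's implementation), the projection residual can be computed by orthogonalizing the evidence embeddings and subtracting the claim's projection onto their span; the example vectors are invented:

```python
import math

def _dot(a: list, b: list) -> float:
    return sum(x * y for x, y in zip(a, b))

def hallucination_energy(claim: list, evidence: list) -> float:
    """Norm of the residual of `claim` after projecting onto the
    subspace spanned by the evidence embeddings (Gram-Schmidt)."""
    basis = []
    for v in evidence:
        w = list(v)
        for b in basis:  # remove components along existing basis vectors
            c = _dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(_dot(w, w))
        if norm > 1e-12:  # skip linearly dependent evidence
            basis.append([wi / norm for wi in w])
    r = list(claim)
    for b in basis:  # subtract the claim's projection
        c = _dot(r, b)
        r = [ri - c * bi for ri, bi in zip(r, b)]
    return math.sqrt(_dot(r, r))

evidence   = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
grounded   = [0.6, 0.8, 0.0]  # lies in the evidence span
ungrounded = [0.0, 0.0, 1.0]  # orthogonal to all evidence
print(hallucination_energy(grounded, evidence))    # ~0.0
print(hallucination_energy(ungrounded, evidence))  # 1.0
```

A claim fully explained by its evidence has near-zero energy; a claim orthogonal to all evidence keeps its full norm as residual.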

We proceed in three stages:

  1. Formal Definition: a draft manuscript introducing Hallucination Energy, its mathematical formulation, and its role within a policy-controlled architecture.
  2. Empirical Evaluation: structured calibration and adversarial stress testing across multiple domains to assess the robustness and limits of projection-based grounding.
  3. Applied Validation: large-scale evaluation on 10,000 samples from the HaluEval summarization benchmark, demonstrating that projection-based containment functions as a strong first-order grounding signal in a real generative setting.

This work does not claim to solve hallucination. Rather, it characterizes the boundary of projection-based grounding, establishes its suitability as a deterministic policy scalar, and documents both its strengths and its structural limitations.

From Evidence to Verifiability: Rebuilding Trust in AI Outputs 🔏

⏰ TLDR

This work shows that the hardest part of using AI in high-trust environments is not the model, but the policy. Once editorial policy is made explicit and executable, AI systems become interchangeable; the real challenge is engineering reliable measurements and deterministic enforcement of those policies.

📋 Summary

AI systems are becoming deeply embedded in how we research, write, and reason. At the same time, their use in high-trust environments is under strain, not because models are incapable, but because they are being deployed into settings that demand determinism, provenance, and enforceable rules.
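What "explicit and executable policy" might look like can be sketched as declarative rules checked deterministically over a draft. Everything here (the rule names, the `Draft` shape) is a hypothetical illustration, not the system's API:

```python
from dataclasses import dataclass, field

@dataclass
class Draft:
    text: str
    citations: list = field(default_factory=list)

# Editorial policy as data: each rule is a human-readable name plus a
# deterministic predicate. Rules are invented for illustration.
POLICY = [
    ("every draft must cite at least one source",
     lambda d: len(d.citations) >= 1),
    ("drafts must stay under 200 words",
     lambda d: len(d.text.split()) <= 200),
]

def enforce(draft: Draft) -> list:
    """Return the names of violated rules; an empty list means the
    draft passes. Same input always yields the same verdict."""
    return [name for name, check in POLICY if not check(draft)]

draft = Draft(text="Quantum computing will change everything.")
print(enforce(draft))  # the citation rule fires
```

Because the rules are plain predicates over the artifact, the verdict is reproducible and auditable regardless of which model produced the draft.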

Review: What We’ve Learned So Far

😶‍🌫️ Summary

This post is a quick review of the journey so far.

We’re one-third of the way through the Self-Learning Systems (100-part) series, and this checkpoint pulls together the first 33 posts and the research papers that shaped them. The table below lists each post, its place in the series, and the key references it builds on, so you can see how the system and the ideas behind it have evolved since May.

✨ TINY CRITICS: Lightweight Reasoning Checks for Large AI Systems

🥹 0. TL;DR

Large language models write fluent explanations even when they’re wrong. Verifying their reasoning usually requires another LLM: slow, expensive, and circular.

We needed something different:

A miniature reasoning critic, under 50 KB, trained on synthetic reasoning mistakes and able to instantly detect broken reasoning in much larger models.

The Tiny Critic:

  • trains on GSM8K-style reasoning traces generated by DeepSeek or Mistral
  • uses FrontierLens and Visual Policy Maps (VPMs) to convert reasoning into canonical numerical features
  • is just a logistic regression with ~30 parameters
  • runs in microseconds
  • plugs into any agent
  • dramatically improves InitAgent, R1-Loops, and research-planning stability
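A critic this small is essentially a handful of weights and a sigmoid. The sketch below is a hypothetical stand-in (the feature names and weight values are invented, and the real critic has ~30 parameters rather than three), but it shows why inference takes microseconds:

```python
import math

# Hypothetical Tiny Critic: logistic regression over a small vector of
# canonical numeric features extracted from a reasoning trace.
# Weights, bias, and feature semantics are invented for illustration.
WEIGHTS = [1.8, -2.4, 0.9]  # e.g. step_consistency, arithmetic_errors, depth
BIAS = -0.3

def critic_score(features: list) -> float:
    """Probability that the reasoning trace is sound."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))

def is_broken(features: list, threshold: float = 0.5) -> bool:
    """Gate an agent step: True means the trace should be rejected."""
    return critic_score(features) < threshold

print(is_broken([0.9, 0.0, 0.5]))  # consistent trace -> False
print(is_broken([0.2, 1.0, 0.1]))  # error-heavy trace -> True
```

The entire model fits in a few cache lines, which is what makes it cheap enough to run on every agent step.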

This post tells the full story: how we built it, why it works, and what we learned about the shape of reasoning.

Search–Solve–Prove: building a place for thoughts to develop

🌌 Summary

What if you could see an AI think: not just the final answer, but the whole stream of reasoning, every search, every dead end, every moment of insight? We’re building exactly that: a visible, measurable thought process we call the Jitter. This post, the first in a series, shows how we’re creating the habitat where that digital thought stream can live and grow.

We’ll draw on ideas from:

The Space Between Models Has Holes: Mapping the AI Gap

🌌 Summary

What if the most valuable insights in AI evaluation aren’t in model agreements, but in systematic disagreements?

This post reveals that the “gap” between large and small reasoning models contains structured, measurable intelligence about how different architectures reason. We demonstrate how to transform model disagreements from a problem into a solution, using the space between models to make tiny networks behave more like their heavyweight counterparts.

We start by assembling a high-quality corpus (10k–50k conversation turns), score it with a local LLM to create targets, and train both HRM and Tiny models under identical conditions. Then we run fresh documents through both models, collecting not just final scores but rich auxiliary signals (uncertainty, consistency, OOD detection, etc.) and visualize what these signals reveal.
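The per-document "gap" record described above might be structured like this. The field names and example values are invented; this is a sketch of the idea, not the project's schema:

```python
# Hypothetical sketch: run a document through both models and keep the
# structured disagreement, not just the two scores.
def gap_signals(big_score: float, tiny_score: float,
                big_uncertainty: float, tiny_uncertainty: float) -> dict:
    """A minimal structured view of the gap between a large and a
    tiny model on one document."""
    return {
        "disagreement": abs(big_score - tiny_score),
        "direction": "tiny_over" if tiny_score > big_score else "big_over",
        "joint_uncertainty": max(big_uncertainty, tiny_uncertainty),
    }

print(gap_signals(0.82, 0.64, 0.05, 0.20))
```

Aggregated over a corpus, records like these are what turn model disagreement from noise into a trainable signal for the tiny model.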

A Complete Visual Reasoning Stack: From Conversations to Epistemic Fields

📝 Summary

We asked a blunt question: Can we see reasoning?
The answer surprised us: Yes, and you can click on it.

This post shows the complete stack that turns AI reasoning from a black box into an editable canvas. Watch as:

  • Your single insight becomes 10,000 reasoning variations
  • Abstract “understanding” becomes visible epistemic fields
  • Manual prompt engineering becomes automated evolution
  • Blind trust becomes visual verification

This isn’t just code; it’s a visual way of interacting with AI, where reasoning becomes something you can see, explore, and refine.