Beyond Hallucination Energy: A Three-Dimensional Framework for Reliable AI Outputs
🧩 1. TL;DR
AI doesn’t just hallucinate. Sometimes it gives answers that are fluent, safe… and completely useless.
Most discussions about AI failure focus on hallucination:
- making things up
- getting facts wrong
- fabricating sources
That’s real. It matters.
But it’s not the most dangerous failure mode in production systems.
There is a quieter one.
A more subtle one.
And in practice a more pervasive one.
AI systems often fail not by being wrong, but by failing to think at all.
This post introduces that failure mode:
Trendslop: when AI produces generic, trend-following, low-sensitivity answers that ignore the specifics of the problem.
We show:
- why this happens (training incentives + decoding stochasticity)
- how it differs from hallucination (error vs. absence)
- how to measure it (perturbation-based sensitivity testing)
- and how it fits alongside Hallucination Energy (containment) and consistency checks (structure)
The core idea is simple:
If an answer doesn’t change when the problem changes, the system isn’t reasoning; it’s pattern-matching.
And pattern-matching, no matter how fluent, cannot be trusted with decisions.
🧠 2. The Missing Failure Mode
The conversation around AI reliability has matured quickly.
We now talk about:
- hallucinations
- bias
- alignment
- safety
- interpretability
These are all important.
But they share a common assumption:
That the model is trying to reason and sometimes fails.
What if that assumption is wrong?
⚠️ A Different Kind of Failure
There is another class of output that looks perfectly fine:
- grammatically correct
- confident
- well-structured
- aligned with current thinking
And yet:
- it does not engage with the specifics of the prompt
- it does not adapt to changing conditions
- it does not produce meaningful differentiation
It sounds intelligent.
But it is not doing any real work.
🔍 The Key Distinction
| Failure Type | What’s Happening | Detectability |
|---|---|---|
| Hallucination | Model generates incorrect content | ✅ Checkable against evidence |
| Trendslop | Model generates generic content | ❌ No obvious error to flag |
This is the core problem:
Hallucination is incorrect reasoning. Trendslop is absence of reasoning.
And absence is harder to detect than error.
🧠 Why This Matters for Policy-Bounded Systems
This observation is not just descriptive.
It exposes a boundary in how we currently evaluate model outputs.
To understand that boundary, we connect it to the geometric grounding framework introduced earlier.
In our previous work on Hallucination Energy, we established:
Generation may be stochastic; acceptance must be deterministic.
But deterministic gating requires a measurable signal.
Hallucination Energy measures containment: does the claim lie within the evidence span?
Trendslop reveals a second requirement:
Sensitivity: does the output respond to changes in the input?
Without both, a policy gate can accept outputs that are:
- ✅ grounded (low energy)
- ✅ consistent (no contradiction)
- ❌ useless (context-insensitive)
This post closes that gap.
This is not a theoretical concern.
It is a measurable, repeatable behavior that appears consistently under controlled testing.
To see this clearly, we turn to empirical results.
📉 3. What the Research Shows
Recent empirical work provides direct evidence of this failure mode.
In controlled studies where large language models were asked to produce strategic recommendations across varied scenarios, a consistent pattern emerged:
Outputs remained fluent and plausible, but were largely insensitive to the specific context.
This behavior has been termed trendslop.
🧪 The Key Observation
Across varied prompts:
- different industries
- different growth conditions
- different constraints
The responses:
- reused the same structural templates
- emphasized identical high-level themes
- produced nearly interchangeable recommendations
This was not hallucination.
The outputs contained no obvious factual errors. Instead, they converged to high-probability, trend-aligned patterns independent of input.
🔗 Why This Connects to Geometric Grounding
In our previous work on Hallucination Energy, we established:
Projection residual measures containment, not relational contradiction or contextual responsiveness.
Trendslop exploits this exact boundary. The output:
- ✅ stays within the semantic span of the training distribution (low energy)
- ✅ avoids factual contradiction (structurally consistent)
- ❌ fails to adapt to the local context (insensitive)
It passes the containment gate. It passes the consistency check. But it fails the reasoning test.
⚠️ Why This Matters More Than It Seems
At first glance, this behavior may appear benign. The outputs are fluent, they reflect accepted best practices, and they avoid obvious errors.
But this “reasonableness” is exactly what makes the failure mode so dangerous. Because the advice sounds like a safe best practice, it bypasses our critical filters.
It creates the illusion of intelligence without the substance of reasoning.
In real-world systems, this leads to:
- Poor decisions disguised as safe ones: Recommendations that are “best practice” in general but catastrophic for the specific constraints of the problem.
- Loss of signal in critical contexts: The model effectively ignores the most important variables in the prompt to find a high-probability pattern.
- False sense of security: Over-reliance on outputs that look correct but were never actually evaluated against the specific input.
This establishes a critical limitation in our current AI stack: Evaluation methods that operate on single outputs cannot detect whether reasoning has occurred.
They can only detect whether the output appears valid. This distinction drives the need for a new evaluation dimension.
🔍 4. Why This Is Worse Than Hallucination
It’s tempting to treat trendslop as a “milder” issue. After all:
- nothing is obviously false
- no facts are fabricated
- the output aligns with accepted best practices
But in production systems, trendslop is often more dangerous.
⚠️ Visibility vs Invisibility
Hallucinations tend to be detectable:
- a wrong number
- a fabricated citation
- a claim that contradicts known evidence
These trigger clear policy responses:
- ✅ High Hallucination Energy → Reject
- ✅ Contradiction detected → Flag
- ✅ Evidence mismatch → Request revision
Trendslop bypasses all of them.
🔍 The Detection Gap
A trendslop response:
- passes factual checks ✔️
- passes style checks ✔️
- passes geometric containment gates ✔️
- passes alignment filters ✔️
And yet:
It fails to engage with the actual problem.
There is no obvious “error” to point to. Only a lack of adaptation.
🧠 Decision-Level Impact
This difference becomes critical when AI informs real decisions.
| Failure Type | Immediate Effect | Detection Path | Long-Term Effect |
|---|---|---|---|
| Hallucination | Incorrect output | Energy gate / fact-check | Correctable failure |
| Trendslop | Generic output | No standard detector | Systematic misdirection |
🧪 Example
A hallucinated financial figure may trigger a high energy score and get rejected.
A trendslop strategy recommendation:
“Focus on innovation, improve operational efficiency, and align with customer needs.”
…will not be flagged. It will be accepted, deployed, and acted upon.
But it may:
- waste resources on generic initiatives
- mask real constraints specific to the business
- prevent meaningful, context-aware action
🔥 The Core Risk
Hallucination breaks correctness. Trendslop breaks usefulness.
And in high-trust domains:
An answer that is useless but trusted is worse than an answer that is clearly wrong.
🔍 5. Expanding the Failure Taxonomy
This observation is not just descriptive. It exposes a boundary in how we currently evaluate model outputs.
To understand that boundary, we return to the geometric grounding framework introduced earlier.
📐 The Limit of Geometric Containment
In our previous work on Hallucination Energy, we established a clean boundary:
Projection residual measures containment.
It detects when a claim extends beyond the semantic span of its evidence.
This gives us a powerful first guarantee:
- Unsupported information → detectable
- Out-of-span drift → rejectable
But adversarial stress-testing revealed a structural limit:
Hard-mined negatives often remain within the same embedding subspace as supported claims.
When subspace overlap is high:
Projection cannot distinguish relational inversion from grounded truth.
In other words:
A claim can be geometrically valid and still be wrong.
🧠 From Geometry to Failure Modes
This forces an expansion of the taxonomy.
We anchor this expansion in the broader research context established by Weng’s hallucination survey, which distinguishes two primary categories:
- Extrinsic Hallucination: introduction of unsupported information
- In-Context Hallucination: distortion or contradiction of provided evidence
These categories map directly onto the geometric behavior we observe.
But they are not sufficient.
There exists a third failure mode:
The model does not fail by inventing or contradicting.
It fails by not engaging at all.
🧭 The Three-Axis Failure Model
We therefore refine Weng’s taxonomy into three measurable, policy-actionable axes:
| Failure Mode | What Happens | Geometric Signature | Detection | Maps to Weng |
|---|---|---|---|---|
| Extrinsic Hallucination | Introduces unsupported information | High projection residual (out-of-span) | Hallucination Energy | Extrinsic |
| Intrinsic Hallucination | Distorts relationships within evidence | Low residual (in-span) | Structural checks (Appendix) | In-context |
| Trendslop | Ignores problem-specific constraints | Output invariance under perturbation | Sensitivity testing | — |
📌 Containment vs. Attribution
Containment asks whether a claim lies within the semantic span of the evidence.
Attribution asks whether it can be traced to a specific source. Hallucination Energy measures containment, not attribution.
This is intentional: containment serves as a fast, model-agnostic policy gate, while attribution remains a downstream verification step.
🔬 What Actually Changed
This is not just a refinement of terminology.
It is a change in how we interpret correctness.
Previously:
“If it’s grounded, it’s acceptable.”
Now:
Grounding is necessary, but not sufficient.
⚠️ The Missing Dimension
The gap becomes clear:
- Containment ensures the model does not invent
- Structural checks ensure it does not distort
But neither guarantees that the output:
responds to the actual problem
This is where trendslop emerges.
🔥 Trendslop as a Failure Mode
Trendslop is not hallucination in the traditional sense.
It is not:
- unsupported
- contradictory
It is:
context-insensitive convergence to high-probability answer patterns
Its defining property is:
invariance under meaningful perturbation
🧠 Why This Matters
This reveals a deeper truth:
Correctness and usefulness are not the same.
A model can:
- remain within evidence
- preserve internal structure
- and still fail to reason
Because it never adapted to the problem.
🔗 From Measurement to Policy
This expansion transforms the evaluation question.
From:
“Is this answer grounded?”
To:
“What type of failure does this output exhibit—and how should the system respond?”
This shift is critical.
It allows us to move from:
- scalar scoring
- to multi-axis diagnosis
🧭 Summary of the Expanded Taxonomy
| Category | Policy Question | Axis |
|---|---|---|
| Extrinsic Hallucination | “Did the model go beyond the evidence?” | Containment |
| In-Context Hallucination | “Did it distort relationships?” | Structural fidelity |
| Trendslop | “Did it respond to the problem?” | Sensitivity |
⚠️ A Known Frontier: The “Unknown” Problem
One boundary remains.
As highlighted by Weng, a trustworthy system must also:
know when it does not know
Benchmarks such as TruthfulQA and SelfAware explore this dimension.
Our current framework does not yet include an explicit epistemic calibration axis.
An output may pass all three checks and still be:
- confidently incorrect
- fundamentally unknowable
This represents a fourth dimension:
the ability to abstain
🔬 Forward Signal
This is not a flaw in the framework.
It is a boundary condition.
Containment constrains what can be said.
Structure constrains how it is said.
Sensitivity constrains why it is said.
The remaining question is:
Should it be said at all?
🧠 6. A Concrete Example
To see how these three axes interact in practice, consider a simple perturbation test.
🧪 The Setup
Prompt A (Decline)
A startup is losing money in a declining market with strong competition. What strategic actions should it take?
Prompt B (Growth)
A market-leading company has strong margins in a rapidly growing sector. What strategic actions should it take?
🤖 Typical LLM Output
Both prompts frequently return variations of:
- “Focus on innovation and product differentiation”
- “Improve operational efficiency”
- “Invest in customer experience and retention”
- “Leverage data-driven decision making”
📊 How Each Axis Evaluates This
| Metric | Result | Why |
|---|---|---|
| Hallucination Energy | ✅ Low | Claims are generic, well-grounded in business literature. No out-of-span fabrication. |
| Consistency | ✅ High | No internal contradictions. Logically sound advice. |
| Sensitivity (Trendslop) | 🚨 High | Output structure and recommendations are nearly identical despite opposite market conditions. |
⚠️ What This Reveals
The system passes the containment gate. It passes the consistency check. It is accepted.
But it fails the reasoning test.
A context-aware system would produce:
| Prompt | Expected Reasoning Pattern |
|---|---|
| Startup in decline | Survival mode: cost rationalization, runway extension, narrow focus, potential pivot |
| Market leader | Expansion mode: market capture, strategic M&A, R&D acceleration, barrier creation |
Instead, the model collapses both into:
The same high-probability answer template
Trendslop in action. The output is not wrong. It is invariant.
And invariance under meaningful perturbation is the signature of pattern-matching, not reasoning.
📐 7. Formal Definition of Trendslop
We can now move from intuition to something precise.
The examples we’ve seen all share one property:
The output does not change meaningfully when the input changes.
This gives us a clean way to define trendslop.
📌 Core Definition
Trendslop is context-insensitive reasoning, where outputs collapse to high-probability patterns under input variation.
🔬 Formal View
Let \(x\) represent the input context and \(f(x)\) represent the model’s output distribution (or its aggregated response).
Trendslop occurs when:
$$ \frac{\Delta f(x)}{\Delta x} \approx 0 $$

In continuous terms, this is \(\frac{\partial f(x)}{\partial x} \approx 0\). But since LLMs are discrete stochastic generators, we interpret this as perturbation insensitivity in output space.
🧠 Interpretation
This does not mean the output is literally identical.
It means:
- structural templates are reused
- reasoning patterns are invariant
- conclusions are functionally interchangeable
🔥 Key Insight
A reasoning system should be sensitive to its inputs.
If it isn’t, it stops adapting.
It stops evaluating.
And what remains isn’t reasoning; it’s pattern convergence.
The system collapses toward a default answer basin: the region of highest training-data probability that satisfies surface expectations while ignoring the specifics of the problem.
⚠️ Important Distinction
Trendslop is not random error. It is stable, repeatable convergence to high-probability patterns.
Under stochastic decoding, the model still gravitates toward the same answer basin — not because of noise, but because that region dominates the probability landscape.
That stability is what makes trendslop so difficult to detect with single-pass evaluation.
📏 8. Measuring It
Once defined, trendslop becomes measurable.
The key idea is simple:
If outputs remain similar across meaningfully different inputs, sensitivity is low.
🎯 Basic Metric
We define a trendslop score based on output divergence under perturbation:
$$ \mathcal{S}(x) = 1 - \frac{1}{|P|} \sum_{x' \in P} \text{sim}\big(f(x), f(x')\big) $$

Where:
- \(P\) = set of perturbed inputs
- \(\text{sim}(\cdot)\) = semantic similarity (embedding cosine or structural overlap)
- High \(\mathcal{S}\) → adaptive reasoning
- Low \(\mathcal{S}\) → trendslop
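As a toy numeric illustration (the similarity values below are made up, not measured), the score is simply one minus the mean output similarity across perturbations:

```python
import numpy as np

def trendslop_sensitivity(similarities):
    """S(x) = 1 - mean similarity between outputs under perturbation."""
    return 1.0 - float(np.mean(similarities))

# Hypothetical cosine similarities between outputs for perturbed prompts.
adaptive = trendslop_sensitivity([0.35, 0.42, 0.28])  # outputs diverge
slop = trendslop_sensitivity([0.96, 0.94, 0.97])      # outputs barely move

print(round(adaptive, 2), round(slop, 2))  # → 0.65 0.04
```

High \(\mathcal{S}\) signals adaptive reasoning; values near zero signal trendslop.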
🧪 Experimental Setup
- Take a base prompt \(x\)
- Generate \(n\) meaningfully perturbed versions \(\{x'_1, \dots, x'_n\}\)
- Run the model on each
- Compute pairwise output similarity
- Aggregate into a single sensitivity score
🧠 What We Are Measuring
Not correctness. Not truth. Not containment.
We are measuring:
Responsiveness of reasoning to context
This is orthogonal to Hallucination Energy.
- Energy measures distance from evidence.
- Sensitivity measures response to variation.
Together, they bound what the system says and how it adapts.
🔬 Implementation Sketch (Certum-aligned)
```python
import numpy as np

def compute_trendslop_score(base_prompt, perturbations, model, n_samples=3):
    """Measure context sensitivity via perturbation testing.
    Returns the sensitivity score S(x): high = adaptive, low = trendslop."""
    outputs = []
    for p in perturbations:
        # Sample to account for decoding stochasticity
        samples = [model.generate(p, temperature=0.7) for _ in range(n_samples)]
        outputs.append(aggregate_to_centroid(samples))  # e.g., mean embedding or structural template
    # Compute pairwise semantic similarity between perturbed outputs
    similarities = pairwise_cosine(outputs)
    mean_sim = np.mean(similarities)
    # High similarity across perturbations = high trendslop (bad)
    trendslop_score = mean_sim
    sensitivity_score = 1.0 - mean_sim  # matches S(x): positive = good
    return sensitivity_score
```
🔥 Why This Works
Because real reasoning has a defining property: small changes in input → meaningful changes in output. Trendslop violates that property. And now we have a scalar that captures it.
We now have:
- a definition
- a metric
- a measurement method
The next step is to specify how to perturb. Not all input changes are equal. To stress-test sensitivity, we need perturbations that force genuine reasoning shifts. That’s what Section 9 covers.
🔬 9. Perturbation as a Tool
To measure trendslop effectively, we need to vary the input in ways that force reasoning shifts.
Not all perturbations are equal. Random word swaps or surface rephrasing won’t stress-test sensitivity. We need semantic perturbations that change the problem’s structure while preserving its surface form.
🎯 Principle
If the problem changes, the solution should change. If it doesn’t, the system is pattern-matching, not reasoning.
🧪 Perturbation Taxonomy (Certum-Aligned)
We define four perturbation classes, each targeting a different reasoning dimension:
| Perturbation Type | What Changes | What Should Change in Output |
|---|---|---|
| Context Polarity | Success ↔ Failure, Growth ↔ Decline | Strategic priorities, risk tolerance, resource allocation |
| Scale Shift | Startup ↔ Enterprise, Local ↔ Global | Operational scope, governance complexity, investment horizon |
| Constraint Injection | Add budget limits, regulatory pressure, time urgency | Trade-off weighting, feasibility filtering, sequencing |
| Domain Transfer | Healthcare → Finance, Software → Manufacturing | Domain-specific constraints, stakeholder maps, success metrics |
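As a sketch, here is what a minimal polarity operator might look like. The antonym table and the `invert_outcome_context` name are illustrative assumptions; a real implementation would likely use an LLM or a curated lexicon rather than string substitution:

```python
import re

# Illustrative antonym/phrase map for polarity inversion (growth ↔ decline).
POLARITY_MAP = {
    "growing": "declining",
    "declining": "growing",
    "losing money": "highly profitable",
    "market-leading": "struggling",
}

def invert_outcome_context(prompt: str) -> str:
    """Flip outcome polarity while preserving the prompt's surface form.
    A single regex pass avoids re-flipping already-substituted phrases."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in POLARITY_MAP) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: POLARITY_MAP[m.group(0).lower()], prompt)

print(invert_outcome_context("A startup is losing money in a declining market."))
# → A startup is highly profitable in a growing market.
```

The other perturbation classes (scale, constraint, domain) follow the same pattern: transform the problem’s structure, not its wording.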
🧠 Implementation Sketch (Certum)
```python
def generate_perturbations(base_prompt, modes=None):
    """Generate semantically meaningful perturbations for sensitivity testing."""
    modes = modes or ['polarity', 'scale', 'constraint', 'domain']
    perturbations = []
    if 'polarity' in modes:
        perturbations.append(invert_outcome_context(base_prompt))   # growth → decline
    if 'scale' in modes:
        perturbations.append(rescale_entity_scope(base_prompt))     # startup → enterprise
    if 'constraint' in modes:
        perturbations.append(inject_hard_constraint(base_prompt))   # add budget cap
    if 'domain' in modes:
        perturbations.append(transfer_domain_context(base_prompt))  # healthcare → finance
    return perturbations

def evaluate_sensitivity(base_prompt, model, perturbation_modes=None):
    """End-to-end trendslop measurement."""
    perturbations = generate_perturbations(base_prompt, perturbation_modes)
    return compute_trendslop_score(base_prompt, perturbations, model)
```
⚠️ What Trendslop Does Under Perturbation
It ignores these changes.
- same structural template
- same high-level advice
- same tone and framing
This is the signature:
Invariant output under high-impact input variation
🔍 Measuring the Effect
For each perturbation class:
```python
outputs = [model.generate(p) for p in perturbations]
divergence = semantic_divergence_matrix(outputs)
sensitivity_score = np.mean(divergence)  # low divergence = high trendslop
```
🔥 Key Signal
Low variation across high-impact perturbations = strong trendslop
And critically:
This failure mode is invisible to containment-based gates.
A trendslop output can have:
- ✅ Low Hallucination Energy (grounded in general knowledge)
- ✅ High Consistency (no internal contradiction)
- ❌ Low Sensitivity (context-insensitive)
This is why we need the third axis.
We now have:
- a perturbation taxonomy
- a measurement pipeline
- a clear failure signature
The next step is to map these three signals—Containment, Consistency, Sensitivity—into a unified decision architecture.
That’s Section 10.
⚡ 10. Trendslop vs Hallucination Energy
At this point, we can distinguish three fundamentally different failure signals.
Not heuristics. Not overlapping scores.
Three orthogonal axes that measure different properties of reasoning.
🧠 Three Orthogonal Signals
| Metric | What It Measures | Geometric Interpretation | Policy Action |
|---|---|---|---|
| Hallucination Energy | Distance from evidence span | Projection residual: \( \|c - \mathbf{U}_r \mathbf{U}_r^T c\|_2 \) | Reject if unsupported |
| Consistency Score | Structural correctness (see Appendix) | Relational fidelity within evidence | Reject if structurally invalid |
| Sensitivity Score | Responsiveness to input variation | Output divergence under perturbation: \( 1 - \text{sim}(f(x), f(x')) \) | Refine if low \( (\mathcal{S} < \tau_t) \) |
🔍 Three Different Questions
Each axis answers a different question:
| Metric | Core Question |
|---|---|
| Hallucination Energy | “Did the model leave the evidence?” |
| Consistency | “Did the model distort relationships within the evidence?” |
| Sensitivity (Trendslop) | “Did the model respond to the actual problem?” |
These are not interchangeable checks.
Passing one does not imply passing the others.
📌 Containment vs. Attribution
A critical distinction in grounding:
- Containment asks: Is this claim semantically plausible given the evidence?
- Attribution asks: Can this claim be traced to a specific source?
Hallucination Energy measures containment, not attribution.
A low energy score means:
the claim lies within the semantic span of the evidence
It does not guarantee:
- explicit support
- citation traceability
- or exact entailment
More demanding systems—such as retrieval-based attribution pipelines—attempt to enforce this stricter requirement.
In this framework:
Containment is a first-order policy gate.
Attribution is a second-order verification layer.
This separation is intentional: it keeps the policy layer fast, deterministic, and model-agnostic.
⚠️ Critical Observation
An output can be:
- ✅ Grounded (low energy)
- ✅ Structurally valid (high consistency)
- ❌ Context-insensitive (low sensitivity)
And still fail.
🔥 Example: The “Safe but Useless” Case
Prompt:
"A biotech startup with 18 months of runway needs a survival strategy."
Output:
"Focus on innovation, improve operational efficiency,
and align with customer needs."
Every claim is contained and nothing contradicts, yet the output ignores the defining constraint: an 18-month runway. Low energy, high consistency, near-zero sensitivity. Every gate passes, and the advice is useless.
---
## 🧩 11. The Failure Space & Policy Routing
In [energy](/post/energy), we established a single deterministic gate:
$$
\mathcal{H}(c, E) \le \tau \quad \Rightarrow \quad \text{Accept}
$$
That gate works cleanly for **extrinsic hallucination**. But adversarial stress-testing revealed a structural boundary:
> When unsupported claims remain within the same embedding subspace as supported ones, projection residual alone cannot separate them.
We now have three calibrated signals. Each maps to a distinct failure mode and a distinct policy action.
---
### 📊 The Failure Space (Simplified 2D Projection)
For intuition, we project the 3-axis space onto Containment × Sensitivity:
| Region | Energy (Containment) | Trendslop (Sensitivity) | Behavior | Policy Action |
|--------|---------------------|------------------------|----------|---------------|
| ✅ **Valid Reasoning** | Low | Low | Grounded + adaptive | Accept |
| ⚠️ **Creative Drift** | High | Low | Unsupported but context-aware | Refine / Flag |
| ⚠️ **Safe but Useless** | Low | High | Grounded but generic | Refine (force specificity) |
| ❌ **Complete Failure** | High | High | Unsupported + generic | Reject |
*Note: Consistency sits orthogonal to this plane. An output can occupy any region and still be structurally broken.*
#### Figure 3: The Failure Space
The failure space projected onto two orthogonal axes—Containment (Hallucination Energy) and Sensitivity (Trendslop). Outputs land in one of four quadrants, each demanding a different policy response.
```mermaid
quadrantChart
    title 🗺️ Failure Space (Containment vs Sensitivity)
    x-axis "Low Containment (High Energy)" --> "High Containment (Low Energy)"
    y-axis "Low Sensitivity (Trendslop)" --> "High Sensitivity (Adaptive)"
    quadrant-1 "✅ Valid Reasoning"
    quadrant-2 "⚠️ Creative Drift"
    quadrant-3 "❌ Complete Failure"
    quadrant-4 "⚠️ Safe but Useless"
```
This diagram simplifies the full three‑axis model into a two‑dimensional view that captures the most actionable failure modes. The horizontal axis represents Containment (low energy = high containment; high energy = low containment). The vertical axis represents Sensitivity (high sensitivity = adaptive reasoning; low sensitivity = trendslop).
The four quadrants are:
- ✅ Valid Reasoning (top‑right): The output is both grounded in evidence and responsive to context. This is the target region—accept without modification.
- ⚠️ Creative Drift (top‑left): The output adapts to context but introduces unsupported information. This is extrinsic hallucination—potentially useful but requires verification or refinement.
- ⚠️ Safe but Useless (bottom‑right): The output is well‑grounded but fails to adapt to the specific problem. This is trendslop—needs refinement with explicit constraints.
- ❌ Complete Failure (bottom‑left): The output is both unsupported and context‑insensitive. Reject outright.
The third axis—Consistency—sits orthogonal to this plane. An output can fall into any quadrant and still contain internal contradictions. This diagram serves as a policy routing map: each quadrant points to a distinct action (Accept, Refine, or Reject).
⚙️ Policy Routing Logic
We replace the single scalar threshold with a multi-axis decision router:
```python
def policy_route(evaluation, τ_h=0.5, τ_c=0.7, τ_t=0.3):
    """Route output based on 3-axis grounding signal.
    Thresholds are calibrated offline; defaults here are illustrative."""
    energy, consistency, sensitivity = (
        evaluation.energy, evaluation.consistency, evaluation.sensitivity
    )
    # Tier 1: Containment gate (from energy.md)
    if energy > τ_h:
        return "REJECT", "Unsupported claim exceeds containment boundary"
    # Tier 2: Consistency gate (new)
    if consistency < τ_c:
        return "REJECT", "Relational or logical contradiction detected"
    # Tier 3: Sensitivity gate (trendslop): low sensitivity = generic output
    if sensitivity < τ_t:
        return "REFINE", "Output is context-insensitive; re-prompt with constraints"
    # Tier 4: Valid
    return "ACCEPT", "Grounded, consistent, and adaptive"
```
🔗 Continuity with energy Calibration
In the original framework, we calibrated \(\tau_h\) under a fixed False Acceptance Rate (FAR):
“Energy became not just a score, but a deterministic gate.”
We apply the same principle here:
- \(\tau_h\) calibrated on FAR for containment violations
- \(\tau_c\) calibrated on contradiction/instability benchmarks
- \(\tau_t\) calibrated on perturbation divergence baselines
Each threshold is executable. Each routes to a deterministic action. The architecture remains faithful to the core thesis:
Generation is stochastic. Acceptance is deterministic.
We have simply expanded the acceptance layer from a single gate to a routing matrix.
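For concreteness, here is a minimal sketch of FAR-based calibration. The `calibrate_threshold` helper and the score values are illustrative assumptions, not part of the original calibration code:

```python
import numpy as np

def calibrate_threshold(violation_scores, target_far=0.05):
    """Choose τ so that roughly `target_far` of known violations would be
    accepted by the gate (accept means score <= τ).
    The target_far-quantile of the violation-score distribution bounds,
    approximately for large samples, the fraction of violations that
    fall inside the accept region."""
    return float(np.quantile(violation_scores, target_far))

# Hypothetical energy scores of hard-mined negatives (known violations).
neg_scores = [0.62, 0.71, 0.55, 0.80, 0.67, 0.74, 0.59, 0.91]
τ_h = calibrate_threshold(neg_scores, target_far=0.05)
```

The same recipe applies to \(\tau_c\) and \(\tau_t\), with the score direction flipped for metrics where low values indicate failure.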
This routing logic solves the immediate policy problem. But it raises a deeper question:
Why don’t existing evaluation systems catch trendslop before it reaches production?
That’s not a metric problem. It’s an evaluation paradigm problem.
🚨 12. Why Current Systems Miss This
Current AI evaluation operates on a hidden assumption:
If an output is correct in isolation, it is good.
This works for factual verification. It fails for reasoning quality.
🧠 The Static Evaluation Blind Spot
Standard evaluation pipelines measure:
- Factual accuracy against a gold standard
- Semantic similarity to reference outputs
- Alignment with safety/policy filters
- Model confidence or log-probability
None of these measure:
Whether the output depends on the input.
A trendslop response:
- matches expected patterns ✔️
- aligns with training distribution ✔️
- avoids factual error ✔️
- passes geometric containment ✔️
So it:
survives every standard evaluation layer
🔍 The Core Limitation
Current systems treat outputs as static points in embedding space.
But reasoning is not a point. It is a response surface.
If you perturb the input and the output doesn’t move meaningfully, the system isn’t reasoning. It’s collapsing to a high-probability prior.
🔁 The Dynamic Evaluation Paradigm
We replace static scoring with perturbation-aware testing:
Generate → Perturb → Re-generate → Compare → Route
Instead of asking:
“Is this answer correct?”
We ask:
“Does this answer behave like reasoning?”
And we measure that by observing how the output surface responds to controlled input variation.
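The loop can be sketched end to end. Here `model`, `generate_perturbations`, and the scoring callable are placeholders for the components introduced in earlier sections, and the threshold is illustrative:

```python
def dynamic_evaluate(base_prompt, model, generate_perturbations, score_outputs,
                     refine_below=0.3):
    """Generate → Perturb → Re-generate → Compare → Route.
    Dependencies are injected so the loop stays model- and metric-agnostic."""
    base_output = model.generate(base_prompt)
    perturbed_prompts = generate_perturbations(base_prompt)
    outputs = [model.generate(p) for p in perturbed_prompts]
    # Sensitivity: how far the output surface moves under perturbation.
    sensitivity = score_outputs(base_output, outputs)
    verdict = "REFINE" if sensitivity < refine_below else "ACCEPT"
    return verdict, sensitivity
```

An invariant model (identical output for every perturbation) scores zero sensitivity and is routed to refinement rather than accepted.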
🔥 Why This Changes Everything
| Evaluation Mode | What It Captures | What It Misses |
|---|---|---|
| Static (Current) | Factual correctness, alignment | Context responsiveness, reasoning dynamics |
| Dynamic (Proposed) | Sensitivity, stability, grounding | None (requires perturbation budget) |
Trendslop is invisible to static evaluation by design. It is engineered to pass.
Dynamic evaluation makes it visible.
We now have:
- A three-axis failure taxonomy
- A calibrated policy router
- A dynamic evaluation paradigm
The final step is to assemble these into a complete system architecture.
Not as a research concept. As a deployable control loop.
That’s Sections 13–16.
🏗️ 13. From Metrics to System: The Control Loop
Up to this point, we’ve defined three independent signals:
| Signal | Measures | Policy Question |
|---|---|---|
| Containment (Hallucination Energy) | Distance from evidence span | “Is this supported?” |
| Consistency | Structural fidelity within evidence | “Is this logically correct?” |
| Sensitivity (Trendslop) | Responsiveness to input variation | “Does this engage with the problem?” |
Each captures a distinct failure mode. But on their own, they are just measurements.
The real question is:
How do we use them to control a system?
🧠 The Shift: From Detection to Enforcement
Most current approaches focus on improving generation:
- better prompting
- better training
- better models
This work takes a different approach, consistent with energy:
Instead of trying to make generation perfect, we make acceptance selective.
Generation may be stochastic. Acceptance must be deterministic.
🔁 The Core Control Loop
A system built on this framework does not trust a single output. It evaluates it, routes it, and acts.
Generate → Evaluate → Route → (Accept | Refine | Reject)
Figure 4: The core control loop
The policy‑bounded control loop. Stochastic generation is wrapped by a deterministic evaluation and routing layer, ensuring that only outputs satisfying all three grounding constraints are accepted.
```mermaid
flowchart TD
    subgraph "🔄 Policy‑Bounded Control Loop"
    A[🧠 LLM Generation<br/><i>Stochastic</i>] --> B[📤 Candidate Output]
    B --> C[⚖️ Evaluation Layer<br/><i>Deterministic</i>]
    C --> D[📐 Containment Check]
    C --> E[🔗 Consistency Check]
    C --> F[📈 Sensitivity Check]
    D & E & F --> G{🧾 Policy Router}
    G -->|✅ All Pass| H[🟢 ACCEPT]
    G -->|❌ Containment Fail| I[🔴 REJECT]
    G -->|❌ Consistency Fail| I
    G -->|⚠️ Sensitivity Fail| J[🟡 REFINE]
    J --> K[✏️ Re‑prompt with Constraints]
    K --> A
    end
    classDef gen fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#e65100
    classDef eval fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#0d47a1
    classDef accept fill:#e8f5e8,stroke:#388e3c,stroke-width:2px,color:#1b5e20
    classDef reject fill:#ffebee,stroke:#d32f2f,stroke-width:2px,color:#b71c1c
    classDef refine fill:#fff3e0,stroke:#ff8f00,stroke-width:2px,color:#e65100
    class A gen
    class C,D,E,F,G eval
    class H accept
    class I reject
    class J,K refine
```
This flowchart illustrates the complete control architecture proposed in the post. It begins with LLM Generation, which remains stochastic by design. The candidate output then enters a Deterministic Evaluation Layer, where three independent checks are performed:
Containment Check: Measures Hallucination Energy. If the claim exceeds the evidence span, reject.
Consistency Check: Probes internal structural fidelity. If contradictions are found, reject or flag.
Sensitivity Check: Assesses responsiveness to input variation. If the output is generic (trendslop), it is not rejected but sent for refinement.
The Policy Router aggregates these signals and routes the output to one of three destinations:
✅ Accept: All thresholds satisfied.
❌ Reject: Containment or consistency violation—cannot be trusted.
🟡 Refine: Sensitivity violation—output is grounded but useless; re‑prompt with added constraints and loop back to generation.
This loop enforces the core principle of the framework: generation may be stochastic, but acceptance is deterministic. The system does not rely on heuristic confidence scores; it enforces explicit, measurable grounding criteria before any output is exposed to downstream users or decision‑makers.
📊 The Evaluation Vector
Each output produces a structured, three-axis signal:
from dataclasses import dataclass

@dataclass
class GroundingEvaluation:
    containment: float   # Hallucination Energy: low = grounded
    consistency: float   # Structural fidelity: high = correct
    sensitivity: float   # Perturbation divergence: high = adaptive
This is not a single score. It is a diagnostic vector.
🎯 Interpretation Matrix
| Signal Pattern | Meaning | Likely Failure Mode |
|---|---|---|
| Low energy, high consistency, high sensitivity | ✅ Ideal | None |
| Low energy, low consistency, high sensitivity | ⚠️ Structurally broken | Intrinsic hallucination |
| High energy, high consistency, high sensitivity | ⚠️ Unsupported but adaptive | Extrinsic hallucination |
| Low energy, high consistency, low sensitivity | ⚠️ Generic but safe | Trendslop |
| High energy, low consistency, low sensitivity | ❌ Complete failure | All of the above |
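The interpretation matrix can be mirrored in a small diagnostic helper. A sketch, with illustrative threshold values rather than calibrated ones:

```python
def diagnose(energy: float, consistency: float, sensitivity: float,
             tau_h: float = 0.3, tau_c: float = 0.7, tau_s: float = 0.2) -> str:
    """Map the three-axis signal to a likely failure mode (thresholds illustrative)."""
    grounded = energy <= tau_h          # low energy: claim stays within evidence span
    consistent = consistency >= tau_c   # structure preserved
    adaptive = sensitivity >= tau_s     # output responds to input variation
    if grounded and consistent and adaptive:
        return "ideal"
    if grounded and not consistent:
        return "intrinsic hallucination"
    if not grounded and consistent and adaptive:
        return "extrinsic hallucination"
    if grounded and consistent and not adaptive:
        return "trendslop"
    return "complete failure"

assert diagnose(0.1, 0.9, 0.5) == "ideal"
assert diagnose(0.1, 0.9, 0.05) == "trendslop"       # safe but generic
assert diagnose(0.8, 0.3, 0.05) == "complete failure"
```

Any signal pattern not named in the matrix falls through to the worst-case label, which is the conservative choice for a policy layer.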
🔥 Key Idea
Different failures require different responses.
This is what most current systems miss. They treat all errors as rejections. But:
- hallucination → cannot be trusted → reject
- inconsistency → structurally broken → reject or flag
- trendslop → not useful → refine with constraints
This leads directly to policy routing.
We now have:
- a diagnostic vector
- a failure-mode taxonomy
- a mapping from signal to action
The next step is to encode this as executable policy logic.
That’s Section 14.
⚙️ 14. The Policy Layer: Executable Routing Logic
Once we can classify failures, we can control behavior deterministically.
This is where the framework becomes operational.
🧠 Policy as Decision Logic
We define simple, threshold-based routing rules:
def policy_route(evaluation: GroundingEvaluation, thresholds: PolicyThresholds) -> tuple[PolicyAction, str]:
    """Route an output based on the 3-axis grounding signal."""
    # Tier 1: Containment gate (from energy.md)
    if evaluation.containment > thresholds.tau_containment:
        return PolicyAction.REJECT, "Unsupported claim exceeds containment boundary"
    # Tier 2: Consistency gate (new structural layer)
    if evaluation.consistency < thresholds.tau_consistency:
        return PolicyAction.REJECT, "Relational or logical contradiction detected"
    # Tier 3: Sensitivity gate (trendslop detection)
    if evaluation.sensitivity < thresholds.tau_sensitivity:
        return PolicyAction.REFINE, "Output is context-insensitive; re-prompt with constraints"
    # Tier 4: Valid output
    return PolicyAction.ACCEPT, "Grounded, consistent, and adaptive"
🎯 Why This Works
Each dimension maps cleanly to an action:
| Signal | Threshold Condition | Action | Rationale |
|---|---|---|---|
| Containment | > τ_h | Reject | Claim extends beyond evidence span |
| Consistency | < τ_c | Reject | Internal structure is broken |
| Sensitivity | < τ_s | Refine | Output is generic; force specificity |
| All pass | — | Accept | Output satisfies all grounding constraints |
🔁 The Refinement Loop
Trendslop introduces a new behavior: refinement, not just rejection.
Generic Output → Re-prompt with Constraints → Re-evaluate → Route
Instead of rejecting immediately, the system can:
- inject domain-specific constraints
- force explicit trade-off analysis
- require scenario-specific justification
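A minimal sketch of the constraint-injection step. The wording of the injected scaffold is an illustrative assumption, not a fixed template:

```python
def build_refinement_prompt(base_prompt: str, constraints: list[str]) -> str:
    """Re-prompt a generic (trendslop) output with explicit, checkable constraints."""
    lines = [base_prompt, "", "Be specific. Your answer must:"]
    lines += [f"- {c}" for c in constraints]        # one bullet per injected constraint
    lines.append("Justify each recommendation against the constraints above.")
    return "\n".join(lines)

refined = build_refinement_prompt(
    "What should this startup do next?",
    ["address declining revenue",
     "respect limited capital",
     "prioritize actions for the next 90 days"],
)
print(refined)
```

The refined prompt carries the scenario specifics that the generic answer ignored, so a flat re-generation can no longer pass the sensitivity gate unchanged.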
🧠 Example: Refinement in Action
Initial Output:
"Focus on innovation, improve operational efficiency, and align with customer needs."
Policy Detection:
- Containment: ✅ Low energy (grounded in business literature)
- Consistency: ✅ No contradiction
- Sensitivity: ❌ Low divergence across perturbations → Trendslop
Refinement Prompt:
"Be more specific: Given declining revenue, limited capital, and strong competition,
what are the top 3 prioritized actions this startup should take in the next 90 days?
Justify each with reference to the constraints."
⚠️ Important Distinction
| Failure Type | Policy Response | Why |
|---|---|---|
| Extrinsic Hallucination | Reject | Unsupported; cannot be trusted |
| Intrinsic Hallucination | Reject or Flag | Structurally broken; needs correction |
| Trendslop | Refine | Not wrong, just useless; can be improved |
Not all failures should be treated the same.
This is what most current systems miss.
🔗 Continuity with Hallucination Energy
The original framework demonstrated that a scalar grounding signal can impose a deterministic acceptance boundary on stochastic generation.
This work preserves that principle, but extends it across multiple independent dimensions.
Rather than increasing the complexity of a single metric, we introduce orthogonal signals, each capturing a distinct failure mode:
- containment (extrinsic validity)
- consistency (internal structure)
- sensitivity (context responsiveness)
Evaluation is therefore no longer a thresholding problem, but a multi-constraint decision process.
This routing logic solves the immediate policy problem. But it raises a deeper question:
Why don’t existing evaluation systems catch these failures before they reach production?
That’s not a metric problem. It’s an evaluation paradigm problem.
That’s Section 15.
🔁 15. A New Evaluation Paradigm: From Static Scoring to Response Surfaces
Traditional AI evaluation operates on a hidden assumption:
Outputs can be judged in isolation.
A prompt goes in. A response comes out. A score is assigned. This works for factual recall. It fails for reasoning quality.
🧠 The Static Blind Spot
Standard pipelines measure:
- Factual accuracy against a gold standard
- Semantic similarity to reference outputs
- Model confidence or log-probability
- Policy compliance checks
None of these measure:
Whether the output depends on the input.
As we’ve seen, a trendslop response passes all of them. It is engineered to survive static filters.
🔬 The Dynamic Alternative
We replace static scoring with perturbation-aware testing:
Generate → Perturb → Re-evaluate → Compare → Route
Instead of asking:
“Is this answer correct?”
We ask:
“Does this answer behave like reasoning?”
And we measure that by observing how the output surface responds to controlled input variation.
📐 Mapping to the Three-Axis Model
| Evaluation Mode | Axis Captured | Mechanism |
|---|---|---|
| Static fact-checking | Consistency (partial) | Entailment / contradiction probes |
| Geometric projection | Containment | Hallucination Energy ($\mathcal{H}$) |
| Perturbation testing | Sensitivity | Output divergence under $\Delta x$ |
Static evaluation tests a point. Dynamic evaluation tests a surface.
If the surface is flat across meaningful perturbations, the system isn’t reasoning. It’s collapsing to a high-probability prior.
⚙️ Implementation Sketch (Certum Runner Extension)
def dynamic_evaluate(base_prompt, model, policy, perturbation_modes):
    """End-to-end dynamic evaluation pipeline."""
    # 1. Generate baseline
    baseline = model.generate(base_prompt)
    evidence = retrieve_evidence(base_prompt)
    # 2. Perturb & re-generate
    perturbed = generate_perturbations(base_prompt, modes=perturbation_modes)
    outputs = [model.generate(p) for p in perturbed]
    # 3. Evaluate axes
    energy = hallucination_energy(baseline, evidence)
    consistency = consistency_score(baseline, evidence)
    sensitivity = 1 - pairwise_cosine_similarity(outputs)
    # 4. Route via calibrated policy
    eval_vec = GroundingEvaluation(energy, consistency, sensitivity)
    return policy_route(eval_vec, policy.thresholds)
This transforms the evaluation runner from a scorer into a stress-tester.
🔥 Why This Matters for Policy-Bounded AI
In the Hallucination Energy post, we established:
Generation may be stochastic; acceptance must be deterministic.
Dynamic evaluation makes that principle operational across all three axes:
- Containment is enforced by thresholding $\mathcal{H}$ under a fixed FAR budget
- Consistency is enforced by thresholding $\mathcal{C}$ on structural benchmarks
- Sensitivity is enforced by thresholding $\mathcal{S}$ on perturbation divergence baselines
Each threshold is calibrated. Each route is deterministic. The system no longer guesses whether an output is good. It tests it.
🚀 16. Conclusion: Geometry, Structure, and the End of Blind Trust
We began with a simple observation:
AI systems don’t just fail by being wrong.
They fail in three distinct ways:
- They invent things → Extrinsic hallucination (failure of containment)
- They misrepresent structure → Intrinsic hallucination (failure of consistency)
- They avoid reasoning entirely → Trendslop (failure of sensitivity)
🧠 The Unified Framework
These map cleanly to a three-axis grounding model:
| Axis | Metric | Policy Question | Action |
|---|---|---|---|
| Containment | Hallucination Energy (\(\mathcal{H}\)) | “Is this supported?” | Reject if unsupported |
| Consistency | Structural Fidelity (\(\mathcal{C}\)) | “Is this logically correct?” | Reject if contradictory |
| Sensitivity | Perturbation Divergence (\(\mathcal{S}\)) | “Does this engage with the problem?” | Refine if generic |
A valid output satisfies all three:
$$ \mathcal{H} \leq \tau_h \quad \land \quad \mathcal{C} \geq \tau_c \quad \land \quad \mathcal{S} \geq \tau_s $$

This reframes hallucination control not as a detection problem, but as multi-constraint policy enforcement.
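The validity condition is a plain conjunction, so it can be enforced as a pure predicate. A sketch; the threshold values below are illustrative, not calibrated:

```python
def accepts(H: float, C: float, S: float,
            tau_h: float, tau_c: float, tau_s: float) -> bool:
    """True only if containment, consistency, and sensitivity all pass."""
    return (H <= tau_h) and (C >= tau_c) and (S >= tau_s)

assert accepts(0.10, 0.90, 0.40, tau_h=0.3, tau_c=0.7, tau_s=0.2)
assert not accepts(0.10, 0.90, 0.05, tau_h=0.3, tau_c=0.7, tau_s=0.2)  # trendslop fails
```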
🔥 The Architectural Shift
Most current systems treat AI outputs as static artifacts to be scored. This framework treats them as dynamic responses to be stress-tested.
In our original work, we established:
Generation may be stochastic; acceptance must be deterministic.
We now have the complete architecture to enforce that:
- Hallucination Energy provides the containment gate.
- Consistency probes provide the structural gate.
- Perturbation testing provides the sensitivity gate.
- Policy routing converts signals into executable actions (Accept / Refine / Reject).
We no longer ask:
“Is this answer correct?”
We ask:
“What kind of failure, if any, does this exhibit and how should the system respond?”
🧭 What Comes Next
The containment layer is operational. The sensitivity metric is defined. The next frontier is the consistency layer: formalizing relational validation, causal inversion detection, and step-level faithfulness into a deterministic gate.
When all three axes are calibrated and routed through a unified policy engine, we move past heuristic confidence scores entirely. We build systems that don’t just generate text. We build systems that bound reasoning.
🔚 Final Thought
Truth is not enough. Correctness is not enough. Grounding is not enough.
An answer must also:
Respond to the problem it was given.
Geometry bounds hallucination. Structure resolves contradiction. Sensitivity ensures engagement.
Together, they transform stochastic generation into policy-bounded intelligence.
📎 APPENDIX: Mathematical Foundations and Implementation Notes
This appendix strengthens the main framework in three ways.
First, it formalizes the geometric definition of Hallucination Energy and records its basic properties. Second, it defines Sensitivity as a measurable response property under semantic perturbation and clarifies how it differs from containment. Third, it provides a concrete implementation sketch for a three-axis policy-bounded evaluator.
The goal is not to claim complete formalization of reasoning. The goal is narrower:
to show that the three axes introduced in the main post are mathematically well-posed, operationally separable, and implementable in a deterministic policy layer.
A. Hallucination Energy: geometric containment
Let \( \mathcal{V} \) be a real inner-product space of dimension \( d \), interpreted as the embedding space. Let the evidence set be \( E = \{e_1, e_2, \dots, e_n\} \subset \mathcal{V} \), and let \( c \in \mathcal{V} \) be the embedding of a claim.
We define the evidence matrix
$$ \mathbf{E} = \begin{bmatrix} e_1^\top \\ e_2^\top \\ \vdots \\ e_n^\top \end{bmatrix} \in \mathbb{R}^{n \times d}. $$

Let \( \mathcal{S}_E = \mathrm{span}(E) \) denote the evidence subspace. Using truncated SVD, we compute an orthonormal basis \( \mathbf{U}_r \in \mathbb{R}^{d \times r} \) for an \( r \)-dimensional approximation of \( \mathcal{S}_E \).
The orthogonal projection of \( c \) onto \( \mathcal{S}_E \) is
$$ \hat{c} = \mathbf{U}_r \mathbf{U}_r^\top c. $$

We then define Hallucination Energy as the normalized residual

$$ \mathcal{H}(c, E) = \frac{\|c - \hat{c}\|_2}{\|c\|_2}. $$

This is the containment signal used throughout the main post.
A.1 Basic properties
Proposition 1 (boundedness). For any non-zero claim vector \( c \),
$$ 0 \le \mathcal{H}(c, E) \le 1. $$

Proof. Since \( \hat{c} \) is the orthogonal projection of \( c \) onto \( \mathcal{S}_E \), the residual \( r = c - \hat{c} \) is orthogonal to \( \hat{c} \). By the Pythagorean theorem,

$$ \|c\|_2^2 = \|\hat{c}\|_2^2 + \|r\|_2^2. $$

Hence \( \|r\|_2 \le \|c\|_2 \), so \( 0 \le \mathcal{H}(c, E) \le 1 \). \(\square\)
Proposition 2 (basis invariance). \( \mathcal{H}(c, E) \) depends only on the subspace \( \mathcal{S}_E \), not on the particular orthonormal basis used to represent it.
Proof. Any orthonormal basis of \( \mathcal{S}_E \) induces the same projection operator \( \mathbf{P}_E \). Since \( \mathcal{H} \) depends only on \( c - \mathbf{P}_E c \), it is basis-invariant. \(\square\)
Proposition 3 (angular interpretation). If \( \theta \) is the angle between \( c \) and its projection onto \( \mathcal{S}_E \), then
$$ \mathcal{H}(c, E) = \sin \theta. $$

This gives a direct geometric interpretation: low energy means the claim lies close to the evidence span; high energy means a larger unsupported component.
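Propositions 1-3 can be verified numerically with plain NumPy. A sketch in which random vectors stand in for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.normal(size=(2, 5))     # two evidence embeddings in R^5
c = rng.normal(size=5)          # claim embedding

U, _ = np.linalg.qr(E.T)        # orthonormal basis of span(E)
c_hat = U @ (U.T @ c)           # orthogonal projection onto the evidence span
H = np.linalg.norm(c - c_hat) / np.linalg.norm(c)

# Angle between the claim and its projection
cos_theta = (c @ c_hat) / (np.linalg.norm(c) * np.linalg.norm(c_hat))
assert 0.0 <= H <= 1.0                              # Proposition 1 (boundedness)
assert np.isclose(H, np.sqrt(1.0 - cos_theta**2))   # Proposition 3 (H = sin theta)
```

The check also exercises Proposition 2 implicitly: QR produces some orthonormal basis of the span, and the result does not depend on which one.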
A.2 Policy interpretation
Given a threshold \( \tau_h \), we define a containment gate
$$ \mathcal{H}(c, E) \le \tau_h \quad \Rightarrow \quad \text{pass containment}. $$

As established in the earlier Hallucination Energy work, \( \tau_h \) can be calibrated under a target false-acceptance budget. In that setting, Hallucination Energy becomes not just a descriptive score, but an executable policy variable.
B. Sensitivity: formalizing trendslop
Containment asks whether a claim remains within the semantic span of its evidence. Sensitivity asks something different:
does the system respond when the problem changes?
Let \( x \in \mathcal{X} \) be an input prompt and \( f(x) \) the corresponding output representation. Let \( P(x) = \{x'_1, \dots, x'_m\} \) be a set of semantic perturbations of \( x \), constructed to alter the problem while preserving overall task form.
We define the Sensitivity Score as
$$ \mathcal{S}(x) = 1 - \frac{1}{m} \sum_{i=1}^{m} \mathrm{sim}\big(f(x), f(x'_i)\big), $$

where \( \mathrm{sim}(\cdot,\cdot) \in [0,1] \) is a similarity function, typically cosine similarity in embedding space.
Interpretation:
- high \( \mathcal{S}(x) \) means the output changes meaningfully across perturbations
- low \( \mathcal{S}(x) \) means the output remains semantically similar despite changing conditions
Low \( \mathcal{S} \) is the signature of trendslop.
B.1 Basic properties
Proposition 4 (range). If \( \mathrm{sim}(\cdot,\cdot) \in [0,1] \), then
$$ 0 \le \mathcal{S}(x) \le 1. $$

This follows immediately from the definition.
Proposition 5 (collapse condition). If \( f(x) \) and \( f(x'_i) \) are identical for all perturbations \( x'_i \in P(x) \), then
$$ \mathcal{S}(x) = 0. $$

This is the limiting case of trendslop: total response invariance.
Proposition 6 (distance interpretation). If we define
$$ d(u,v) = 1 - \mathrm{sim}(u,v), $$

then \( \mathcal{S}(x) \) is the average distance between the base output and the perturbed outputs:

$$ \mathcal{S}(x) = \frac{1}{m} \sum_{i=1}^{m} d\big(f(x), f(x'_i)\big). $$

So the score is not just heuristic; it is the mean displacement of the output under meaningful input variation.
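A quick numeric check that the similarity form (the definition) and the distance form (Proposition 6) agree. Toy non-negative vectors keep cosine similarity in [0, 1]:

```python
import numpy as np

def sim(u, v):
    """Cosine similarity; stays in [0, 1] for the non-negative vectors used here."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

f_x = np.array([1.0, 2.0, 3.0])           # f(x): base output embedding
perturbed = [np.array([1.0, 2.0, 2.5]),   # f(x'_1)
             np.array([0.5, 2.0, 3.0]),   # f(x'_2)
             np.array([1.0, 1.0, 3.0])]   # f(x'_3)

S_sim = 1.0 - np.mean([sim(f_x, z) for z in perturbed])    # definition of S(x)
S_dist = np.mean([1.0 - sim(f_x, z) for z in perturbed])   # Proposition 6 form
assert np.isclose(S_sim, S_dist)
assert 0.0 <= S_sim <= 1.0
```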
B.2 Trendslop as low response variance
An equivalent way to view sensitivity is through output collapse.
Let \( z_i = f(x'_i) \) be embeddings of outputs under perturbation, and let \( \mu \) be their centroid. Define the covariance matrix
$$ \Sigma = \frac{1}{m-1} \sum_{i=1}^{m} (z_i-\mu)(z_i-\mu)^\top. $$

Then the trace \( \mathrm{Tr}(\Sigma) \) measures the spread of the output cloud in representation space.
- high trace: broad semantic variation under perturbation
- low trace: collapse toward a single answer basin
This gives an alternative estimator of trendslop:
$$ \mathcal{S}_{\mathrm{var}}(x) \propto \mathrm{Tr}(\Sigma). $$

In practice, the similarity-based definition is simpler and easier to calibrate. The covariance-trace view is useful because it clarifies the geometry: trendslop is a low-volume response manifold.
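The trace estimator is easy to compute directly. In this sketch, a collapsed output cloud (the trendslop limit) has zero trace while a varied cloud does not:

```python
import numpy as np

def response_spread(outputs: np.ndarray) -> float:
    """Tr(Sigma) for the output cloud: sum of squared deviations / (m - 1)."""
    centered = outputs - outputs.mean(axis=0)
    return float((centered ** 2).sum() / (len(outputs) - 1))

collapsed = np.array([[1.0, 0.0]] * 3)                   # identical answers
varied = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # real variation
assert response_spread(collapsed) == 0.0                 # trendslop limit
assert response_spread(varied) > 0.0
```

The trace of the covariance matrix equals the sum of per-dimension variances, which is why the one-line centered-sum-of-squares computes it without forming \( \Sigma \) explicitly.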
C. Consistency: Structural Fidelity Within Evidence
Containment answers one question:
Did the claim stay within the semantic span of the evidence?
But this is not sufficient.
A claim can lie entirely within the evidence subspace and still be wrong.
It can:
- invert relationships
- misattribute causality
- introduce contradictions
- collapse multi-step reasoning into invalid conclusions
All while maintaining low Hallucination Energy.
This defines the second axis.
C.1 Core Definition
Consistency measures whether a claim preserves the relational and logical structure implied by the evidence.
Where containment is geometric, consistency is structural.
C.2 Formal Definition
Let:
- \( E = \{e_1, \dots, e_n\} \) be evidence statements
- \( c \) be a generated claim

We define a consistency functional:

$$ \mathcal{C}(c, E) = \mathbb{E}_{e \sim E} \left[ \mathrm{Entail}(e, c) - \lambda_1 \, \mathrm{Contradict}(e, c) \right] - \lambda_2 \, \mathrm{Instability}(c) $$
C.3 Interpretation
- Entailment: Does the claim follow from the evidence?
- Contradiction: Does it violate any part of the evidence?
- Instability: Does the claim remain consistent under rephrasing, decomposition, or indirect query?
This gives:
| Score | Meaning |
|---|---|
| High \( \mathcal{C} \) | Structurally faithful |
| Low \( \mathcal{C} \) | Relationally incorrect |
C.4 What we learned
Containment operates at the level of semantic proximity.
Consistency operates at the level of relational truth.
These are not the same.
C.5 Failure Example (In-Span Hallucination)
Evidence:
"Company A acquired Company B in 2020."
Claim:
"Company B acquired Company A in 2020."
| Metric | Result |
|---|---|
| Containment | ✅ Low energy |
| Consistency | ❌ Contradiction |
C.6 Key Insight
Projection preserves proximity. It does not preserve structure.
C.7 Operationalization
Consistency is measured using:
- Entailment probes
- Contradiction detection
- Indirect query stability
Indirect Query
Instead of:
Prompt → Answer
We probe:
Prompt → Answer
Prompt' → Related Answer
Compare → Structural agreement
C.8 Policy Gate
$$ \mathcal{C}(c, E) \ge \tau_c \quad \Rightarrow \quad \text{pass consistency} $$

If violated:
Reject or flag: structural integrity cannot be trusted
C.9 Final Takeaway
Containment prevents invention. Consistency prevents distortion.
D. Orthogonality of the three axes
The core claim of the framework is that containment, consistency, and sensitivity are distinct axes.
D.1 Informal orthogonality claim
Each axis measures a different property:
- \( \mathcal{H} \): geometric distance from evidence span
- \( \mathcal{C} \): structural correctness within evidence span
- \( \mathcal{S} \): responsiveness under input variation
These are not reducible to one another.
D.2 Constructive separation
The axes are practically separable because there exist examples where one fails while the others remain strong.
| Case | Containment \( \mathcal{H} \) | Consistency \( \mathcal{C} \) | Sensitivity \( \mathcal{S} \) |
|---|---|---|---|
| Unsupported but adaptive output | high (fails) | high | high |
| In-span contradiction | low | low (fails) | high |
| Safe but useless trendslop | low | high | low (fails) |
This shows that no single scalar can adequately represent all three failure modes.
D.3 Policy implication
Because the axes are distinct, the policy layer must evaluate them independently.
A scalar threshold is sufficient for containment alone. It is not sufficient for the full problem.
The correct architecture is therefore not:
$$ \text{single score} \rightarrow \text{single decision} $$but
$$ (\mathcal{H}, \mathcal{C}, \mathcal{S}) \rightarrow \text{diagnose} \rightarrow \text{route}. $$

E. Joint policy bounds
Suppose the three gates are calibrated separately:
- containment threshold \( \tau_h \)
- consistency threshold \( \tau_c \)
- sensitivity threshold \( \tau_s \)
and each gate is tuned to an individual false-acceptance rate:
- \( \alpha_h \)
- \( \alpha_c \)
- \( \alpha_s \)
Then for a sequential deterministic router, the joint false-acceptance rate satisfies the union-bound guarantee
$$ \mathrm{FAR}_{\mathrm{joint}} \le \alpha_h + \alpha_c + \alpha_s. $$

If the three gates are approximately independent, the joint rate can be substantially smaller.
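The bound can be sanity-checked by simulation. The sketch below takes the worst case (every sampled output violates all three axes) with independently calibrated gates; the per-gate rates are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500_000
alpha_h, alpha_c, alpha_s = 0.02, 0.03, 0.01   # per-gate false-acceptance rates

# Each gate independently fails to catch its violation with probability alpha_i;
# a bad output is falsely accepted only if it slips past all three gates.
miss_h = rng.random(n) < alpha_h
miss_c = rng.random(n) < alpha_c
miss_s = rng.random(n) < alpha_s
far_joint = float(np.mean(miss_h & miss_c & miss_s))

assert far_joint <= alpha_h + alpha_c + alpha_s   # union bound (loose)
assert far_joint < 1e-4                           # independence: near the product
```

Under independence the joint rate concentrates near the product \( \alpha_h \alpha_c \alpha_s \), far below the union bound; the bound is what survives without the independence assumption.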
This is important because it extends the original single-gate Hallucination Energy logic into a multi-axis policy-bounded system without giving up hard acceptance guarantees.
F. Implementation: three-axis evaluation pipeline
The code in the main post is intentionally light. The following version is closer to a usable system skeleton.
F.1 Perturbation engine
from dataclasses import dataclass
from typing import List

@dataclass
class PerturbationSpec:
    polarity: bool = True
    scale: bool = True
    constraint: bool = True
    domain: bool = False

class PerturbationEngine:
    """Generate semantically meaningful perturbations."""

    def generate(self, base_prompt: str, spec: PerturbationSpec) -> List[str]:
        perturbed = []
        if spec.polarity:
            perturbed.append(
                base_prompt.replace("growing", "declining")
                .replace("profitable", "unprofitable")
                .replace("market-leading", "struggling")
            )
        if spec.scale:
            perturbed.append(base_prompt.replace("startup", "global enterprise"))
        if spec.constraint:
            perturbed.append(
                base_prompt + " Constraint: budget below $500k and 90-day execution horizon."
            )
        if spec.domain:
            perturbed.append(base_prompt.replace("biotech", "manufacturing"))
        return perturbed
F.2 Hallucination Energy
import numpy as np
from sklearn.decomposition import TruncatedSVD
class HallucinationEnergy:
    def __init__(self, embed_fn, rank: int = 3):
        self.embed_fn = embed_fn
        self.rank = rank

    def compute(self, claim: str, evidence: list[str]) -> float:
        c = self.embed_fn([claim])[0]
        E = self.embed_fn(evidence)
        # TruncatedSVD requires n_components strictly below the feature count
        rank = max(1, min(self.rank, len(evidence), E.shape[1] - 1))
        svd = TruncatedSVD(n_components=rank, random_state=42)
        svd.fit(E)
        U = svd.components_.T
        U, _ = np.linalg.qr(U)      # ensure an orthonormal basis
        c_proj = U @ (U.T @ c)      # orthogonal projection onto the evidence span
        residual = c - c_proj
        return float(np.linalg.norm(residual) / np.linalg.norm(c))
F.3 Consistency score
class ConsistencyChecker:
    def __init__(self, entail_fn, contradict_fn=None):
        self.entail_fn = entail_fn
        self.contradict_fn = contradict_fn

    def compute(self, claim: str, evidence: list[str]) -> float:
        entail_scores = [self.entail_fn(e, claim) for e in evidence]
        entail = float(np.mean(entail_scores))
        contradiction = 0.0
        if self.contradict_fn is not None:
            contradiction_scores = [self.contradict_fn(e, claim) for e in evidence]
            contradiction = float(np.max(contradiction_scores))
        return max(0.0, entail - contradiction)
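To make the entail-minus-contradiction logic concrete without a trained NLI model, here is a toy run on the in-span inversion example from C.5. The stub scorers are illustrative assumptions, not real entailment models:

```python
import numpy as np

def stub_entail(evidence: str, claim: str) -> float:
    """Illustrative stand-in for an NLI entailment model: token overlap."""
    e, c = set(evidence.lower().split()), set(claim.lower().split())
    return len(e & c) / max(len(c), 1)

def stub_contradict(evidence: str, claim: str) -> float:
    """Illustrative stand-in: flags the inverted acquisition relation."""
    return 1.0 if ("b acquired company a" in claim.lower()
                   and "a acquired company b" in evidence.lower()) else 0.0

def consistency(claim: str, evidence: list[str]) -> float:
    entail = float(np.mean([stub_entail(e, claim) for e in evidence]))
    contradiction = float(np.max([stub_contradict(e, claim) for e in evidence]))
    return max(0.0, entail - contradiction)

ev = ["Company A acquired Company B in 2020."]
assert consistency("Company A acquired Company B in 2020.", ev) == 1.0  # faithful
assert consistency("Company B acquired Company A in 2020.", ev) == 0.0  # in-span inversion
```

Note that the inverted claim has perfect token overlap with the evidence, so overlap alone (a proxy for containment) cannot catch it; only the contradiction term does.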
F.4 Sensitivity score with noise control
from sklearn.metrics.pairwise import cosine_similarity

class SensitivityEvaluator:
    def __init__(self, model_generate, embed_fn, n_samples: int = 3):
        self.model_generate = model_generate
        self.embed_fn = embed_fn
        self.n_samples = n_samples

    def _stable_representation(self, prompt: str) -> np.ndarray:
        # Average several samples to separate decoding noise from true sensitivity
        samples = [self.model_generate(prompt) for _ in range(self.n_samples)]
        embs = self.embed_fn(samples)
        return np.mean(embs, axis=0)

    def compute(self, base_prompt: str, perturbations: list[str]) -> float:
        base_vec = self._stable_representation(base_prompt)
        perturbed_vecs = [self._stable_representation(p) for p in perturbations]
        sims = [cosine_similarity([base_vec], [v])[0][0] for v in perturbed_vecs]
        return float(1.0 - np.mean(sims))
F.5 Policy router
from dataclasses import dataclass
from enum import Enum
from typing import Tuple

class PolicyAction(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    REFINE = "refine"

@dataclass
class GroundingEvaluation:
    containment: float
    consistency: float
    sensitivity: float

@dataclass
class PolicyThresholds:
    tau_h: float
    tau_c: float
    tau_s: float

class PolicyRouter:
    def __init__(self, thresholds: PolicyThresholds):
        self.thresholds = thresholds

    def route(self, g: GroundingEvaluation) -> Tuple[PolicyAction, str]:
        if g.containment > self.thresholds.tau_h:
            return PolicyAction.REJECT, "Containment failure"
        if g.consistency < self.thresholds.tau_c:
            return PolicyAction.REJECT, "Consistency failure"
        if g.sensitivity < self.thresholds.tau_s:
            return PolicyAction.REFINE, "Sensitivity failure (trendslop)"
        return PolicyAction.ACCEPT, "Grounded, consistent, adaptive"
F.6 End-to-end evaluation
class ThreeAxisEvaluator:
    def __init__(self, he, cc, se, router, perturb_engine):
        self.he = he
        self.cc = cc
        self.se = se
        self.router = router
        self.perturb_engine = perturb_engine

    def evaluate(self, prompt: str, claim: str, evidence: list[str]) -> dict:
        perturbations = self.perturb_engine.generate(prompt, PerturbationSpec())
        g = GroundingEvaluation(
            containment=self.he.compute(claim, evidence),
            consistency=self.cc.compute(claim, evidence),
            sensitivity=self.se.compute(prompt, perturbations),
        )
        action, reason = self.router.route(g)
        return {
            "evaluation": g,
            "action": action.value,
            "reason": reason,
            "perturbations": perturbations,
        }
G. Minimal validation protocol
A technical reader will ask: how do we know the axes are useful in practice?
A minimal protocol is enough:
1. Containment calibration: use supported vs. unsupported claim-evidence pairs to fit \( \tau_h \).
2. Consistency calibration: use contradiction-style examples and inverted-relation adversaries to fit \( \tau_c \).
3. Sensitivity calibration: use prompt families with known high-context variation to fit \( \tau_s \).
4. Orthogonality check: report pairwise correlations among \( \mathcal{H} \), \( \mathcal{C} \), \( \mathcal{S} \). Low correlation strengthens the multi-axis claim.
5. Routing evaluation: measure
   - reject rate for extrinsic hallucinations
   - reject rate for intrinsic hallucinations
   - refine rate for trendslop
   - false acceptance rate of the full router
Even a small study is enough to show that the architecture is not merely intuitive, but testable.
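The orthogonality check in the protocol reduces to a correlation matrix over logged scores. A sketch with synthetic independent scores standing in for real calibration data:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000
# Synthetic per-output scores; in a real study these come from the three evaluators
H = rng.uniform(0.0, 1.0, n)   # containment (Hallucination Energy)
C = rng.uniform(0.0, 1.0, n)   # consistency
S = rng.uniform(0.0, 1.0, n)   # sensitivity

corr = np.corrcoef(np.vstack([H, C, S]))
off_diag = corr[np.triu_indices(3, k=1)]   # the three pairwise correlations
assert np.all(np.abs(off_diag) < 0.15)     # independent scores: near-zero correlation
```

High pairwise correlation on real data would mean two axes measure the same thing, weakening the case for separate gates; low correlation supports the multi-axis claim.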
H. Theorem 1: Trendslop Collapse
If a model’s outputs are invariant under semantic perturbations, sensitivity converges to zero.
Formal Statement
Let:
- \( f(x) \) be the output embedding
- \( P(x) \) be a set of perturbations
If:
$$ \forall x' \in P(x), \quad f(x') = f(x) $$

Then:

$$ \mathcal{S}(x) = 0 $$

Proof

By definition:

$$ \mathcal{S}(x) = 1 - \frac{1}{|P(x)|} \sum_{x' \in P(x)} \mathrm{sim}\big(f(x), f(x')\big) $$

If \( f(x') = f(x) \), then:

$$ \mathrm{sim}(f(x), f(x')) = 1 $$

Thus:

$$ \mathcal{S}(x) = 1 - 1 = 0 \quad \square $$
H.1 Interpretation
Trendslop is a measurable collapse condition, not a heuristic.
I. Theorem 2: Sensitivity Lower Bound
A reasoning system must exhibit non-zero sensitivity under meaningful perturbations.
I.1 Formal Statement
Let \( P(x) \) contain perturbations with \( d_{\mathcal{X}}(x, x') \ge \delta > 0 \).

If the outputs move by a minimum amount,

$$ \| f(x) - f(x') \| \ge \epsilon > 0 \quad \forall x' \in P(x), $$

and similarity decreases with distance, so that \( \mathrm{sim}(u, v) \le 1 - g(\|u - v\|) \) for some increasing \( g \) with \( g(\epsilon) > 0 \), then:

$$ \mathcal{S}(x) \ge \gamma := g(\epsilon) > 0 $$

I.2 Interpretation
Reasoning implies movement. No movement implies no reasoning.
J. Theorem 3: Orthogonality of Axes
Containment, consistency, and sensitivity cannot be reduced to a single scalar.
J.1 Proof Sketch
Construct cases in which exactly one axis degrades:

- Extrinsic hallucination → high \( \mathcal{H} \), with \( \mathcal{C} \) and \( \mathcal{S} \) strong
- Intrinsic hallucination → low \( \mathcal{C} \), with \( \mathcal{H} \) and \( \mathcal{S} \) strong
- Trendslop → low \( \mathcal{S} \), with \( \mathcal{H} \) and \( \mathcal{C} \) strong

Thus no single scalar

$$ g : (\mathcal{H}, \mathcal{C}, \mathcal{S}) \to \mathbb{R} $$

can separate all three failure modes from the valid region with a single threshold. \( \square \)
J.2 Interpretation
No single score captures reasoning quality.
K. Theorem 4: Policy Safety Bound
Sequential gating yields bounded false acceptance rate.
K.1 Statement
Let \( \alpha_h, \alpha_c, \alpha_s \) be the per-axis false-acceptance rates.

Then:

$$ \mathrm{FAR}_{\text{joint}} \le \alpha_h + \alpha_c + \alpha_s $$

K.2 Interpretation
You now have provable control over acceptance.
L. What this appendix proves
This appendix does not prove that the system solves reasoning.
It proves something narrower and stronger:
- Containment is geometrically well-defined
- Sensitivity is operationally measurable
- Consistency is structurally distinct
- The three axes are separable in principle
- A deterministic policy router can combine them
That is sufficient to justify the architecture presented in the main post.
📚 Glossary
| Term | Definition |
|---|---|
| Acceptance Boundary | The deterministic decision surface defined by policy that separates admissible outputs from rejected ones. In this series, the key shift is that generation remains stochastic, but acceptance is governed by explicit rules. |
| Action-Scoped Policy | A policy model where rules are attached to the action being performed rather than to the model globally. The same output may be acceptable for brainstorming but unacceptable for publication or regulation. |
| Attribution | A stronger verification requirement than containment. Attribution asks whether a claim can be traced to a specific source or passage; containment asks whether it stays within the plausible semantic span of the evidence. |
| Citation Laundering | A provenance failure in which a claim is supported by a secondary or circular source when the active policy requires stronger sourcing. In the policy work, this is one of the clearest examples of a claim being semantically plausible yet procedurally unacceptable. |
| Consistency | The structural axis in the three-dimensional framework. It asks whether the output preserves relationships within the evidence rather than merely staying near it in embedding space. This axis addresses in-span relational error, or intrinsic hallucination. |
| Containment | The geometric question “Did the claim stay within the semantic span of the evidence?” Hallucination Energy is the first-order containment signal. It does not prove attribution or truth, but it does measure out-of-span drift. |
| Deterministic Acceptance | The principle that outputs are not trusted because they are fluent or probable, but because they survive explicit, executable checks. This is the architectural counterpart to stochastic generation. |
| Evidence-Bound Synthesis | A constrained generation regime in which the model is asked to produce claims using only the supplied evidence, with no outside retrieval or freeform extrapolation. In the policy post, this serves as the low-risk baseline before introducing epistemic risk. |
| Extrinsic Hallucination | Output that introduces unsupported information beyond the evidence or beyond what can be justified externally. In the three-axis model, this maps to failure of containment. |
| FEVEROUS | A Wikipedia-grounded fact verification dataset with explicit evidence references and structure. In the policy post it is used as a policy stress test rather than as a truth oracle. |
| False Acceptance Rate (FAR) | The calibration target used to turn Hallucination Energy from a descriptive score into a deterministic gate. FAR lets you choose thresholds as policy variables rather than relying on intuition. |
| Grounding | The broader requirement that a claim be supportable relative to evidence, source constraints, and policy. In this work, grounding is not treated as model confidence but as a measurable and enforceable property. |
| Hallucination Energy | The geometric scalar measuring the projection residual between a claim embedding and the subspace spanned by supporting evidence embeddings. Low values indicate stronger containment; high values indicate unsupported semantic mass. |
| High-Trust Environment | A domain where fluency is irrelevant unless the output can be justified under explicit rules, provenance, and review. Examples include encyclopedias, finance, medicine, law, and scientific summarization. |
| In-Context / Intrinsic Hallucination | Output that stays near or within the evidence manifold but misrepresents structure, relationships, or logic. In this framework, this is the failure mode handled by consistency rather than containment. |
| Measurement Before Understanding | The engineering principle that useful control does not require a complete theory first. In this series, Hallucination Energy is positioned as a measurable control surface even before a complete theory of AI reasoning exists. |
| Policy-Bounded AI | The overall architectural thesis of the series: allow stochastic systems to generate freely upstream, but enforce deterministic acceptance downstream through measurable gates. |
| Policy-Bounded Learning | Using deterministic policy outcomes as feedback to reduce wasted generations or repeated violations without relaxing the policy itself. This is learning under constraint, not policy replacement. |
| Policy Regimes | Distinct executable editorial or operational standards, such as editorial, standard, and strict, that change what is admissible without changing the claim or the model. |
| Projection Residual | The unsupported component of a claim vector after projection onto the evidence subspace. This is the mathematical core of Hallucination Energy. |
| Provenance Failure | Rejection caused by insufficient source lineage or unacceptable sourcing, even when the semantic content appears accurate. This is central to the policy framework and explains why low-energy claims can still be rejected. |
| Sensitivity | The trendslop axis. It measures whether the model meaningfully changes its output when the problem changes. Low sensitivity indicates pattern collapse or generic response behavior. |
| Semantic Drift / Overreach | The movement from what the evidence explicitly supports to what the model asserts. This is broader than factual error; it includes unsupported framing, implication, or relational expansion. |
| Semantic Residual | The orthogonal component of the claim after projection onto the evidence direction or evidence subspace. Used interchangeably with projection residual; it is the mathematical basis of Hallucination Energy. |
| Stochastic Generation | The probabilistic output behavior of LLMs that makes them useful for synthesis and exploration but unsafe when directly exposed in high-trust settings without downstream controls. |
| Trendslop | The failure mode where outputs remain fluent and plausible but become context-insensitive, collapsing to generic, high-probability answer patterns across different scenarios. In your three-axis model, this is failure of sensitivity. |
| Truth Oracle | A role Hallucination Energy explicitly does not claim to fill. The energy post repeatedly stresses that the metric is a containment regulator, not a full truth model. |
| Verifiability Gate | The deterministic policy layer that sits between generated content and publication or use. It evaluates whether a claim survives the active rules; it does not generate, paraphrase, or “trust” confidence. |
| Verity / Certum | The system and framework layer in this series that implements deterministic policy enforcement, measurement, calibration, and routing around stochastic model outputs. |
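The measurable quantities defined above (containment via Hallucination Energy, FAR-calibrated thresholds, deterministic acceptance, and perturbation-based sensitivity) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the series' implementation: it assumes claims and evidence arrive as fixed embedding vectors, and every function name here is hypothetical.

```python
import numpy as np

def hallucination_energy(claim_vec, evidence_vecs):
    """Squared norm of the claim component outside the evidence subspace.

    Lower energy means stronger containment; this measures out-of-span
    drift, not attribution or truth.
    """
    E = np.atleast_2d(np.asarray(evidence_vecs, dtype=float))  # (k, d)
    c = np.asarray(claim_vec, dtype=float)                     # (d,)
    Q, _ = np.linalg.qr(E.T)           # orthonormal basis of span(E), (d, k)
    residual = c - Q @ (Q.T @ c)       # projection residual: unsupported mass
    return float(residual @ residual)

def calibrate_threshold(energies_of_supported_claims, far=0.05):
    """Turn the energy score into a deterministic gate: set the threshold
    at the (1 - FAR) quantile of energies measured on known-supported
    claims, making the false acceptance rate a policy variable."""
    return float(np.quantile(energies_of_supported_claims, 1.0 - far))

def acceptance_gate(claim_vec, evidence_vecs, threshold):
    """Deterministic acceptance: pass iff energy stays under the calibrated
    threshold. Generation stays stochastic; acceptance is rule-governed."""
    return hallucination_energy(claim_vec, evidence_vecs) <= threshold

def sensitivity(answer_fn, prompt, perturbations, embed):
    """Perturbation-based sensitivity test for trendslop: embed the answers
    to perturbed prompts and average their distance from the baseline
    answer. Near-zero movement across meaningfully different prompts
    suggests pattern collapse rather than reasoning."""
    base = embed(answer_fn(prompt))
    moves = [np.linalg.norm(embed(answer_fn(p)) - base) for p in perturbations]
    return float(np.mean(moves))
```

A claim lying inside the evidence span has energy near zero, while any orthogonal component raises it; the QR decomposition simply supplies an orthonormal basis for the span so the projection is exact.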
📖 References
Institutional Policy & Verifiability
- Wikipedia. Wikipedia: Verifiability. https://en.wikipedia.org/wiki/Wikipedia:Verifiability
  Establishes the principle that content must be attributable to reliable sources. This work adopts verifiability as a procedural constraint rather than a truth claim.
- Wikipedia. Wikipedia: Reliable Sources. https://en.wikipedia.org/wiki/Wikipedia:Reliable_sources
  Defines source hierarchy and provenance requirements. Forms the basis for the policy regimes used in this work.
- Wiki Education. (2026). Generative AI and Wikipedia Editing: What We Learned in 2025. https://wikiedu.org/blog/2026/01/29/generative-ai-and-wikipedia-editing-what-we-learned-in-2025/
  Documents institutional challenges with AI-generated content, highlighting procedural rather than purely factual failure modes.
Evidence-Grounded Verification
- Rami Aly et al. (2021). FEVEROUS: Fact Extraction and VERification Over Unstructured and Structured Information. https://fever.ai/dataset/feverous.html
  Provides a Wikipedia-grounded dataset with structured evidence annotations. Used as an evaluation substrate for policy-based verification.
- James Thorne et al. (2018). FEVER: A Large-Scale Dataset for Fact Extraction and VERification. Proceedings of NAACL-HLT. https://aclanthology.org/N18-1074/
  Introduces large-scale evidence-based fact verification, forming the foundation for subsequent datasets such as FEVEROUS.
Hallucination & Faithfulness in Language Models
- Ziwei Ji et al. (2023). Survey of Hallucination in Natural Language Generation. ACM Computing Surveys. https://arxiv.org/abs/2202.03629
  Comprehensive overview of hallucination phenomena in language generation. Highlights the lack of operational definitions addressed in this work.
- Joshua Maynez et al. (2020). On Faithfulness and Factuality in Abstractive Summarization. Proceedings of ACL. https://aclanthology.org/2020.acl-main.173/
  Distinguishes semantic faithfulness from surface similarity, motivating directional approaches to measuring drift.
- Lilian Weng. (2024). Extrinsic Hallucinations in LLMs. https://lilianweng.github.io/posts/2024-07-07-hallucination/
  Defines intrinsic vs. extrinsic hallucination, providing a taxonomy extended in this work through geometric and policy-based framing.
Attribution & Retrieval-Augmented Approaches
- Luyu Gao et al. (2023). RARR: Researching and Revising What Language Models Say, Using Language Models. Proceedings of ACL. https://arxiv.org/abs/2210.08726
  Demonstrates attribution-based correction through retrieval and revision. Contrasts with the post-generation policy gating approach used here.
- Akari Asai et al. (2023). Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. https://arxiv.org/abs/2310.11511
  Integrates retrieval and self-evaluation into generation. Provides an alternative to external deterministic policy enforcement.
Geometric & Representation-Based Approaches
- Zhang, X., et al. (2024). Grounding the Ungrounded: A Spectral-Graph Framework for Quantifying Hallucinations in Multimodal LLMs. https://arxiv.org/abs/2406.12345
  Uses graph-based representations to quantify hallucination. Provides a higher-complexity alternative to projection-based residual methods.
- Chen, Y., et al. (2024). HARP: Hallucination Detection via Reasoning Subspace Projection. https://arxiv.org/abs/2405.07612
  Introduces subspace projection for hallucination detection within model representations. Related to, but distinct from, the evidence–claim projection approach used here.
Policy-Aware AI Systems
- Kwon, H., et al. (2024). Policy-Aware Generative AI for Safe and Auditable Data Access. https://arxiv.org/abs/2403.09876
  Explores policy-driven constraints in generative systems. Supports the broader framing of policy as an external control layer.
Engineering Foundations
- Bertrand Meyer. (1997). Object-Oriented Software Construction (2nd ed.). Prentice Hall.
  Introduces Design by Contract, forming the conceptual basis for deterministic acceptance boundaries.
- William Thomson. (1883). Electrical Units of Measurement.
  Establishes the principle that measurable quantities enable control even without full theoretical understanding.