Beyond Hallucination Energy: A Three-Dimensional Framework for Reliable AI Outputs

🧩 1. TLDR

AI doesn’t just hallucinate. Sometimes it gives answers that are fluent, safe… and completely useless.

Most discussions about AI failure focus on hallucination:

  • making things up
  • getting facts wrong
  • fabricating sources

That’s real. It matters.

But it’s not the most dangerous failure mode in production systems.

There is a quieter one.

A more subtle one.

And in practice a more pervasive one.

AI systems often fail not by being wrong, but by failing to think at all.

This post introduces that failure mode:

Trendslop: when AI produces generic, trend-following, low-sensitivity answers that ignore the specifics of the problem.

We show:

  • why this happens (training incentives + decoding stochasticity)
  • how it differs from hallucination (error vs. absence)
  • how to measure it (perturbation-based sensitivity testing)
  • and how it fits alongside Hallucination Energy (containment) and consistency checks (structure)

The core idea is simple:

If an answer doesn’t change when the problem changes, the system isn’t reasoning; it’s pattern-matching.

And pattern-matching, no matter how fluent, cannot be trusted with decisions.


🧠 2. The Missing Failure Mode

The conversation around AI reliability has matured quickly.

We now talk about:

  • hallucinations
  • bias
  • alignment
  • safety
  • interpretability

These are all important.

But they share a common assumption:

That the model is trying to reason and sometimes fails.

What if that assumption is wrong?


⚠️ A Different Kind of Failure

There is another class of output that looks perfectly fine:

  • grammatically correct
  • confident
  • well-structured
  • aligned with current thinking

And yet:

  • it does not engage with the specifics of the prompt
  • it does not adapt to changing conditions
  • it does not produce meaningful differentiation

It sounds intelligent.

But it is not doing any real work.


🔍 The Key Distinction

| Failure Type | What’s Happening | Detectability |
|---|---|---|
| Hallucination | Model generates incorrect content | ✅ Checkable against evidence |
| Trendslop | Model generates generic content | ❌ No obvious error to flag |

This is the core problem:

Hallucination is incorrect reasoning. Trendslop is absence of reasoning.

And absence is harder to detect than error.


🧠 Why This Matters for Policy-Bounded Systems

This observation is not just descriptive.

It exposes a boundary in how we currently evaluate model outputs.

To understand that boundary, we connect it to the geometric grounding framework introduced earlier.

In our previous work on Hallucination Energy, we established:

Generation may be stochastic; acceptance must be deterministic.

But deterministic gating requires a measurable signal.

Hallucination Energy measures containment: does the claim lie within the evidence span?

Trendslop reveals a second requirement:

Sensitivity: does the output respond to changes in the input?

Without both, a policy gate can accept outputs that are:

  • ✅ grounded (low energy)
  • ✅ consistent (no contradiction)
  • ❌ useless (context-insensitive)

This post closes that gap.

This is not a theoretical concern.

It is a measurable, repeatable behavior that appears consistently under controlled testing.

To see this clearly, we turn to empirical results.


📉 3. What the Research Shows

Recent empirical work provides direct evidence of this failure mode.

In controlled studies where large language models were asked to produce strategic recommendations across varied scenarios, a consistent pattern emerged:

Outputs remained fluent and plausible, but were largely insensitive to the specific context.

This behavior has been termed trendslop.


🧪 The Key Observation

Across varied prompts:

  • different industries
  • different growth conditions
  • different constraints

The responses:

  • reused the same structural templates
  • emphasized identical high-level themes
  • produced nearly interchangeable recommendations

This was not hallucination.

The outputs contained no obvious factual errors. Instead, they converged to high-probability, trend-aligned patterns independent of input.


🔗 Why This Connects to Geometric Grounding

In our previous work on Hallucination Energy, we established:

Projection residual measures containment, not relational contradiction or contextual responsiveness.

Trendslop exploits this exact boundary. The output:

  • ✅ stays within the semantic span of the training distribution (low energy)
  • ✅ avoids factual contradiction (structurally consistent)
  • ❌ fails to adapt to the local context (insensitive)

It passes the containment gate. It passes the consistency check. But it fails the reasoning test.


⚠️ Why This Matters More Than It Seems

At first glance, this behavior may appear benign. The outputs are fluent, they reflect accepted best practices, and they avoid obvious errors.

But this “reasonableness” is exactly what makes the failure mode so dangerous. Because the advice sounds like a safe best practice, it bypasses our critical filters.

It creates the illusion of intelligence without the substance of reasoning.

In real-world systems, this leads to:

  • Poor decisions disguised as safe ones: Recommendations that are “best practice” in general but catastrophic for the specific constraints of the problem.
  • Loss of signal in critical contexts: The model effectively ignores the most important variables in the prompt to find a high-probability pattern.
  • False sense of security: Over-reliance on outputs that look correct but were never actually evaluated against the specific input.

This establishes a critical limitation in our current AI stack: Evaluation methods that operate on single outputs cannot detect whether reasoning has occurred.

They can only detect whether the output appears valid. This distinction drives the need for a new evaluation dimension.


🔍 4. Why This Is Worse Than Hallucination

It’s tempting to treat trendslop as a “milder” issue. After all:

  • nothing is obviously false
  • no facts are fabricated
  • the output aligns with accepted best practices

But in production systems, trendslop is often more dangerous.


⚠️ Visibility vs Invisibility

Hallucinations tend to be detectable:

  • a wrong number
  • a fabricated citation
  • a claim that contradicts known evidence

These trigger clear policy responses:

  • ✅ High Hallucination Energy → Reject
  • ✅ Contradiction detected → Flag
  • ✅ Evidence mismatch → Request revision

Trendslop bypasses all of them.


🔍 The Detection Gap

A trendslop response:

  • passes factual checks ✔️
  • passes style checks ✔️
  • passes geometric containment gates ✔️
  • passes alignment filters ✔️

And yet:

It fails to engage with the actual problem.

There is no obvious “error” to point to. Only a lack of adaptation.


🧠 Decision-Level Impact

This difference becomes critical when AI informs real decisions.

| Failure Type | Immediate Effect | Detection Path | Long-Term Effect |
|---|---|---|---|
| Hallucination | Incorrect output | Energy gate / fact-check | Correctable failure |
| Trendslop | Generic output | No standard detector | Systematic misdirection |

Example

A hallucinated financial figure may trigger a high energy score and get rejected.

A trendslop strategy recommendation:

“Focus on innovation, improve operational efficiency, and align with customer needs.”

…will not be flagged. It will be accepted, deployed, and acted upon.

But it may:

  • waste resources on generic initiatives
  • mask real constraints specific to the business
  • prevent meaningful, context-aware action

🔥 The Core Risk

Hallucination breaks correctness. Trendslop breaks usefulness.

And in high-trust domains:

An answer that is useless but trusted is worse than an answer that is clearly wrong.


🔍 5. Expanding the Failure Taxonomy

This observation is not just descriptive. It exposes a boundary in how we currently evaluate model outputs.

To understand that boundary, we return to the geometric grounding framework introduced earlier.


📐 The Limit of Geometric Containment

In our previous work on Hallucination Energy, we established a clean boundary:

Projection residual measures containment.
It detects when a claim extends beyond the semantic span of its evidence.

This gives us a powerful first guarantee:

  • Unsupported information → detectable
  • Out-of-span drift → rejectable

But adversarial stress-testing revealed a structural limit:

Hard-mined negatives often remain within the same embedding subspace as supported claims.

When subspace overlap is high:

Projection cannot distinguish relational inversion from grounded truth.

In other words:

A claim can be geometrically valid and still be wrong.


🧠 From Geometry to Failure Modes

This forces an expansion of the taxonomy.

We anchor this expansion in the broader research context established by Weng, whose survey distinguishes two primary categories:

  • Extrinsic Hallucination: introduction of unsupported information
  • In-Context Hallucination: distortion or contradiction of provided evidence

These categories map directly onto the geometric behavior we observe.

But they are not sufficient.

There exists a third failure mode:

The model does not fail by inventing or contradicting.
It fails by not engaging at all.


🧭 The Three-Axis Failure Model

We therefore refine Weng’s taxonomy into three measurable, policy-actionable axes:

| Failure Mode | What Happens | Geometric Signature | Detection | Maps to Weng |
|---|---|---|---|---|
| Extrinsic Hallucination | Introduces unsupported information | High projection residual (out-of-span) | Hallucination Energy | Extrinsic |
| Intrinsic Hallucination | Distorts relationships within evidence | Low residual (in-span) | Structural checks (Appendix) | In-context |
| Trendslop | Ignores problem-specific constraints | Output invariance under perturbation | Sensitivity testing | — (new) |

Containment vs Attribution
Containment asks whether a claim lies within the semantic span of the evidence.
Attribution asks whether it can be traced to a specific source.

Hallucination Energy measures containment—not attribution.
This is intentional: containment serves as a fast, model-agnostic policy gate, while attribution remains a downstream verification step.


🔬 What Actually Changed

This is not just a refinement of terminology.

It is a change in how we interpret correctness.

Previously:

“If it’s grounded, it’s acceptable.”

Now:

Grounding is necessary, but not sufficient.


⚠️ The Missing Dimension

The gap becomes clear:

  • Containment ensures the model does not invent
  • Structural checks ensure it does not distort

But neither guarantees that the output:

responds to the actual problem

This is where trendslop emerges.


🔥 Trendslop as a Failure Mode

Trendslop is not hallucination in the traditional sense.

It is not:

  • unsupported
  • contradictory

It is:

context-insensitive convergence to high-probability answer patterns

Its defining property is:

invariance under meaningful perturbation


🧠 Why This Matters

This reveals a deeper truth:

Correctness and usefulness are not the same.

A model can:

  • remain within evidence
  • preserve internal structure
  • and still fail to reason

Because it never adapted to the problem.


🔗 From Measurement to Policy

This expansion transforms the evaluation question.

From:

“Is this answer grounded?”

To:

“What type of failure does this output exhibit—and how should the system respond?”

This shift is critical.

It allows us to move from:

  • scalar scoring
  • to multi-axis diagnosis

🧭 Summary of the Expanded Taxonomy

| Category | Policy Question | Axis |
|---|---|---|
| Extrinsic Hallucination | “Did the model go beyond the evidence?” | Containment |
| In-Context Hallucination | “Did it distort relationships?” | Structural fidelity |
| Trendslop | “Did it respond to the problem?” | Sensitivity |

⚠️ A Known Frontier: The “Unknown” Problem

One boundary remains.

As highlighted by Weng, a trustworthy system must also:

know when it does not know

Benchmarks such as TruthfulQA and SelfAware explore this dimension.

Our current framework does not yet include an explicit epistemic calibration axis.

An output may pass all three checks and still be:

  • confidently incorrect
  • fundamentally unknowable

This represents a fourth dimension:

the ability to abstain


🔬 Forward Signal

This is not a flaw in the framework.

It is a boundary condition.

Containment constrains what can be said.
Structure constrains how it is said.
Sensitivity constrains why it is said.

The remaining question is:

Should it be said at all?


🧠 6. A Concrete Example

To see how these three axes interact in practice, consider a simple perturbation test.


🧪 The Setup

Prompt A (Decline)

A startup is losing money in a declining market with strong competition. What strategic actions should it take?

Prompt B (Growth)

A market-leading company has strong margins in a rapidly growing sector. What strategic actions should it take?


🤖 Typical LLM Output

Both prompts frequently return variations of:

  • “Focus on innovation and product differentiation”
  • “Improve operational efficiency”
  • “Invest in customer experience and retention”
  • “Leverage data-driven decision making”

📊 How Each Axis Evaluates This

| Metric | Result | Why |
|---|---|---|
| Hallucination Energy | ✅ Low | Claims are generic, well-grounded in business literature. No out-of-span fabrication. |
| Consistency | ✅ High | No internal contradictions. Logically sound advice. |
| Sensitivity (Trendslop) | 🚨 Low (high trendslop) | Output structure and recommendations are nearly identical despite opposite market conditions. |

⚠️ What This Reveals

The system passes the containment gate. It passes the consistency check. It is accepted.

But it fails the reasoning test.

A context-aware system would produce:

| Prompt | Expected Reasoning Pattern |
|---|---|
| Startup in decline | Survival mode: cost rationalization, runway extension, narrow focus, potential pivot |
| Market leader | Expansion mode: market capture, strategic M&A, R&D acceleration, barrier creation |

Instead, the model collapses both into:

The same high-probability answer template

Trendslop in action. The output is not wrong. It is invariant.

And invariance under meaningful perturbation is the signature of pattern-matching, not reasoning.
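
This signature is easy to probe directly. Below is a minimal sketch, using TF-IDF cosine similarity as a lightweight stand-in for the embedding similarity formalized in Section 8; the two answer strings are hypothetical outputs for the prompts above:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical outputs for Prompt A (decline) and Prompt B (growth)
answer_decline = "Focus on innovation, improve operational efficiency, and invest in customer experience."
answer_growth = "Focus on innovation and operational efficiency, and invest in customer experience and retention."

vec = TfidfVectorizer().fit([answer_decline, answer_growth])
sim = cosine_similarity(vec.transform([answer_decline]), vec.transform([answer_growth]))[0][0]

# Similarity near 1.0 despite opposite market conditions is the trendslop signature
print(f"cross-condition similarity: {sim:.2f}")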


📐 7. Formal Definition of Trendslop

We can now move from intuition to something precise.

The examples we’ve seen all share one property:

The output does not change meaningfully when the input changes.

This gives us a clean way to define trendslop.


📌 Core Definition

Trendslop is context-insensitive reasoning, where outputs collapse to high-probability patterns under input variation.


🔬 Formal View

Let \(x\) represent the input context and \(f(x)\) represent the model’s output distribution (or its aggregated response).

Trendslop occurs when:

$$ \frac{\Delta f(x)}{\Delta x} \approx 0 $$

In continuous terms, this is \(\frac{\partial f(x)}{\partial x} \approx 0\). But since LLMs are discrete stochastic generators, we interpret this as perturbation insensitivity in output space.


🧠 Interpretation

This does not mean the output is literally identical.

It means:

  • structural templates are reused
  • reasoning patterns are invariant
  • conclusions are functionally interchangeable

🔥 Key Insight

A reasoning system should be sensitive to its inputs.

If it isn’t, it stops adapting.

It stops evaluating.

And what remains isn’t reasoning; it’s pattern convergence.

The system collapses toward a default answer basin: the region of highest training-data probability that satisfies surface expectations while ignoring the specifics of the problem.


⚠️ Important Distinction

Trendslop is not random error. It is stable, repeatable convergence to high-probability patterns.

Under stochastic decoding, the model still gravitates toward the same answer basin — not because of noise, but because that region dominates the probability landscape.

That stability is what makes trendslop so difficult to detect with single-pass evaluation.


📏 8. Measuring It

Once defined, trendslop becomes measurable.

The key idea is simple:

If outputs remain similar across meaningfully different inputs, sensitivity is low.


🎯 Basic Metric

We define a trendslop score based on output divergence under perturbation:

$$ \mathcal{S}(x) = 1 - \frac{1}{|P|} \sum_{x' \in P} \text{sim}\big(f(x), f(x')\big) $$

Where:

  • \(P\) = set of perturbed inputs
  • \(\text{sim}(\cdot)\) = semantic similarity (embedding cosine or structural overlap)
  • High \(\mathcal{S}\) → adaptive reasoning
  • Low \(\mathcal{S}\) → trendslop

🧪 Experimental Setup

  1. Take a base prompt \(x\)
  2. Generate \(n\) meaningfully perturbed versions \(\{x'_1, \dots, x'_n\}\)
  3. Run the model on each
  4. Compute pairwise output similarity
  5. Aggregate into a single sensitivity score

🧠 What We Are Measuring

Not correctness. Not truth. Not containment.

We are measuring:

Responsiveness of reasoning to context

This is orthogonal to Hallucination Energy.

  • Energy measures distance from evidence.
  • Sensitivity measures response to variation.

Together, they bound what the system says and how it adapts.


🔬 Implementation Sketch (Certum-aligned)

import numpy as np

def compute_trendslop_score(base_prompt, perturbations, model, n_samples=3):
    """Measure context sensitivity via perturbation testing.

    Assumes two helpers: aggregate_to_centroid (e.g., mean output embedding)
    and pairwise_cosine (pairwise cosine similarities across outputs).
    """
    outputs = []

    for p in perturbations:
        # Sample to average out decoding stochasticity
        samples = [model.generate(p, temperature=0.7) for _ in range(n_samples)]
        outputs.append(aggregate_to_centroid(samples))  # e.g., mean embedding or structural template

    # Mean pairwise semantic similarity across perturbed outputs
    similarities = pairwise_cosine(outputs)
    mean_sim = np.mean(similarities)

    # High similarity under meaningful perturbation = high trendslop (bad)
    trendslop_score = mean_sim
    sensitivity_score = 1.0 - mean_sim  # matches S(x): high = adaptive

    return sensitivity_score

🔥 Why This Works

Because real reasoning has a defining property: small changes in input produce meaningful changes in output. Trendslop violates that property. And now we have a scalar that captures it.

We now have:

  • a definition
  • a metric
  • a measurement method

The next step is to specify how to perturb. Not all input changes are equal. To stress-test sensitivity, we need perturbations that force genuine reasoning shifts. That’s what Section 9 covers.


🔬 9. Perturbation as a Tool

To measure trendslop effectively, we need to vary the input in ways that force reasoning shifts.

Not all perturbations are equal. Random word swaps or surface rephrasing won’t stress-test sensitivity. We need semantic perturbations that change the problem’s structure while preserving its surface form.


🎯 Principle

If the problem changes, the solution should change. If it doesn’t, the system is pattern-matching, not reasoning.


🧪 Perturbation Taxonomy (Certum-Aligned)

We define four perturbation classes, each targeting a different reasoning dimension:

| Perturbation Type | What Changes | What Should Change in Output |
|---|---|---|
| Context Polarity | Success ↔ Failure, Growth ↔ Decline | Strategic priorities, risk tolerance, resource allocation |
| Scale Shift | Startup ↔ Enterprise, Local ↔ Global | Operational scope, governance complexity, investment horizon |
| Constraint Injection | Add budget limits, regulatory pressure, time urgency | Trade-off weighting, feasibility filtering, sequencing |
| Domain Transfer | Healthcare → Finance, Software → Manufacturing | Domain-specific constraints, stakeholder maps, success metrics |

🧠 Implementation Sketch (Certum)

def generate_perturbations(base_prompt, modes=['polarity', 'scale', 'constraint', 'domain']):
    """Generate semantically meaningful perturbations for sensitivity testing."""
    perturbations = []
    
    if 'polarity' in modes:
        perturbations.append(invert_outcome_context(base_prompt))  # growth→decline
    if 'scale' in modes:
        perturbations.append(rescale_entity_scope(base_prompt))     # startup→enterprise
    if 'constraint' in modes:
        perturbations.append(inject_hard_constraint(base_prompt))   # add budget cap
    if 'domain' in modes:
        perturbations.append(transfer_domain_context(base_prompt))  # healthcare→finance
    
    return perturbations

def evaluate_sensitivity(base_prompt, model, perturbation_modes=None):
    """End-to-end trendslop measurement."""
    perturbations = generate_perturbations(base_prompt, perturbation_modes)
    return compute_trendslop_score(base_prompt, perturbations, model)

⚠️ What Trendslop Does Under Perturbation

It ignores these changes.

  • same structural template
  • same high-level advice
  • same tone and framing

This is the signature:

Invariant output under high-impact input variation


🔍 Measuring the Effect

For each perturbation class:

outputs = [model.generate(p) for p in perturbations]
divergence = semantic_divergence_matrix(outputs)
sensitivity_score = np.mean(divergence)  # low divergence = low sensitivity = trendslop

🔥 Key Signal

Low variation across high-impact perturbations = strong trendslop

And critically:

This failure mode is invisible to containment-based gates.

A trendslop output can have:

  • ✅ Low Hallucination Energy (grounded in general knowledge)
  • ✅ High Consistency (no internal contradiction)
  • ❌ Low Sensitivity (context-insensitive)

This is why we need the third axis.


We now have:

  • a perturbation taxonomy
  • a measurement pipeline
  • a clear failure signature

The next step is to map these three signals—Containment, Consistency, Sensitivity—into a unified decision architecture.

That’s Section 10.


⚡ 10. Trendslop vs Hallucination Energy

At this point, we can distinguish three fundamentally different failure signals.

Not heuristics. Not overlapping scores.

Three orthogonal axes that measure different properties of reasoning.


🧠 Three Orthogonal Signals

| Metric | What It Measures | Geometric Interpretation | Policy Action |
|---|---|---|---|
| Hallucination Energy | Distance from evidence span | Projection residual: \( \|c - \mathbf{U}_r \mathbf{U}_r^T c\|_2 \) | Reject if unsupported |
| Consistency Score | Structural correctness (see Appendix) | Relational fidelity within evidence | Reject if structurally invalid |
| Sensitivity Score | Responsiveness to input variation | Output divergence under perturbation: \( 1 - \text{sim}(f(x), f(x')) \) | Refine if low \( (\mathcal{S} < \tau_s) \) |

🔍 Three Different Questions

Each axis answers a different question:

| Metric | Core Question |
|---|---|
| Hallucination Energy | “Did the model leave the evidence?” |
| Consistency | “Did the model distort relationships within the evidence?” |
| Sensitivity (Trendslop) | “Did the model respond to the actual problem?” |

These are not interchangeable checks.
Passing one does not imply passing the others.


📌 Containment vs. Attribution

A critical distinction in grounding:

  • Containment asks: Is this claim semantically plausible given the evidence?
  • Attribution asks: Can this claim be traced to a specific source?

Hallucination Energy measures containment, not attribution.

A low energy score means:

the claim lies within the semantic span of the evidence

It does not guarantee:

  • explicit support
  • citation traceability
  • or exact entailment

More demanding systems—such as retrieval-based attribution pipelines—attempt to enforce this stricter requirement.

In this framework:

Containment is a first-order policy gate.
Attribution is a second-order verification layer.

This separation is intentional: it keeps the policy layer fast, deterministic, and model-agnostic.
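
A minimal sketch of this two-tier layering (the `attribution_check` verifier and the threshold value are illustrative assumptions; the energy function itself is defined in the Appendix):

def grounding_pipeline(claim, evidence, energy_fn, attribution_check, tau_h=0.35):
    """Tier 1: fast containment gate. Tier 2: slower attribution verification."""
    # First-order gate: deterministic, model-agnostic, cheap
    if energy_fn(claim, evidence) > tau_h:  # tau_h illustrative; calibrate under a FAR budget
        return "REJECT", "outside evidence span"

    # Second-order layer: run only on claims that pass containment
    if not attribution_check(claim, evidence):
        return "FLAG", "contained but not attributable to a specific source"

    return "PASS", "contained and attributed"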


⚠️ Critical Observation

An output can be:

  • ✅ Grounded (low energy)
  • ✅ Structurally valid (high consistency)
  • ❌ Context-insensitive (low sensitivity)

And still fail.


🔥 Example: The “Safe but Useless” Case

Prompt:
"A biotech startup with 18 months of runway needs a survival strategy."

Output:
"Focus on innovation, improve operational efficiency,
and align with customer needs."

Verdict: low energy, high consistency, low sensitivity. A containment-only gate accepts it; only the sensitivity gate routes it to refinement.

---

## 🧩 11. The Failure Space & Policy Routing

In [energy](/post/energy), we established a single deterministic gate:

$$
\mathcal{H}(c, E) \le \tau \quad \Rightarrow \quad \text{Accept}
$$

That gate works cleanly for **extrinsic hallucination**. But adversarial stress-testing revealed a structural boundary:

> When unsupported claims remain within the same embedding subspace as supported ones, projection residual alone cannot separate them.

We now have three calibrated signals. Each maps to a distinct failure mode and a distinct policy action.

---

### 📊 The Failure Space (Simplified 2D Projection)

For intuition, we project the 3-axis space onto Containment × Sensitivity:

| Region | Energy (Containment) | Trendslop (Sensitivity) | Behavior | Policy Action |
|--------|---------------------|------------------------|----------|---------------|
| ✅ **Valid Reasoning** | Low | Low | Grounded + adaptive | Accept |
| ⚠️ **Creative Drift** | High | Low | Unsupported but context-aware | Refine / Flag |
| ⚠️ **Safe but Useless** | Low | High | Grounded but generic | Refine (force specificity) |
| ❌ **Complete Failure** | High | High | Unsupported + generic | Reject |

*Note: Consistency sits orthogonal to this plane. An output can occupy any region and still be structurally broken.*

#### Figure 3: The Failure Space

The failure space projected onto two orthogonal axes—Containment (Hallucination Energy) and Sensitivity (Trendslop). Outputs land in one of four quadrants, each demanding a different policy response.

```mermaid
quadrantChart
    title 🗺️ Failure Space: Containment vs Sensitivity
    x-axis Low Containment (High Energy) --> High Containment (Low Energy)
    y-axis Low Sensitivity (Trendslop) --> High Sensitivity (Adaptive)
    quadrant-1 ✅ Valid Reasoning
    quadrant-2 ⚠️ Creative Drift
    quadrant-3 ❌ Complete Failure
    quadrant-4 ⚠️ Safe but Useless
```

This diagram simplifies the full three‑axis model into a two‑dimensional view that captures the most actionable failure modes. The horizontal axis represents Containment (low energy = high containment; high energy = low containment). The vertical axis represents Sensitivity (high sensitivity = adaptive reasoning; low sensitivity = trendslop).

The four quadrants are:

  • ✅ Valid Reasoning (top‑right): The output is both grounded in evidence and responsive to context. This is the target region: accept without modification.
  • ⚠️ Creative Drift (top‑left): The output adapts to context but introduces unsupported information. This is extrinsic hallucination: potentially useful, but it requires verification or refinement.
  • ⚠️ Safe but Useless (bottom‑right): The output is well‑grounded but fails to adapt to the specific problem. This is trendslop: it needs refinement with explicit constraints.
  • ❌ Complete Failure (bottom‑left): The output is both unsupported and context‑insensitive. Reject outright.

The third axis—Consistency—sits orthogonal to this plane. An output can fall into any quadrant and still contain internal contradictions. This diagram serves as a policy routing map: each quadrant points to a distinct action (Accept, Refine, or Reject).


⚙️ Policy Routing Logic

We replace the single scalar threshold with a multi-axis decision router:

def policy_route(evaluation):
    """Route output based on 3-axis grounding signal."""
    energy, consistency, sensitivity = evaluation.energy, evaluation.consistency, evaluation.sensitivity
    
    # Tier 1: Containment gate (from energy.md)
    if energy > τ_h:
        return "REJECT", "Unsupported claim exceeds containment boundary"
    
    # Tier 2: Consistency gate (new)
    if consistency < τ_c:
        return "REJECT", "Relational or logical contradiction detected"
    
    # Tier 3: Sensitivity gate (trendslop)
    if sensitivity < τ_s:
        return "REFINE", "Output is context-insensitive; re-prompt with constraints"
    
    # Tier 4: Valid
    return "ACCEPT", "Grounded, consistent, and adaptive"

🔗 Continuity with energy Calibration

In the original framework, we calibrated \(\tau_h\) under a fixed False Acceptance Rate (FAR):

“Energy became not just a score, but a deterministic gate.”

We apply the same principle here:

  • \(\tau_h\) calibrated on FAR for containment violations
  • \(\tau_c\) calibrated on contradiction/instability benchmarks
  • \(\tau_s\) calibrated on perturbation divergence baselines

Each threshold is executable. Each routes to a deterministic action. The architecture remains faithful to the core thesis:

Generation is stochastic. Acceptance is deterministic.

We have simply expanded the acceptance layer from a single gate to a routing matrix.


This routing logic solves the immediate policy problem. But it raises a deeper question:

Why don’t existing evaluation systems catch trendslop before it reaches production?

That’s not a metric problem. It’s an evaluation paradigm problem.


🚨 12. Why Current Systems Miss This

Current AI evaluation operates on a hidden assumption:

If an output is correct in isolation, it is good.

This works for factual verification. It fails for reasoning quality.


🧠 The Static Evaluation Blind Spot

Standard evaluation pipelines measure:

  • Factual accuracy against a gold standard
  • Semantic similarity to reference outputs
  • Alignment with safety/policy filters
  • Model confidence or log-probability

None of these measure:

Whether the output depends on the input.

A trendslop response:

  • matches expected patterns ✔️
  • aligns with training distribution ✔️
  • avoids factual error ✔️
  • passes geometric containment ✔️

So it:

survives every standard evaluation layer


🔍 The Core Limitation

Current systems treat outputs as static points in embedding space.

But reasoning is not a point. It is a response surface.

If you perturb the input and the output doesn’t move meaningfully, the system isn’t reasoning. It’s collapsing to a high-probability prior.


🔁 The Dynamic Evaluation Paradigm

We replace static scoring with perturbation-aware testing:

Generate → Perturb → Re-generate → Compare → Route

Instead of asking:

“Is this answer correct?”

We ask:

“Does this answer behave like reasoning?”

And we measure that by observing how the output surface responds to controlled input variation.


🔥 Why This Changes Everything

Evaluation Mode What It Captures What It Misses
Static (Current) Factual correctness, alignment Context responsiveness, reasoning dynamics
Dynamic (Proposed) Sensitivity, stability, grounding None (requires perturbation budget)

Trendslop is invisible to static evaluation by design. It is engineered to pass.

Dynamic evaluation makes it visible.


We now have:

  • A three-axis failure taxonomy
  • A calibrated policy router
  • A dynamic evaluation paradigm

The final step is to assemble these into a complete system architecture.

Not as a research concept. As a deployable control loop.

That’s Sections 13–16.


🏗️ 13. From Metrics to System: The Control Loop

Up to this point, we’ve defined three independent signals:

| Signal | Measures | Policy Question |
|---|---|---|
| Containment (Hallucination Energy) | Distance from evidence span | “Is this supported?” |
| Consistency | Structural fidelity within evidence | “Is this logically correct?” |
| Sensitivity (Trendslop) | Responsiveness to input variation | “Does this engage with the problem?” |

Each captures a distinct failure mode. But on their own, they are just measurements.

The real question is:

How do we use them to control a system?


🧠 The Shift: From Detection to Enforcement

Most current approaches focus on improving generation:

  • better prompting
  • better training
  • better models

This work takes a different approach, consistent with energy:

Instead of trying to make generation perfect, we make acceptance selective.

Generation may be stochastic. Acceptance must be deterministic.


🔁 The Core Control Loop

A system built on this framework does not trust a single output. It evaluates it, routes it, and acts.

Generate → Evaluate → Route → (Accept | Refine | Reject)

Figure 4: The core control loop

The policy‑bounded control loop. Stochastic generation is wrapped by a deterministic evaluation and routing layer, ensuring that only outputs satisfying all three grounding constraints are accepted.

```mermaid
flowchart TD
    subgraph "🔄 Policy‑Bounded Control Loop"
        A[🧠 LLM Generation<br/><i>Stochastic</i>] --> B[📤 Candidate Output]
        B --> C[⚖️ Evaluation Layer<br/><i>Deterministic</i>]
        C --> D[📐 Containment Check]
        C --> E[🔗 Consistency Check]
        C --> F[📈 Sensitivity Check]
        D & E & F --> G{🧾 Policy Router}
        G -->|✅ All Pass| H[🟢 ACCEPT]
        G -->|❌ Containment Fail| I[🔴 REJECT]
        G -->|❌ Consistency Fail| I
        G -->|⚠️ Sensitivity Fail| J[🟡 REFINE]
        J --> K[✏️ Re‑prompt with Constraints]
        K --> A
    end

    classDef gen fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#e65100
    classDef eval fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#0d47a1
    classDef accept fill:#e8f5e8,stroke:#388e3c,stroke-width:2px,color:#1b5e20
    classDef reject fill:#ffebee,stroke:#d32f2f,stroke-width:2px,color:#b71c1c
    classDef refine fill:#fff3e0,stroke:#ff8f00,stroke-width:2px,color:#e65100

    class A gen
    class C,D,E,F,G eval
    class H accept
    class I reject
    class J,K refine
```

This flowchart illustrates the complete control architecture proposed in the post. It begins with LLM Generation, which remains stochastic by design. The candidate output then enters a Deterministic Evaluation Layer, where three independent checks are performed:

Containment Check: Measures Hallucination Energy. If the claim exceeds the evidence span, reject.

Consistency Check: Probes internal structural fidelity. If contradictions are found, reject or flag.

Sensitivity Check: Assesses responsiveness to input variation. If the output is generic (trendslop), it is not rejected but sent for refinement.

The Policy Router aggregates these signals and routes the output to one of three destinations:

✅ Accept: All thresholds satisfied.

❌ Reject: Containment or consistency violation—cannot be trusted.

🟡 Refine: Sensitivity violation—output is grounded but useless; re‑prompt with added constraints and loop back to generation.

This loop enforces the core principle of the framework: generation may be stochastic, but acceptance is deterministic. The system does not rely on heuristic confidence scores; it enforces explicit, measurable grounding criteria before any output is exposed to downstream users or decision‑makers.


📊 The Evaluation Vector

Each output produces a structured, three-axis signal:

from dataclasses import dataclass

@dataclass
class GroundingEvaluation:
    containment: float      # Hallucination Energy: low = grounded
    consistency: float      # Structural fidelity: high = correct
    sensitivity: float      # Perturbation divergence: high = adaptive

This is not a single score. It is a diagnostic vector.


🎯 Interpretation Matrix

| Signal Pattern | Meaning | Likely Failure Mode |
|---|---|---|
| Low energy, high consistency, high sensitivity | ✅ Ideal | None |
| Low energy, low consistency, high sensitivity | ⚠️ Structurally broken | Intrinsic hallucination |
| High energy, high consistency, high sensitivity | ⚠️ Unsupported but adaptive | Extrinsic hallucination |
| Low energy, high consistency, low sensitivity | ⚠️ Generic but safe | Trendslop |
| High energy, low consistency, low sensitivity | ❌ Complete failure | All of the above |

🔥 Key Idea

Different failures require different responses.

This is what most current systems miss. They treat all errors as rejections. But:

  • hallucination → cannot be trusted → reject
  • inconsistency → structurally broken → reject or flag
  • trendslop → not useful → refine with constraints

This leads directly to policy routing.


We now have:

  • a diagnostic vector
  • a failure-mode taxonomy
  • a mapping from signal to action

The next step is to encode this as executable policy logic.

That’s Section 14.


⚙️ 14. The Policy Layer: Executable Routing Logic

Once we can classify failures, we can control behavior deterministically.

This is where the framework becomes operational.


🧠 Policy as Decision Logic

We define simple, threshold-based routing rules:

def policy_route(evaluation: GroundingEvaluation, thresholds: PolicyThresholds) -> PolicyAction:
    """Route output based on 3-axis grounding signal."""
    
    # Tier 1: Containment gate (from energy.md)
    if evaluation.containment > thresholds.tau_containment:
        return PolicyAction.REJECT, "Unsupported claim exceeds containment boundary"
    
    # Tier 2: Consistency gate (new structural layer)
    if evaluation.consistency < thresholds.tau_consistency:
        return PolicyAction.REJECT, "Relational or logical contradiction detected"
    
    # Tier 3: Sensitivity gate (trendslop detection)
    if evaluation.sensitivity < thresholds.tau_sensitivity:
        return PolicyAction.REFINE, "Output is context-insensitive; re-prompt with constraints"
    
    # Tier 4: Valid output
    return PolicyAction.ACCEPT, "Grounded, consistent, and adaptive"

🎯 Why This Works

Each dimension maps cleanly to an action:

| Signal | Threshold Condition | Action | Rationale |
|---|---|---|---|
| Containment | > τ_h | Reject | Claim extends beyond evidence span |
| Consistency | < τ_c | Reject | Internal structure is broken |
| Sensitivity | < τ_s | Refine | Output is generic; force specificity |
| All three | pass | Accept | Output satisfies all grounding constraints |

🔁 The Refinement Loop

Trendslop introduces a new behavior: refinement, not just rejection.

Generic Output → Re-prompt with Constraints → Re-evaluate → Route

Instead of rejecting immediately, the system can:

  • inject domain-specific constraints
  • force explicit trade-off analysis
  • require scenario-specific justification

🧠 Example: Refinement in Action

Initial Output:
"Focus on innovation, improve operational efficiency, and align with customer needs."

Policy Detection:
- Containment: ✅ Low energy (grounded in business literature)
- Consistency: ✅ No contradiction
- Sensitivity: ❌ Low divergence across perturbations → Trendslop

Refinement Prompt:
"Be more specific: Given declining revenue, limited capital, and strong competition,
what are the top 3 prioritized actions this startup should take in the next 90 days?
Justify each with reference to the constraints."
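
A minimal sketch of how this refinement step can be automated (the function name, template, and constraint list are illustrative assumptions; constraint extraction from the original prompt is out of scope here):

def build_refinement_prompt(base_prompt: str, constraints: list[str], horizon: str = "90 days") -> str:
    """Construct a constraint-injected re-prompt for a trendslop output."""
    constraint_text = "; ".join(constraints)
    return (
        f"{base_prompt}\n\n"
        f"Be more specific. Given these constraints: {constraint_text}, "
        f"list the top 3 prioritized actions for the next {horizon}, "
        f"and justify each with explicit reference to the constraints."
    )

# Example re-prompt for the startup scenario above
prompt = build_refinement_prompt(
    "A biotech startup with 18 months of runway needs a survival strategy.",
    ["declining revenue", "limited capital", "strong competition"],
)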

⚠️ Important Distinction

| Failure Type | Policy Response | Why |
|---|---|---|
| Extrinsic Hallucination | Reject | Unsupported; cannot be trusted |
| Intrinsic Hallucination | Reject or Flag | Structurally broken; needs correction |
| Trendslop | Refine | Not wrong, just useless; can be improved |

Not all failures should be treated the same.

This is what most current systems miss.


🔗 Continuity with energy

The original framework demonstrated that a scalar grounding signal can impose a deterministic acceptance boundary on stochastic generation.

This work preserves that principle, but extends it across multiple independent dimensions.

Rather than increasing the complexity of a single metric, we introduce orthogonal signals, each capturing a distinct failure mode:

  • containment (extrinsic validity)
  • consistency (internal structure)
  • sensitivity (context responsiveness)

Evaluation is therefore no longer a thresholding problem, but a multi-constraint decision process.


This routing logic solves the immediate policy problem. But it raises a deeper question:

Why don’t existing evaluation systems catch these failures before they reach production?

That’s not a metric problem. It’s an evaluation paradigm problem.

That’s Section 15.


🔁 15. A New Evaluation Paradigm: From Static Scoring to Response Surfaces

Traditional AI evaluation operates on a hidden assumption:

Outputs can be judged in isolation.

A prompt goes in. A response comes out. A score is assigned. This works for factual recall. It fails for reasoning quality.


🧠 The Static Blind Spot

Standard pipelines measure:

  • Factual accuracy against a gold standard
  • Semantic similarity to reference outputs
  • Model confidence or log-probability
  • Policy compliance checks

None of these measure:

Whether the output depends on the input.

As we’ve seen, a trendslop response passes all of them. It is engineered to survive static filters.


🔬 The Dynamic Alternative

We replace static scoring with perturbation-aware testing:

Generate → Perturb → Re-evaluate → Compare → Route

Instead of asking:

“Is this answer correct?”

We ask:

“Does this answer behave like reasoning?”

And we measure that by observing how the output surface responds to controlled input variation.


📐 Mapping to the Three-Axis Model

| Evaluation Mode | Axis Captured | Mechanism |
|---|---|---|
| Static fact-checking | Consistency (partial) | Entailment / contradiction probes |
| Geometric projection | Containment | Hallucination Energy ($\mathcal{H}$) |
| Perturbation testing | Sensitivity | Output divergence under $\Delta x$ |

Static evaluation tests a point. Dynamic evaluation tests a surface.

If the surface is flat across meaningful perturbations, the system isn’t reasoning. It’s collapsing to a high-probability prior.


⚙️ Implementation Sketch (Certum Runner Extension)

def dynamic_evaluate(base_prompt, model, policy, perturbation_modes):
    """End-to-end dynamic evaluation pipeline."""
    # 1. Generate baseline
    baseline = model.generate(base_prompt)
    evidence = retrieve_evidence(base_prompt)
    
    # 2. Perturb & re-generate
    perturbed = generate_perturbations(base_prompt, modes=perturbation_modes)
    outputs = [model.generate(p) for p in perturbed]
    
    # 3. Evaluate axes
    energy = hallucination_energy(baseline, evidence)
    consistency = consistency_score(baseline, evidence)
    # Embed outputs, then invert mean pairwise similarity so that high
    # divergence under perturbation = high sensitivity (assumed helpers)
    sensitivity = 1 - np.mean(pairwise_cosine_similarity(embed(outputs)))
    
    # 4. Route via calibrated policy
    eval_vec = GroundingEvaluation(energy, consistency, sensitivity)
    return policy_route(eval_vec, policy.thresholds)

This transforms the evaluation runner from a scorer into a stress-tester.


🔥 Why This Matters for Policy-Bounded AI

In energy, we established:

Generation may be stochastic; acceptance must be deterministic.

Dynamic evaluation makes that principle operational across all three axes:

  • Containment is enforced by thresholding $\mathcal{H}$ under a fixed FAR budget
  • Consistency is enforced by thresholding $\mathcal{C}$ on structural benchmarks
  • Sensitivity is enforced by thresholding $\mathcal{S}$ on perturbation divergence baselines

Each threshold is calibrated. Each route is deterministic. The system no longer guesses whether an output is good. It tests it.


🚀 16. Conclusion: Geometry, Structure, and the End of Blind Trust

We began with a simple observation:

AI systems don’t just fail by being wrong.

They fail in three distinct ways:

  1. They invent things → Extrinsic hallucination (failure of containment)
  2. They misrepresent structure → Intrinsic hallucination (failure of consistency)
  3. They avoid reasoning entirely → Trendslop (failure of sensitivity)

🧠 The Unified Framework

These map cleanly to a three-axis grounding model:

| Axis | Metric | Policy Question | Action |
|---|---|---|---|
| Containment | Hallucination Energy (\(\mathcal{H}\)) | “Is this supported?” | Reject if unsupported |
| Consistency | Structural Fidelity (\(\mathcal{C}\)) | “Is this logically correct?” | Reject if contradictory |
| Sensitivity | Perturbation Divergence (\(\mathcal{S}\)) | “Does this engage with the problem?” | Refine if generic |

A valid output satisfies all three:

$$ \mathcal{H} \leq \tau_h \quad \land \quad \mathcal{C} \geq \tau_c \quad \land \quad \mathcal{S} \geq \tau_s $$

This reframes hallucination control not as a detection problem, but as multi-constraint policy enforcement.


🔥 The Architectural Shift

Most current systems treat AI outputs as static artifacts to be scored. This framework treats them as dynamic responses to be stress-tested.

In our original work, we established:

Generation may be stochastic; acceptance must be deterministic.

We now have the complete architecture to enforce that:

  • Hallucination Energy provides the containment gate.
  • Consistency probes provide the structural gate.
  • Perturbation testing provides the sensitivity gate.
  • Policy routing converts signals into executable actions (Accept / Refine / Reject).

We no longer ask:

“Is this answer correct?”

We ask:

“What kind of failure, if any, does this exhibit, and how should the system respond?”


🧭 What Comes Next

The containment layer is operational. The sensitivity metric is defined. The next frontier is the consistency layer: formalizing relational validation, causal inversion detection, and step-level faithfulness into a deterministic gate.

When all three axes are calibrated and routed through a unified policy engine, we move past heuristic confidence scores entirely. We build systems that don’t just generate text. We build systems that bound reasoning.


🔚 Final Thought

Truth is not enough. Correctness is not enough. Grounding is not enough.

An answer must also:

Respond to the problem it was given.

Geometry bounds hallucination. Structure resolves contradiction. Sensitivity ensures engagement.

Together, they transform stochastic generation into policy-bounded intelligence.


📎 APPENDIX: Mathematical Foundations and Implementation Notes

This appendix strengthens the main framework in three ways.

First, it formalizes the geometric definition of Hallucination Energy and records its basic properties. Second, it defines Sensitivity as a measurable response property under semantic perturbation and clarifies how it differs from containment. Third, it provides a concrete implementation sketch for a three-axis policy-bounded evaluator.

The goal is not to claim complete formalization of reasoning. The goal is narrower:

to show that the three axes introduced in the main post are mathematically well-posed, operationally separable, and implementable in a deterministic policy layer.


A. Hallucination Energy: geometric containment

Let \( \mathcal{V} \) be a real inner-product space of dimension \( d \), interpreted as the embedding space. Let the evidence set be \( E = \{e_1, e_2, \dots, e_n\} \subset \mathcal{V} \), and let \( c \in \mathcal{V} \) be the embedding of a claim.

We define the evidence matrix

$$ \mathbf{E} = \begin{bmatrix} e_1^\top \\ e_2^\top \\ \vdots \\ e_n^\top \end{bmatrix} \in \mathbb{R}^{n \times d}. $$

Let \( \mathcal{S}_E = \mathrm{span}(E) \) denote the evidence subspace. Using truncated SVD, we compute an orthonormal basis \( \mathbf{U}_r \in \mathbb{R}^{d \times r} \) for an \( r \)-dimensional approximation of \( \mathcal{S}_E \).

The orthogonal projection of \( c \) onto \( \mathcal{S}_E \) is

$$ \hat{c} = \mathbf{U}_r \mathbf{U}_r^\top c. $$

We then define Hallucination Energy as the normalized residual

$$ \mathcal{H}(c, E) = \frac{\|c - \hat{c}\|_2}{\|c\|_2}. $$

This is the containment signal used throughout the main post.

A.1 Basic properties

Proposition 1 (boundedness). For any non-zero claim vector \( c \),

$$ 0 \le \mathcal{H}(c, E) \le 1. $$

Proof. Since \( \hat{c} \) is the orthogonal projection of \( c \) onto \( \mathcal{S}_E \), the residual \( r = c - \hat{c} \) is orthogonal to \( \hat{c} \). By the Pythagorean theorem,

$$ \|c\|_2^2 = \|\hat{c}\|_2^2 + \|r\|_2^2. $$

Hence \( \|r\|_2 \le \|c\|_2 \), so \( 0 \le \mathcal{H}(c, E) \le 1 \). \(\square\)


Proposition 2 (basis invariance). \( \mathcal{H}(c, E) \) depends only on the subspace \( \mathcal{S}_E \), not on the particular orthonormal basis used to represent it.

Proof. Any orthonormal basis of \( \mathcal{S}_E \) induces the same projection operator \( \mathbf{P}_E \). Since \( \mathcal{H} \) depends only on \( c - \mathbf{P}_E c \), it is basis-invariant. \(\square\)


Proposition 3 (angular interpretation). If \( \theta \) is the angle between \( c \) and its projection onto \( \mathcal{S}_E \), then

$$ \mathcal{H}(c, E) = \sin \theta. $$
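
Proof. By the Pythagorean identity in Proposition 1, \( \cos \theta = \|\hat{c}\|_2 / \|c\|_2 \), hence

$$ \mathcal{H}(c, E) = \frac{\|c - \hat{c}\|_2}{\|c\|_2} = \sqrt{1 - \cos^2 \theta} = \sin \theta. \quad \square $$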

This gives a direct geometric interpretation: low energy means the claim lies close to the evidence span; high energy means a larger unsupported component.


A.2 Policy interpretation

Given a threshold \( \tau_h \), we define a containment gate

$$ \mathcal{H}(c, E) \le \tau_h \quad \Rightarrow \quad \text{pass containment}. $$

As established in the earlier Hallucination Energy work, \( \tau_h \) can be calibrated under a target false-acceptance budget. In that setting, Hallucination Energy becomes not just a descriptive score, but an executable policy variable.
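
A minimal calibration sketch, assuming a held-out set of known-unsupported claims (the energy values shown are hypothetical):

import numpy as np

def calibrate_tau_h(unsupported_energies: np.ndarray, alpha: float = 0.05) -> float:
    """Choose tau_h so that at most a fraction alpha of known-unsupported
    claims fall at or below it (the false-acceptance budget)."""
    # Any unsupported claim with H <= tau_h would be falsely accepted;
    # the alpha-quantile caps that mass at ~alpha.
    return float(np.quantile(unsupported_energies, alpha))

# Hypothetical energies measured on hallucinated claims
tau_h = calibrate_tau_h(np.array([0.41, 0.52, 0.38, 0.66, 0.47]), alpha=0.05)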


B. Sensitivity: formalizing trendslop

Containment asks whether a claim remains within the semantic span of its evidence. Sensitivity asks something different:

does the system respond when the problem changes?

Let \( x \in \mathcal{X} \) be an input prompt and \( f(x) \) the corresponding output representation. Let \( P(x) = \{x'_1, \dots, x'_m\} \) be a set of semantic perturbations of \( x \), constructed to alter the problem while preserving overall task form.

We define the Sensitivity Score as

$$ \mathcal{S}(x) = 1 - \frac{1}{m} \sum_{i=1}^{m} \mathrm{sim}\big(f(x), f(x'_i)\big), $$

where \( \mathrm{sim}(\cdot,\cdot) \in [0,1] \) is a similarity function, typically cosine similarity in embedding space.

Interpretation:

  • high \( \mathcal{S}(x) \) means the output changes meaningfully across perturbations
  • low \( \mathcal{S}(x) \) means the output remains semantically similar despite changing conditions

Low \( \mathcal{S} \) is the signature of trendslop.

B.1 Basic properties

Proposition 4 (range). If \( \mathrm{sim}(\cdot,\cdot) \in [0,1] \), then

$$ 0 \le \mathcal{S}(x) \le 1. $$

This follows immediately from the definition.


Proposition 5 (collapse condition). If \( f(x) \) and \( f(x'_i) \) are identical for all perturbations \( x'_i \in P(x) \), then

$$ \mathcal{S}(x) = 0. $$

This is the limiting case of trendslop: total response invariance.


Proposition 6 (distance interpretation). If we define

$$ d(u,v) = 1 - \mathrm{sim}(u,v), $$

then \( \mathcal{S}(x) \) is the average distance between the base output and the perturbed outputs:

$$ \mathcal{S}(x) = \frac{1}{m} \sum_{i=1}^{m} d\big(f(x), f(x'_i)\big). $$

So the score is not just heuristic; it is the mean displacement of the output under meaningful input variation.


B.2 Trendslop as low response variance

An equivalent way to view sensitivity is through output collapse.

Let \( z_i = f(x'_i) \) be embeddings of outputs under perturbation, and let \( \mu \) be their centroid. Define the covariance matrix

$$ \Sigma = \frac{1}{m-1} \sum_{i=1}^{m} (z_i-\mu)(z_i-\mu)^\top. $$

Then the trace \( \mathrm{Tr}(\Sigma) \) measures the spread of the output cloud in representation space.

  • high trace: broad semantic variation under perturbation
  • low trace: collapse toward a single answer basin

This gives an alternative estimator of trendslop:

$$ \mathcal{S}_{\mathrm{var}}(x) \propto \mathrm{Tr}(\Sigma). $$

In practice, the similarity-based definition is simpler and easier to calibrate. The covariance-trace view is useful because it clarifies the geometry: trendslop is a low-volume response manifold.
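
A minimal numpy sketch of the trace estimator, assuming `z` is an \( m \times d \) array of output embeddings under perturbation:

import numpy as np

def sensitivity_var(z: np.ndarray) -> float:
    """Trace of the output covariance: total variance of the
    perturbed-output cloud in embedding space (low = trendslop)."""
    centered = z - z.mean(axis=0)
    # Tr(Sigma) equals the sum of per-dimension sample variances
    return float((centered ** 2).sum() / (len(z) - 1))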


C. Consistency: Structural Fidelity Within Evidence

Containment answers one question:

Did the claim stay within the semantic span of the evidence?

But this is not sufficient.

A claim can lie entirely within the evidence subspace and still be wrong.

It can:

  • invert relationships
  • misattribute causality
  • introduce contradictions
  • collapse multi-step reasoning into invalid conclusions

All while maintaining low Hallucination Energy.

This defines the second axis.


C.1 Core Definition

Consistency measures whether a claim preserves the relational and logical structure implied by the evidence.

Where containment is geometric, consistency is structural.


C.2 Formal Definition

Let:

  • \( E = \{e_1, \dots, e_n\} \) be evidence statements
  • \( c \) be a generated claim

We define a consistency functional:

$$ \mathcal{C}(c, E) = \mathbb{E}_{e \sim E} \left[ \mathrm{Entail}(e, c) \right] - \lambda_1 \, \max_{e \in E} \mathrm{Contradict}(e, c) - \lambda_2 \, \mathrm{Instability}(c) $$

where \( \lambda_1, \lambda_2 \ge 0 \) weight the contradiction and instability penalties.


C.3 Interpretation

  • Entailment: Does the claim follow from the evidence?
  • Contradiction: Does it violate any part of the evidence?
  • Instability: Does the claim remain consistent under rephrasing, decomposition, or indirect query?

This gives:

| Score | Meaning |
|---|---|
| High \( \mathcal{C} \) | Structurally faithful |
| Low \( \mathcal{C} \) | Relationally incorrect |

C.4 What we learned

Containment operates at the level of semantic proximity.

Consistency operates at the level of relational truth.

These are not the same.


C.5 Failure Example (In-Span Hallucination)

Evidence:
"Company A acquired Company B in 2020."

Claim:
"Company B acquired Company A in 2020."

| Metric | Result |
|---|---|
| Containment | ✅ Low energy |
| Consistency | ❌ Contradiction |

C.6 Key Insight

Projection preserves proximity. It does not preserve structure.


C.7 Operationalization

Consistency is measured using:

  1. Entailment probes
  2. Contradiction detection
  3. Indirect query stability

Indirect Query

Instead of:

Prompt → Answer

We probe:

Prompt → Answer  
Prompt' → Related Answer  
Compare → Structural agreement
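
A minimal sketch of the indirect-query probe (the `model_generate`, `embed_fn`, and paraphrase inputs are illustrative assumptions):

import numpy as np

def indirect_query_stability(model_generate, embed_fn, prompt: str, paraphrases: list[str]) -> float:
    """Ask the same question in indirect forms and measure agreement.

    High mean similarity = stable under rephrasing;
    low similarity suggests relational instability.
    """
    answers = [model_generate(p) for p in [prompt] + paraphrases]
    embs = embed_fn(answers)
    base = embs[0]
    sims = [
        float(np.dot(base, e) / (np.linalg.norm(base) * np.linalg.norm(e)))
        for e in embs[1:]
    ]
    return float(np.mean(sims))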

C.8 Policy Gate

$$ \mathcal{C}(c, E) \ge \tau_c \quad \Rightarrow \quad \text{Pass consistency} $$

If violated:

Reject or flag: structural integrity cannot be trusted


C.9 Final Takeaway

Containment prevents invention. Consistency prevents distortion.


D. Orthogonality of the three axes

The core claim of the framework is that containment, consistency, and sensitivity are distinct axes.

D.1 Informal orthogonality claim

Each axis measures a different property:

  • \( \mathcal{H} \): geometric distance from evidence span
  • \( \mathcal{C} \): structural correctness within evidence span
  • \( \mathcal{S} \): responsiveness under input variation

These are not reducible to one another.


D.2 Constructive separation

The axes are practically separable because there exist examples where one fails while the others remain strong.

| Case | Containment \( \mathcal{H} \) | Consistency \( \mathcal{C} \) | Sensitivity \( \mathcal{S} \) |
|---|---|---|---|
| Unsupported but adaptive output | poor | high | high |
| In-span contradiction | low | poor | high |
| Safe but useless trendslop | low | high | poor |

This shows that no single scalar can adequately represent all three failure modes.


D.3 Policy implication

Because the axes are distinct, the policy layer must evaluate them independently.

A scalar threshold is sufficient for containment alone. It is not sufficient for the full problem.

The correct architecture is therefore not:

$$ \text{single score} \rightarrow \text{single decision} $$

but

$$ (\mathcal{H}, \mathcal{C}, \mathcal{S}) \rightarrow \text{diagnose} \rightarrow \text{route}. $$

E. Joint policy bounds

Suppose the three gates are calibrated separately:

  • containment threshold \( \tau_h \)
  • consistency threshold \( \tau_c \)
  • sensitivity threshold \( \tau_s \)

and each gate is tuned to an individual false-acceptance rate:

  • \( \alpha_h \)
  • \( \alpha_c \)
  • \( \alpha_s \)

Then for a sequential deterministic router, the joint false-acceptance rate satisfies the union-bound guarantee

$$ \mathrm{FAR}_{\mathrm{joint}} \le \alpha_h + \alpha_c + \alpha_s. $$

If the three gates are approximately independent, the joint rate can be substantially smaller.

This is important because it extends the original single-gate Hallucination Energy logic into a multi-axis policy-bounded system without giving up hard acceptance guarantees.
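
For example, with per-gate budgets \( \alpha_h = \alpha_c = \alpha_s = 0.02 \), the bound gives

$$ \mathrm{FAR}_{\mathrm{joint}} \le 0.02 + 0.02 + 0.02 = 0.06, $$

and this holds regardless of correlations between the gates; independence only drives the true joint rate lower.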


F. Implementation: three-axis evaluation pipeline

The code in the main post is intentionally light. The following version is closer to a usable system skeleton.

F.1 Perturbation engine

from dataclasses import dataclass
from typing import List

@dataclass
class PerturbationSpec:
    polarity: bool = True
    scale: bool = True
    constraint: bool = True
    domain: bool = False

class PerturbationEngine:
    """Generate semantically meaningful perturbations.

    The replace() substitutions below are deliberately simple stand-ins;
    a production engine would rewrite prompts with templates or a model.
    """

    def generate(self, base_prompt: str, spec: PerturbationSpec) -> List[str]:
        perturbed = []

        if spec.polarity:
            # Flip the direction of the scenario (growth -> decline).
            perturbed.append(
                base_prompt.replace("growing", "declining")
                           .replace("profitable", "unprofitable")
                           .replace("market-leading", "struggling")
            )

        if spec.scale:
            # Change the size of the actor.
            perturbed.append(
                base_prompt.replace("startup", "global enterprise")
            )

        if spec.constraint:
            # Impose hard resource limits.
            perturbed.append(
                base_prompt + " Constraint: budget below $500k and 90-day execution horizon."
            )

        if spec.domain:
            # Swap the industry (off by default).
            perturbed.append(
                base_prompt.replace("biotech", "manufacturing")
            )

        return perturbed
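A minimal usage sketch (the prompt is illustrative); note that a substitution fires only when its keyword actually occurs in the prompt:

engine = PerturbationEngine()
variants = engine.generate(
    "Recommend a growth strategy for a profitable biotech startup in a growing market.",
    PerturbationSpec(domain=True),
)
for v in variants:
    print(v)
# Four variants: polarity-flipped, rescaled, constraint-added, domain-swapped.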

F.2 Hallucination Energy

import numpy as np
from sklearn.decomposition import TruncatedSVD

class HallucinationEnergy:
    """Containment signal: the normalized projection residual of the claim
    embedding against the subspace spanned by the evidence embeddings."""

    def __init__(self, embed_fn, rank: int = 3):
        self.embed_fn = embed_fn
        self.rank = rank

    def compute(self, claim: str, evidence: list[str]) -> float:
        c = self.embed_fn([claim])[0]
        E = self.embed_fn(evidence)

        # The evidence subspace rank is capped by the number of evidence
        # vectors and by the embedding dimension.
        rank = min(self.rank, len(evidence), E.shape[1])
        svd = TruncatedSVD(n_components=rank, random_state=42)
        svd.fit(E)

        # Orthonormalize the basis so the projection below is exact.
        U = svd.components_.T
        U, _ = np.linalg.qr(U)

        c_proj = U @ (U.T @ c)
        residual = c - c_proj

        # 0 => claim fully inside the evidence span; 1 => fully outside.
        return float(np.linalg.norm(residual) / np.linalg.norm(c))
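A minimal usage sketch, assuming a sentence-transformers backend; the model name is illustrative, not prescribed by the framework:

import numpy as np
from sentence_transformers import SentenceTransformer

_model = SentenceTransformer("all-MiniLM-L6-v2")

def embed(texts: list[str]) -> np.ndarray:
    return _model.encode(texts)

he = HallucinationEnergy(embed_fn=embed, rank=3)
energy = he.compute(
    claim="Revenue grew 40% year over year.",
    evidence=[
        "Q3 revenue rose 40% compared to the prior year.",
        "Margins were flat quarter over quarter.",
    ],
)
print(f"Hallucination Energy: {energy:.3f}")  # low value => claim stays in-span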

F.3 Consistency score

class ConsistencyChecker:
    """Structural signal: mean entailment minus peak contradiction.

    entail_fn(premise, hypothesis) and contradict_fn(premise, hypothesis)
    are expected to return probabilities in [0, 1].
    """

    def __init__(self, entail_fn, contradict_fn=None):
        self.entail_fn = entail_fn
        self.contradict_fn = contradict_fn

    def compute(self, claim: str, evidence: list[str]) -> float:
        # Average entailment across the evidence set.
        entail_scores = [self.entail_fn(e, claim) for e in evidence]
        entail = float(np.mean(entail_scores))

        # One strong contradiction is enough to sink the score,
        # hence max rather than mean.
        contradiction = 0.0
        if self.contradict_fn is not None:
            contradiction_scores = [self.contradict_fn(e, claim) for e in evidence]
            contradiction = float(np.max(contradiction_scores))

        return max(0.0, entail - contradiction)
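One possible backend for entail_fn and contradict_fn, sketched with an off-the-shelf NLI model; the model name and label strings are assumptions to check against the model card of whatever classifier you substitute:

from transformers import pipeline

_nli = pipeline("text-classification", model="roberta-large-mnli", top_k=None)

def _label_score(premise: str, hypothesis: str, label: str) -> float:
    # Score the (premise, hypothesis) pair and pick out one label's probability.
    scores = _nli({"text": premise, "text_pair": hypothesis})
    return next(s["score"] for s in scores if s["label"] == label)

def entail_fn(premise: str, hypothesis: str) -> float:
    return _label_score(premise, hypothesis, "ENTAILMENT")

def contradict_fn(premise: str, hypothesis: str) -> float:
    return _label_score(premise, hypothesis, "CONTRADICTION")

checker = ConsistencyChecker(entail_fn=entail_fn, contradict_fn=contradict_fn)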

F.4 Sensitivity score with noise control

from sklearn.metrics.pairwise import cosine_similarity

class SensitivityEvaluator:
    """Trendslop signal: how far outputs move when the prompt moves."""

    def __init__(self, model_generate, embed_fn, n_samples: int = 3):
        self.model_generate = model_generate
        self.embed_fn = embed_fn
        self.n_samples = n_samples

    def _stable_representation(self, prompt: str) -> np.ndarray:
        # Average several sampled generations so that decoding noise
        # does not masquerade as sensitivity.
        samples = [self.model_generate(prompt) for _ in range(self.n_samples)]
        embs = self.embed_fn(samples)
        return np.mean(embs, axis=0)

    def compute(self, base_prompt: str, perturbations: list[str]) -> float:
        base_vec = self._stable_representation(base_prompt)
        perturbed_vecs = [self._stable_representation(p) for p in perturbations]

        sims = [
            cosine_similarity([base_vec], [v])[0][0]
            for v in perturbed_vecs
        ]

        # 0 => outputs unchanged under perturbation (trendslop);
        # higher => the model is actually tracking the input.
        return float(1.0 - np.mean(sims))
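Because decoding is stochastic, even an unchanged prompt yields slightly different generations, so \( \mathcal{S} \) never measures exactly zero in practice. A hedged calibration sketch (the function name is illustrative): score the prompt against copies of itself to estimate that noise floor, then set \( \tau_s \) above it.

def noise_floor(evaluator: SensitivityEvaluator, prompt: str, trials: int = 5) -> float:
    # Sensitivity of a prompt against itself is nonzero only because of
    # sampling noise; tau_s should sit comfortably above this value.
    return evaluator.compute(prompt, [prompt] * trials)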

F.5 Policy router

from dataclasses import dataclass
from enum import Enum
from typing import Tuple

class PolicyAction(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    REFINE = "refine"

@dataclass
class GroundingEvaluation:
    containment: float
    consistency: float
    sensitivity: float

@dataclass
class PolicyThresholds:
    tau_h: float
    tau_c: float
    tau_s: float

class PolicyRouter:
    def __init__(self, thresholds: PolicyThresholds):
        self.thresholds = thresholds

    def route(self, g: GroundingEvaluation) -> Tuple[PolicyAction, str]:
        if g.containment > self.thresholds.tau_h:
            return PolicyAction.REJECT, "Containment failure"
        if g.consistency < self.thresholds.tau_c:
            return PolicyAction.REJECT, "Consistency failure"
        if g.sensitivity < self.thresholds.tau_s:
            return PolicyAction.REFINE, "Sensitivity failure (trendslop)"
        return PolicyAction.ACCEPT, "Grounded, consistent, adaptive"
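A usage sketch with illustrative thresholds (real values come from the calibration protocol in Section G), replaying the three separation cases from D.2:

router = PolicyRouter(PolicyThresholds(tau_h=0.35, tau_c=0.50, tau_s=0.15))

cases = {
    "unsupported but adaptive":   GroundingEvaluation(0.80, 0.70, 0.40),
    "in-span contradiction":      GroundingEvaluation(0.10, 0.20, 0.40),
    "safe but useless trendslop": GroundingEvaluation(0.10, 0.70, 0.02),
}
for name, g in cases.items():
    action, reason = router.route(g)
    print(f"{name}: {action.value} ({reason})")
# Expected: reject, reject, refine -- one distinct outcome per failing axis.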

F.6 End-to-end evaluation

class ThreeAxisEvaluator:
    def __init__(self, he, cc, se, router, perturb_engine):
        self.he = he
        self.cc = cc
        self.se = se
        self.router = router
        self.perturb_engine = perturb_engine

    def evaluate(self, prompt: str, claim: str, evidence: list[str]) -> dict:
        perturbations = self.perturb_engine.generate(prompt, PerturbationSpec())

        g = GroundingEvaluation(
            containment=self.he.compute(claim, evidence),
            consistency=self.cc.compute(claim, evidence),
            sensitivity=self.se.compute(prompt, perturbations),
        )

        action, reason = self.router.route(g)

        return {
            "evaluation": g,
            "action": action.value,
            "reason": reason,
            "perturbations": perturbations,
        }
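A hedged wiring sketch: embed, entail_fn, and generate stand for the user-supplied callables assumed in the sections above, and the thresholds are illustrative.

evaluator = ThreeAxisEvaluator(
    he=HallucinationEnergy(embed_fn=embed),
    cc=ConsistencyChecker(entail_fn=entail_fn),
    se=SensitivityEvaluator(model_generate=generate, embed_fn=embed),
    router=PolicyRouter(PolicyThresholds(tau_h=0.35, tau_c=0.50, tau_s=0.15)),
    perturb_engine=PerturbationEngine(),
)

# prompt, claim, and evidence are the inputs under evaluation.
result = evaluator.evaluate(prompt, claim, evidence)
print(result["action"], "-", result["reason"])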

G. Minimal validation protocol

A technical reader will ask: how do we know the axes are useful in practice?

A minimal protocol is enough:

  1. Containment calibration: use supported vs. unsupported claim-evidence pairs to fit \( \tau_h \) (see the calibration sketch below).

  2. Consistency calibration: use contradiction-style examples and inverted-relation adversaries to fit \( \tau_c \).

  3. Sensitivity calibration: use prompt families with known high-context variation to fit \( \tau_s \).

  4. Orthogonality check: report pairwise correlations among \( \mathcal{H} \), \( \mathcal{C} \), \( \mathcal{S} \). Low correlation strengthens the multi-axis claim.

  5. Routing evaluation: measure

    • the reject rate for extrinsic hallucinations
    • the reject rate for intrinsic hallucinations
    • the refine rate for trendslop
    • the false-acceptance rate of the full router

Even a small study is enough to show that the architecture is not merely intuitive, but testable.
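As a minimal sketch of step 1 (the function name and data layout are illustrative, not part of the framework): once Hallucination Energy has been computed on labeled unsupported claim-evidence pairs, the threshold achieving a target false-acceptance rate \( \alpha \) is just a quantile.

import numpy as np

def fit_tau_h(unsupported_energies: list[float], alpha: float = 0.05) -> float:
    # A claim is accepted when its energy is <= tau_h, so a false acceptance
    # is an unsupported pair that falls below the threshold. Placing tau_h at
    # the alpha-quantile of unsupported-pair energies caps that rate at alpha.
    return float(np.quantile(unsupported_energies, alpha))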


H. Theorem 1: Trendslop Collapse

If a model’s outputs are invariant under semantic perturbations, its sensitivity is exactly zero.


Formal Statement

Let:

  • \( f(x) \) be the output embedding
  • \( P(x) \) be a set of perturbations

If:

$$ \forall x' \in P(x), \quad f(x') = f(x) $$

Then:

$$ \mathcal{S}(x) = 0 $$

Proof

By definition:

$$ \mathcal{S}(x) = 1 - \frac{1}{|P(x)|} \sum_{x' \in P(x)} \mathrm{sim}(f(x), f(x')) $$

If \( f(x') = f(x) \) for every \( x' \in P(x) \), then each term satisfies:

$$ \mathrm{sim}(f(x), f(x')) = 1 $$

Thus:

$$ \mathcal{S}(x) = 1 - 1 = 0 $$

\( \square \)


H.1 Interpretation

Trendslop is a measurable collapse condition, not a heuristic.
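The collapse condition can be checked directly against the F.4 evaluator. A hedged sketch, reusing the embed callable assumed in the F.2 usage example: a generator that ignores its input yields sensitivity of zero, up to floating-point error.

# A "model" that returns the same answer regardless of the prompt.
constant_generate = lambda prompt: "Focus on customer retention and AI adoption."

evaluator = SensitivityEvaluator(constant_generate, embed, n_samples=1)
s = evaluator.compute("base prompt", ["perturbed A", "perturbed B"])
print(f"S = {s:.6f}")  # ~0.000000: invariant outputs collapse sensitivity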


I. Theorem 2: Sensitivity Lower Bound

A reasoning system must exhibit non-zero sensitivity under meaningful perturbations.


I.1 Formal Statement

Let \( P(x) \) be a set of perturbations satisfying \( d_{\mathcal{X}}(x, x') \ge \delta > 0 \) for every \( x' \in P(x) \).

If the output embedding moves under every such perturbation,

$$ \| f(x) - f(x') \| \ge \epsilon > 0 \quad \forall x' \in P(x), $$

then there exists \( \gamma > 0 \), depending only on \( \epsilon \), such that:

$$ \mathcal{S}(x) \ge \gamma $$
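One concrete instantiation, assuming unit-normalized embeddings and cosine similarity: the norm gap translates directly into a similarity gap,

$$ \| f(x) - f(x') \|^2 = 2\bigl(1 - \mathrm{sim}(f(x), f(x'))\bigr) \;\Rightarrow\; \mathrm{sim}(f(x), f(x')) \le 1 - \frac{\epsilon^2}{2}, $$

so averaging over \( P(x) \) gives \( \mathcal{S}(x) \ge \epsilon^2/2 \), and the theorem holds with \( \gamma = \epsilon^2/2 \).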

I.2 Interpretation

Reasoning implies movement. No movement implies no reasoning.


J. Theorem 3: Orthogonality of Axes

Containment, consistency, and sensitivity cannot be reduced to a single scalar.


J.1 Proof Sketch

Construct cases in which exactly one axis fails:

  • Extrinsic hallucination → high \( \mathcal{H} \), with \( \mathcal{C} \) and \( \mathcal{S} \) intact
  • Intrinsic hallucination → low \( \mathcal{C} \), with \( \mathcal{H} \) and \( \mathcal{S} \) intact
  • Trendslop → low \( \mathcal{S} \), with \( \mathcal{H} \) and \( \mathcal{C} \) intact

Any scalar reduction collapses these three independent failure directions, so no single threshold can both detect all three failures and report which axis failed:

$$ \nexists\, f : (\mathcal{H}, \mathcal{C}, \mathcal{S}) \to \mathbb{R} \ \text{preserving all three pass/fail distinctions} $$

\( \square \)


J.2 Interpretation

No single score captures reasoning quality.


K. Theorem 4: Policy Safety Bound

Sequential gating yields bounded false acceptance rate.


K.1 Statement

Let:

  • \( \alpha_h, \alpha_c, \alpha_s \) be the per-axis false-acceptance rates (FAR)

Then:

$$ \mathrm{FAR}_{\text{joint}} \le \alpha_h + \alpha_c + \alpha_s $$

K.2 Interpretation

You now have provable control over acceptance.

L. What this appendix proves

This appendix does not prove that the system solves reasoning.

It proves something narrower and stronger:

  1. Containment is geometrically well-defined
  2. Sensitivity is operationally measurable
  3. Consistency is structurally distinct
  4. The three axes are separable in principle
  5. A deterministic policy router can combine them

That is sufficient to justify the architecture presented in the main post.


📚 Glossary

| Term | Definition |
|---|---|
| Acceptance Boundary | The deterministic decision surface, defined by policy, that separates admissible outputs from rejected ones. The key shift in this series: generation remains stochastic, but acceptance is governed by explicit rules. |
| Action-Scoped Policy | A policy model in which rules attach to the action being performed rather than to the model globally. The same output may be acceptable for brainstorming but unacceptable for publication or regulation. |
| Attribution | A stronger verification requirement than containment. Attribution asks whether a claim can be traced to a specific source or passage; containment asks only whether it stays within the plausible semantic span of the evidence. |
| Citation Laundering | A provenance failure in which a claim is supported by a secondary or circular source when the active policy requires stronger sourcing. One of the clearest examples of a claim that is semantically plausible yet procedurally unacceptable. |
| Consistency | The structural axis of the three-dimensional framework. It asks whether the output preserves relationships within the evidence rather than merely staying near it in embedding space; this is the axis that addresses in-span relational error (intrinsic hallucination). |
| Containment | The geometric question: did the claim stay within the semantic span of the evidence? Hallucination Energy is the first-order containment signal. It does not prove attribution or truth, but it does measure out-of-span drift. |
| Deterministic Acceptance | The principle that outputs are trusted not because they are fluent or probable, but because they survive explicit, executable checks. The architectural counterpart to stochastic generation. |
| Evidence-Bound Synthesis | A constrained generation regime in which the model must produce claims using only the supplied evidence, with no outside retrieval or freeform extrapolation. Serves as the low-risk baseline before epistemic risk is introduced. |
| Extrinsic Hallucination | Output that introduces unsupported information beyond the evidence or beyond what can be justified externally. In the three-axis model, a containment failure. |
| FEVEROUS | A Wikipedia-grounded fact-verification dataset with explicit evidence references and structure. Used in this series as a policy stress test rather than as a truth oracle. |
| False Acceptance Rate (FAR) | The calibration target that turns Hallucination Energy from a descriptive score into a deterministic gate. FAR lets thresholds be chosen as policy variables rather than by intuition. |
| Grounding | The broader requirement that a claim be supportable relative to evidence, source constraints, and policy. Grounding is treated not as model confidence but as a measurable, enforceable property. |
| Hallucination Energy | A geometric scalar measuring the projection residual between a claim embedding and the subspace spanned by the supporting evidence embeddings. Low values indicate stronger containment; high values indicate unsupported semantic mass. |
| High-Trust Environment | A domain where fluency is irrelevant unless the output can be justified under explicit rules, provenance, and review: encyclopedias, finance, medicine, law, scientific summarization. |
| In-Context / Intrinsic Hallucination | Output that stays near or within the evidence manifold but misrepresents structure, relationships, or logic. The failure mode handled by consistency rather than containment. |
| Measurement Before Understanding | The engineering principle that useful control does not require a complete theory first. Hallucination Energy is positioned as a measurable control surface even before a complete theory of AI reasoning exists. |
| Policy-Bounded AI | The overall architectural thesis of the series: let stochastic systems generate freely upstream, but enforce deterministic acceptance downstream through measurable gates. |
| Policy-Bounded Learning | Using deterministic policy outcomes as feedback to reduce wasted generations or repeated violations without relaxing the policy itself. Learning under constraint, not policy replacement. |
| Policy Regimes | Distinct executable editorial or operational standards (editorial, standard, strict) that change what is admissible without changing the claim or the model. |
| Projection Residual | The unsupported component of a claim vector after projection onto the evidence subspace; the mathematical core of Hallucination Energy. |
| Provenance Failure | Rejection caused by insufficient source lineage or unacceptable sourcing, even when the semantic content appears accurate. Explains why low-energy claims can still be rejected. |
| Sensitivity | The trendslop axis. It measures whether the model meaningfully changes its output when the problem changes. Low sensitivity indicates pattern collapse or generic response behavior. |
| Semantic Drift / Overreach | The movement from what the evidence explicitly supports to what the model asserts. Broader than factual error: it includes unsupported framing, implication, and relational expansion. |
| Semantic Residual | The orthogonal component of the claim after projection onto the evidence direction or evidence subspace; the mathematical basis of Hallucination Energy. |
| Stochastic Generation | The probabilistic output behavior of LLMs that makes them useful for synthesis and exploration, but unsafe to expose directly in high-trust settings without downstream controls. |
| Trendslop | The failure mode in which outputs remain fluent and plausible but become context-insensitive, collapsing to generic, high-probability answer patterns across different scenarios. In the three-axis model, a failure of sensitivity. |
| Truth Oracle | A role Hallucination Energy explicitly does not claim to fill. The metric is a containment regulator, not a full truth model. |
| Verifiability Gate | The deterministic policy layer that sits between generated content and publication or use. It evaluates whether a claim survives the active rules; it does not generate, paraphrase, or "trust" confidence. |
| Verity / Certum | The system and framework layer of this series that implements deterministic policy enforcement, measurement, calibration, and routing around stochastic model outputs. |

📖 References

Institutional Policy & Verifiability

  1. Wikipedia. Wikipedia: Verifiability. https://en.wikipedia.org/wiki/Wikipedia:Verifiability

    Establishes the principle that content must be attributable to reliable sources. This work adopts verifiability as a procedural constraint rather than a truth claim.

  2. Wikipedia. Wikipedia: Reliable Sources. https://en.wikipedia.org/wiki/Wikipedia:Reliable_sources

    Defines source hierarchy and provenance requirements. Forms the basis for policy regimes used in this work.

  3. Wiki Education. (2026). Generative AI and Wikipedia Editing: What We Learned in 2025. https://wikiedu.org/blog/2026/01/29/generative-ai-and-wikipedia-editing-what-we-learned-in-2025/

    Documents institutional challenges with AI-generated content, highlighting procedural—not purely factual—failure modes.


Evidence-Grounded Verification

  1. Aly, R., et al. (2021). FEVEROUS: Fact Extraction and VERification Over Unstructured and Structured Evidence. https://fever.ai/dataset/feverous.html

    Provides a Wikipedia-grounded dataset with structured evidence annotations. Used as an evaluation substrate for policy-based verification.

  2. Thorne, J., et al. (2018). FEVER: A Large-Scale Dataset for Fact Extraction and VERification. Proceedings of NAACL-HLT. https://aclanthology.org/N18-1074/

    Introduces large-scale evidence-based fact verification, forming the foundation for subsequent datasets such as FEVEROUS.


Hallucination & Faithfulness in Language Models

  1. Ji, Z., et al. (2023). Survey of Hallucination in Natural Language Generation. ACM Computing Surveys. https://arxiv.org/abs/2202.03629

    Comprehensive overview of hallucination phenomena in LLMs. Highlights the lack of operational definitions addressed in this work.

  2. Maynez, J., et al. (2020). On Faithfulness and Factuality in Abstractive Summarization. Proceedings of ACL. https://aclanthology.org/2020.acl-main.173/

    Distinguishes semantic faithfulness from surface similarity, motivating directional approaches to measuring drift.

  3. Weng, L. (2024). Extrinsic Hallucinations in LLMs. https://lilianweng.github.io/posts/2024-07-07-hallucination/

    Defines intrinsic vs extrinsic hallucination, providing a taxonomy extended in this work through geometric and policy-based framing.


Attribution & Retrieval-Augmented Approaches

  1. Gao, L., et al. (2023). RARR: Researching and Revising What Language Models Say, Using Language Models. Proceedings of ACL. https://arxiv.org/abs/2210.08726

    Demonstrates attribution-based correction through retrieval and revision. Contrasts with the post-generation policy gating approach used here.

  2. Asai, A., et al. (2023). Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. https://arxiv.org/abs/2310.11511

    Integrates retrieval and self-evaluation into generation. Provides an alternative to external deterministic policy enforcement.


Geometric & Representation-Based Approaches

  1. Zhang, X., et al. (2024). Grounding the Ungrounded: A Spectral-Graph Framework for Quantifying Hallucinations in Multimodal LLMs. https://arxiv.org/abs/2406.12345

    Uses graph-based representations to quantify hallucination. Provides a higher-complexity alternative to projection-based residual methods.

  2. Chen, Y., et al. (2024). HARP: Hallucination Detection via Reasoning Subspace Projection. https://arxiv.org/abs/2405.07612

    Introduces subspace projection for hallucination detection within model representations. Related to, but distinct from, the evidence–claim projection approach used here.


Policy-Aware AI Systems

  1. Kwon, H., et al. (2024). Policy-Aware Generative AI for Safe and Auditable Data Access. https://arxiv.org/abs/2403.09876

    Explores policy-driven constraints in generative systems. Supports the broader framing of policy as an external control layer.


Engineering Foundations

  1. Meyer, B. (1997). Object-Oriented Software Construction (2nd ed.). Prentice Hall.

    Introduces Design by Contract, forming the conceptual basis for deterministic acceptance boundaries.

  2. Thomson, W. (1883). Electrical Units of Measurement.

    Establishes the principle that measurable quantities enable control even without full theoretical understanding.