Hallucination Energy: A Geometric Foundation for Policy-Bounded AI

Hallucination Energy: A Geometric Foundation for Policy-Bounded AI
Page content

๐Ÿš€ Summary

This post presents the current research draft and implementation of a geometric framework for bounding stochastic language models through deterministic policy enforcement.

The central contribution is a scalar metric termed Hallucination Energy, defined as the projection residual between a claim embedding and the subspace spanned by its supporting evidence embeddings. This metric operationalizes grounding as a measurable geometric quantity.

We proceed in three stages:

  1. Formal Definition a draft manuscript introducing Hallucination Energy, its mathematical formulation, and its role within a policy-controlled architecture.
  2. Empirical Evaluation structured calibration and adversarial stress testing across multiple domains to assess the robustness and limits of projection-based grounding.
  3. Applied Validation large-scale evaluation on 10,000 samples from the HaluEval summarization benchmark, demonstrating that projection-based containment functions as a strong first-order grounding signal in a real generative setting.

This work does not claim to solve hallucination. Rather, it characterizes the boundary of projection-based grounding, establishes its suitability as a deterministic policy scalar, and documents both its strengths and its structural limitations.


๐Ÿ› ๏ธ Code Availability

All experiments, calibration routines, adversarial generators, and evaluation pipelines described in this post are implemented in the open-source Certum framework:

๐Ÿ”— https://github.com/ernanhughes/certum

The repository contains:

  • The projection-based Hallucination Energy implementation
  • Deterministic policy gating and calibration modules
  • Adversarial negative construction modes
  • Multi-axis evaluation runners
  • Large-scale summarization validation pipelines

All reported results in this post were generated using this codebase.


๐Ÿ“ The Paper


Hallucination Energy: A Geometric Foundation for Policy-Bounded AI

Ernan Hughes ernanhughes@gmail.com

Abstract

Large language models generate fluent but stochastic outputs, making deterministic guarantees difficult in high-trust environments. We introduce Hallucination Energy, a geometry-grounded scalar that measures the projection residual of a claim embedding onto the subspace spanned by its supporting evidence. Intuitively, well-supported claims lie largely within the evidence span, while unsupported or fabricated claims exhibit larger residual energy.

Hallucination Energy provides a model-agnostic, embedding-based measure of grounding that does not rely on model confidence, retrieval heuristics, or LLM-based judges. We show that this scalar yields consistent separability between supported and unsupported claims across structured evidence-verification datasets.

We further evaluate the approach in a generative setting using the HaluEval summarization benchmark (Minervini et al., 2023), where geometry-based models achieve up to 0.72 AUC without retrieval augmentation or additional supervision. While nonlinear models provide modest gains, empirical results indicate that projection residual energy captures the dominant discriminative signal under linear embedding geometry.

These results support the feasibility of deterministic policy gates grounded in measurable geometric criteria. At the same time, experiments under adversarial semantic overlap reveal limits of scalar projection methods, delineating the boundary of linear grounding geometry. We position this work as a step toward policy-bounded AI systems that replace heuristic confidence measures with explicitly defined geometric acceptance rules.


1. Introduction

Large language models (LLMs) produce coherent and fluent text, yet their outputs remain stochastic. In high-trust domainsโ€”legal reasoning, scientific summarization, knowledge curationโ€”acceptance cannot rely solely on model confidence or internal heuristics.

We begin from a systems-level premise:

Generation may be stochastic; acceptance must be deterministic.

This separation motivates policy-bounded AI: a system architecture in which outputs are filtered through explicit, measurable acceptance criteria before being exposed to users.

The central challenge is to define such criteria in a way that is:

  • Model-agnostic
  • Measurable
  • Thresholdable
  • Grounded in evidence

We propose Hallucination Energy as a geometric scalar that satisfies these requirements.


2. Hallucination Energy

2.1 Problem Setup

Given:

  • A claim \( c \)
  • A set of evidence sentences \( E = {e_1, e_2, \dots, e_n} \)

Let:

  • \( \mathbf{c} \in \mathbb{R}^d \) be the embedding of the claim
  • \( \mathbf{e}_i \in \mathbb{R}^d \) be embeddings of evidence sentences

Define the evidence matrix:

$$ \mathbf{E} = [\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n]^T $$

We compute an orthonormal basis \( \mathbf{U}_r \) of rank \( r \) for the span of \( \mathbf{E} \) via truncated SVD.


2.2 Definition

The projection of the claim embedding onto the evidence span is:

$$ \hat{\mathbf{c}} = \mathbf{U}_r \mathbf{U}_r^T \mathbf{c} $$

The Hallucination Energy is defined as the normalized residual:

$$ \mathcal{H}(c, E) = \frac{|\mathbf{c} - \hat{\mathbf{c}}|_2}{|\mathbf{c}|_2} $$

Intuition:

  • If a claim is well supported by evidence, it lies largely within the evidence subspace.
  • Unsupported claims exhibit larger orthogonal residuals.

Hallucination Energy is therefore a scalar measure of geometric grounding.


3. Policy-Bounded Architecture

We embed Hallucination Energy within a two-stage architecture:

  1. Stochastic Generation Layer

    • LLM produces candidate output.
  2. Deterministic Acceptance Layer

    • Compute Hallucination Energy.
    • Accept if: $$ \mathcal{H}(c, E) \le \tau $$
    • Reject or request revision otherwise.

The threshold \( \tau \) serves as a policy control parameter, enabling configurable trade-offs:

| Low ฯ„ | Strict grounding, low creativity | | High ฯ„ | Higher tolerance, greater creativity |

This reframes hallucination detection as a measurable policy dial rather than a binary classifier.


4. Experimental Evaluation

4.1 Structured Evidence Verification

Across evidence-based datasets, Hallucination Energy consistently separates supported from unsupported claims under equal false-acceptance constraints.

We observe:

  • Clear separability in structured regimes.
  • Stable threshold calibration.
  • Scalar dominance over additional linear features.

4.2 Generative Summarization (HaluEval)

We evaluate on the HaluEval summarization benchmark (Minervini et al., 2023), which provides:

  • Source documents
  • Correct summaries
  • Hallucinated summaries

We treat summaries as claims relative to source documents.

Results

Model AUC
Geometry only 0.669
Logistic fusion 0.691
XGBoost 0.723 ยฑ 0.013

Key observations:

  • Projection residual alone provides strong signal.
  • Nonlinear models yield modest gains.
  • No retrieval augmentation is used.
  • No LLM-as-judge supervision is required.

This demonstrates that geometric grounding captures a substantial portion of hallucination signal in real generative tasks.


5. Empirical Observations

5.1 Scalar Dominance

Across experiments, additional axes (coverage, participation ratio, sensitivity) provide marginal improvements over residual energy alone.

This suggests:

Projection residual energy captures the dominant discriminative signal under linear embedding geometry.


5.2 Adversarial Semantic Overlap

Under hard semantic mimicryโ€”where unsupported claims remain topically aligned with evidenceโ€”separability degrades.

This reveals:

  • Limits of linear projection geometry.
  • Need for potential higher-order or nonlinear structure.
  • Structural ceiling under subspace overlap.

These findings define the boundary of scalar methods.


Recent work has explored geometric and spectral approaches to hallucination detection.

  • Reasoning Subspace Projection (HARP) decomposes hidden-state representations into semantic and reasoning subspaces within model internals. Our approach differs by operating externally at the evidenceโ€“claim interface and requiring no access to hidden states.

  • Spectral-graph hallucination energy methods define energy over multimodal Laplacian manifolds. In contrast, our formulation uses minimal projection residual geometry without graph construction or diffusion modeling.

  • Policy-aware generative systems employ symbolic or rule-based gates. Our contribution is a continuous geometric scalar that enables policy control through measurable thresholds rather than symbolic constraints.

Our approach is lightweight, model-agnostic, and deployable without architectural modification of the underlying LLM.


7. Discussion

Hallucination Energy does not solve hallucination detection universally.

However, it provides:

  • A measurable scalar grounding signal
  • Deterministic thresholdability
  • Model-agnostic implementation
  • Interpretable geometric meaning

This supports a broader architectural claim:

Stochastic generation can be bounded by deterministic geometric acceptance criteria.

Future work may explore nonlinear geometry, structured evidence representations, or multi-dimensional grounding signals.


8. Conclusion

We introduced Hallucination Energy as a projection-residual scalar measuring geometric containment between claims and evidence.

Across structured verification and large-scale summarization, this scalar provides consistent separability without reliance on model confidence or retrieval augmentation.

Adversarial experiments expose its representational boundary: projection residual measures containment, not relational contradiction.

Hallucination Energy is not a truth oracle.

It is a containment regulator.

Containment is sufficient to define deterministic policy thresholds over stochastic generation.

That architectural shift separating stochastic generation from deterministic acceptance is the central contribution of this work.


References

@article{sarkar2025grounding,
  title={Grounding the Ungrounded: A Spectral-Graph Framework for Quantifying Hallucinations in Multimodal LLMS},
  author={Sarkar, Supratik and Das, Swagatam},
  journal={arXiv preprint arXiv:2508.19366v4},
  year={2025}
}

@article{hu2025harp,
  title={HARP: Hallucination Detection via Reasoning Subspace Projection},
  author={Hu, Junjie and Tu, Gang and Cheng, Sheng Yu and Li, Jinxin and Wang, Jinting and Chen, Rui and Zhou, Zhilong and Shan, Dongbo},
  journal={arXiv preprint arXiv:2509.11536v2},
  year={2025}
}

@article{almandalawi2025policy,
  title={Policy-Aware Generative AI for Safe, Auditable Data Access Governance},
  author={Al Mandalawi, Shames and Mohammed, Muzakkiruddin Ahmed and Maclean, Hendrika and Cakmak, Mert Can and Talburt, John R.},
  journal={arXiv preprint arXiv:2510.23474v1},
  year={2025}
}

๐Ÿ“‹ Empirical Evaluation

๐ŸŽ Policy as a Bounding Box

Using policy we can create a bounding box around AI systems.

The language model operates inside the box. It may generate freely. But nothing leaves the box unless it satisfies explicit grounding criteria.

Policy is not guidance. It is enforcement.

We can define:

  • A strict policy that rejects nearly all uncertain outputs.
  • A permissive policy that tolerates creative extrapolation.
  • Or a total lockdown policy that blocks everything.

The important shift is architectural:

Generation is stochastic. Acceptance is deterministic.

To make this viable, we needed a scalar measure of grounding that could serve as a threshold.

This led us to the development of Hallucination Energy.

    flowchart TD
    A[["๐Ÿ“„ Claim<br/>Generated or Provided"]] --> B["๐Ÿ“š Evidence Set<br/>Retrieved or Supplied"]

    B --> C["๐Ÿงฎ Grounding Measurement<br/>(e.g. Hallucination Energy)"]

    A --> C

    C --> D["๐Ÿšจ Verifiability Gate<br/>(Executable Policy)"]

    D -->|โœ… Within Policy| E["๐ŸŸข PASS<br/>Claim Accepted"]
    D -->|โŒ Violates Policy| F["๐Ÿ”ด FAIL<br/>Claim Rejected"]
    D -->|โš ๏ธ Borderline| G["๐ŸŸก UNCLEAR<br/>Needs Human Review"]

    classDef dataset fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#0d47a1
    classDef process fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px,color:#1b5e20
    classDef retrieval fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#e65100
    classDef gate fill:#fce4ec,stroke:#c2185b,stroke-width:2px,color:#880e4f
    classDef pass fill:#e8f5e8,stroke:#388e3c,stroke-width:2px,color:#1b5e20
    classDef fail fill:#ffebee,stroke:#d32f2f,stroke-width:2px,color:#b71c1c
    classDef unclear fill:#fff3e0,stroke:#ff8f00,stroke-width:2px,color:#e65100

    class A dataset
    class B dataset
    class C process
    class D gate
    class E pass
    class F fail
    class G unclear
  

๐Ÿงš Hallucination Energy

Policy requires measurement.

Without a measurable grounding signal, policy is symbolic it cannot be executed. To enforce deterministic acceptance over stochastic generation, we needed a scalar quantity that captures how strongly a claim is supported by its evidence.

This led to the formulation of Hallucination Energy.

๐Ÿ’ก Intuition

If a claim is grounded in evidence, its meaning should lie inside the semantic span of that evidence.

If it is not grounded, some component of its meaning must extend beyond that span.

Hallucination Energy measures exactly that residual.


๐Ÿ“ Geometric Formulation

Given:

  • A claim embedding \( \mathbf{c} \in \mathbb{R}^d \)
  • A set of evidence embeddings \( \mathbf{e}_1, \dots, \mathbf{e}_n \)

We compute a low-rank orthonormal basis for the subspace spanned by the evidence. The claim is then projected into that subspace.

Let:

$$ \hat{\mathbf{c}} = \text{Projection of } \mathbf{c} \text{ onto the evidence span} $$

The residual component:

$$ \mathbf{r} = \mathbf{c} - \hat{\mathbf{c}} $$

The Hallucination Energy is the normalized residual magnitude:

$$ \mathcal{H}(c, E) = \frac{|\mathbf{r}|_2}{|\mathbf{c}|_2} $$

๐Ÿ˜• Interpretation

  • Low energy โ†’ Claim lies largely within evidence span โ†’ Strong grounding
  • High energy โ†’ Significant orthogonal residual โ†’ Likely unsupported

The scalar is:

  • Model-agnostic
  • Thresholdable
  • Deterministic
  • Independent of model confidence

This makes it suitable as a policy enforcement signal.


โ€ผ๏ธ Why It Matters

Hallucination Energy is not just another feature.

It is:

  • The measurable axis that enables policy thresholds.
  • The scalar that allows stochastic generation to be bounded deterministically.
  • The dominant signal observed across structured and generative benchmarks in our experiments.

Every experiment in this post every dataset, every ablation, every nonlinear model builds on this quantity.

It is the foundation.

The remainder of this post explores how far this scalar can go where it performs strongly, where it weakens, and what that boundary tells us about grounding geometry.


๐Ÿ’ฝ Preparing the Data

Hallucination Energy operates over a simple abstraction:

  • A claim ( c )
  • A set of evidence texts ( E = {e_1, \dots, e_n} )
  • A binary label indicating whether the claim is supported

Everything in our experiments reduces to this triplet.

The structure of the data matters.


๐Ÿ“€ Claimโ€“Evidence Format

Across experiments, we normalize all datasets into the same canonical structure:

{
  "claim": "...",
  "evidence": ["...", "...", "..."],
  "label": 0 or 1
}

Where:

  • label = 0 โ†’ supported / grounded
  • label = 1 โ†’ hallucinated / unsupported

This abstraction allows us to:

  • Apply the same projection-based geometry
  • Compute residual energy consistently
  • Compare across structured and generative regimes

But not all datasets conform equally well to this structure.


โœ… Structured Verification Datasets

Datasets like FEVER-style verification sets are ideal for projection geometry:

  • Claims are atomic.
  • Evidence spans are explicit.
  • Support relationships are localized.
  • Unsupported claims often introduce new entities or relations.

In this regime, projection residual behaves predictably:

  • Supported claims lie within the evidence span.
  • Unsupported claims produce measurable orthogonal residuals.

These datasets provide clean separability.

They are controlled environments.


โœ๏ธ Generative Summarization (HaluEval)

Summarization is structurally different.

Here:

  • The โ€œclaimโ€ is an entire generated summary.
  • The โ€œevidenceโ€ is the full source document.
  • Hallucination may occur at the sentence, phrase, or entity level.
  • Much of the summary is grounded only parts may deviate.

This makes the geometry harder:

  • The evidence span is large.
  • The claim vector blends grounded and ungrounded components.
  • Residual energy becomes a fractional measure rather than binary.

Despite this, projection residual still achieves stable separability (~0.72โ€“0.74 AUC at scale), suggesting that grounding geometry remains dominant even under generative freedom.


๐Ÿ˜ญ Why Some Datasets Fail

Not all claimโ€“evidence datasets are suitable for geometric grounding analysis.

We encountered three common failure modes:

1๏ธโƒฃ Trivial Negatives

Some datasets generate negatives by pairing claims with unrelated evidence. These produce near-total separation but this is artificial.

The residual signal becomes obvious and uninformative.

This does not stress-test grounding geometry.


2๏ธโƒฃ Label Ambiguity

In certain legal or QA datasets:

  • The distinction between supported and unsupported is contextual.
  • Multiple evidence spans may partially support different interpretations.
  • The โ€œnegativeโ€ example is not truly hallucinated merely incorrect.

Projection residual struggles when labels do not correspond to geometric divergence.


3๏ธโƒฃ Semantic Overlap Adversaries

The most interesting failure mode occurs when unsupported claims remain semantically aligned with evidence.

In these cases:

  • The claim reuses entities and structure.
  • The subspace overlap is high.
  • The orthogonal residual shrinks.

This reveals a structural limit of scalar projection geometry.

The problem is no longer distance it is internal structure.


โ˜‘๏ธ Dataset Selection Principles

From these observations, we adopt the following criteria when selecting datasets:

  1. Claims must be semantically comparable to evidence.
  2. Negatives must differ in grounding, not just topic.
  3. Labels must reflect support, not preference.
  4. Evidence spans must be sufficiently localized to define a meaningful subspace.

Under these constraints, both structured verification and generative summarization provide useful signals but for different reasons.


๐Ÿคฏ Key Insight

The effectiveness of Hallucination Energy depends less on model architecture and more on the geometric relationship between claims and evidence.

Datasets that preserve this relationship expose the signal.

Datasets that distort or trivialize it either collapse separation or inflate it artificially.

Understanding this distinction was as important as the metric itself.


๐ŸงŠ Geometric Rigidity

Hallucination Energy is not conservative by design. It is constrained by construction.

The scalar measures the orthogonal residual of a claim embedding relative to the subspace spanned by its evidence. If a semantic component of the claim is not represented in that evidence span, it will appear in the residual. That is not a policy choice it is a geometric consequence.

The rigidity arises from the projection itself:

  • The evidence defines a subspace.
  • The claim is decomposed into in-span and out-of-span components.
  • Anything outside the span is measured as residual energy.

There is no interpretation layer. There is no semantic forgiveness. There is no contextual adjustment.

The calculation simply reflects geometric containment.

In this sense, the scalar is structurally rigid.

It does not adapt to intent or plausibility. It only measures alignment with the provided evidence.


๐Ÿคท Why That Matters

This rigidity is not an arbitrary restriction it is a property of projection geometry. 1 If a claim introduces:

  • New entities,
  • New relations,
  • Unsupported inferences,
  • Or semantic extensions beyond the evidence span,

those components must appear as orthogonal deviation.

The method cannot โ€œsmoothโ€ or reinterpret them.

It cannot infer missing structure.

It measures what is present and nothing more.


๐ŸŽˆ Policy Introduces Flexibility

Because the scalar itself is fixed by geometry, flexibility enters only through policy thresholds.

We do not modify the grounding measure. We modify the tolerance.

  • Low threshold โ†’ strict enforcement
  • High threshold โ†’ permissive acceptance

The geometry remains constant. Policy adjusts acceptable deviation.

This separation is essential:

Measurement is fixed. Tolerance is configurable.


๐Ÿ”‹ Generating and Measuring Hallucination Energy in Certum

The preceding sections define Hallucination Energy formally. This section describes how it is computed and evaluated in practice across structured evidence datasets such as FEVEROUS and PubMed-style biomedical verification.

All experiments are executed through the Certum evaluation runner shown above.


โ˜ฏ๏ธ 1. Canonical Claimโ€“Evidence Structure

Every dataset is normalized into a common format:

{
  "claim": "...",
  "evidence": ["...", "..."],
  "label": 0 or 1
}

Where:

  • label = 0 โ†’ supported
  • label = 1 โ†’ unsupported / hallucinated

This abstraction allows the same geometric pipeline to operate across legal, biomedical, and general knowledge domains.


๐Ÿ”ฉ 2. Embedding and Subspace Construction

For each sample:

  1. The claim is embedded using a sentence-transformer model.
  2. Each evidence sentence is embedded.
  3. A low-rank basis of the evidence span is computed using truncated SVD.
  4. The claim embedding is projected into that subspace.
  5. The orthogonal residual magnitude is measured.

Formally:

$$ \hat{\mathbf{c}} = \mathbf{U}_r \mathbf{U}_r^T \mathbf{c} $$$$ \mathcal{H}(c, E) = \frac{|\mathbf{c} - \hat{\mathbf{c}}|_2}{|\mathbf{c}|_2} $$

This scalar is stored alongside auxiliary geometric features such as:

  • max_energy
  • mean_energy
  • p90_energy
  • energy_gap
  • coverage
  • alignment margins

These form the geometry feature set inside the runner.


๐Ÿšฐ 3. Evaluation Pipeline

The runner executes the following steps:

  1. Summarization / Verification Pipeline

    • Compute embeddings
    • Compute Hallucination Energy
    • Store per-sample diagnostics
  2. Feature Extraction

    • Convert results into structured feature DataFrame
  3. Model Evaluation

    • Logistic regression
    • Cross-validation
    • Bootstrap confidence intervals
    • Precisionโ€“Recall analysis
    • Calibration curves
    • XGBoost nonlinear baseline

All experiments are reproducible and written to disk as JSON reports.


โœ… 4. FEVEROUS Findings

In structured verification datasets such as FEVEROUS:

  • Claims are atomic.
  • Evidence spans are localized.
  • Unsupported claims often introduce new entities or relations.

In this regime, Hallucination Energy exhibits clear separation:

  • Supported claims lie largely within evidence span.
  • Unsupported claims show measurable orthogonal deviation.

We consistently observe:

  • Stable AUC under logistic evaluation.
  • Strong inverse relationship between residual energy and semantic alignment.
  • Minimal benefit from nonlinear modeling.

This indicates that projection residual captures the dominant grounding signal in structured verification.


๐Ÿ‘ท 5. PubMed-Style Biomedical Verification

Biomedical datasets introduce additional challenges:

  • Terminology density is high.
  • Semantic overlap between supported and unsupported claims is often strong.
  • Negatives may differ in relation rather than vocabulary.

Despite this, the projection residual remains informative:

  • Supported biomedical claims cluster at lower residual energy.
  • Unsupported claims exhibit higher variance in residual magnitude.
  • However, separability decreases under heavy semantic overlap.

This highlights an important boundary:

Hallucination Energy is sensitive to semantic containment, but not to internal relational inversion when subspace overlap remains high.

In other words, the method detects out-of-span deviation more reliably than in-span structural contradiction.


๐Ÿ” 6. Observed Structural Patterns

Across FEVEROUS, PubMed, and generative summarization:

  • Residual energy and alignment metrics exhibit consistent inverse correlation.
  • Ablation studies show minimal change when removing secondary energy features.
  • Nonlinear boosting improves AUC modestly (~+0.02).

This suggests:

Projection residual is the primary explanatory axis under linear embedding geometry.


๐Ÿ•ต 7. What This Tells Us

Hallucination Energy is:

  • Robust under dataset scaling.
  • Domain-agnostic.
  • Structurally constrained by projection geometry.

Its strengths emerge in:

  • Entity introduction detection.
  • Unsupported semantic expansion.
  • Out-of-span deviation.

Its limits appear in:

  • Subtle relational inversion.
  • High-overlap semantic mimicry.
  • Adversarial in-span contradiction.

These findings align with the geometric nature of the method.


๐ŸŒš Adversarial Calibration and the Search for a Second Dimension

1. Motivation

After establishing Hallucination Energy as a projection-based grounding signal, we sought to determine whether it could serve as a deterministic decision boundary under adversarial pressure.

The central question was not:

Can energy correlate with hallucination?

But rather:

Can energy define a stable, policy-calibrated acceptance boundary when negatives are deliberately constructed to be difficult?

This distinction shaped the next phase of experimentation.


2. Calibration Under a False Acceptance Budget

We introduced a calibration regime based on a fixed False Acceptance Rate (FAR).

Instead of hand-tuning thresholds, we:

  • Split the dataset into calibration and evaluation fractions.
  • Selected ฯ„ such that the calibration subset satisfied a target FAR (e.g., 2%).
  • Applied that ฯ„ to held-out data.

This transformed hallucination energy from a descriptive metric into an executable policy boundary.

Energy became:

  • Not just a score,
  • But a deterministic gate.

This was the moment the concept of policy-bounded AI became operational.


3. Adversarial Negative Construction

To test robustness, we introduced structured adversarial modes:

  • Deranged negatives semantically plausible but factually broken.
  • Hard-mined negatives explicitly selected to minimize energy separation.
  • Additional perturbation modes (offset, cyclic, permute, etc.).

These were not random corruptions.

They were designed to pressure-test the projection geometry.

The expectation was that if energy truly captured grounding, it would maintain separation under increasingly adversarial conditions.


4. Multi-Axis Policy Experiments

We did not rely solely on energy.

We experimented with multiple decision axes:

  • Energy-only
  • Projection ratio
  • Sensitivity (leave-one-out brittleness)
  • Monotone adaptive combinations

We varied:

  • gap width
  • calibration fraction
  • policy regime
  • embedding backbone
  • dataset domain

The implicit hypothesis was that a second orthogonal scalar might emerge a complementary geometric signal that, when combined with energy, would restore separability under adversarial pressure.


5. What We Observed

Several consistent patterns emerged:

  1. Hallucination energy provided strong containment signals on structured generative benchmarks.
  2. Under adversarial mining, energy separation narrowed significantly.
  3. No simple scalar axis restored robust separation.
  4. Policy tuning altered decision thresholds but did not introduce new geometric discrimination.

Most importantly:

Hard-mined negatives frequently remained within the same embedding subspace as supported claims.

This was not noise.

It was structure.


6. The Key Realization

Projection-based hallucination energy measures span containment.

It does not measure:

  • Logical inversion
  • Relational contradiction
  • Polarity reversal
  • Subtle semantic misalignment within the same manifold

Adversarial negatives often lie within the same semantic subspace as true claims.

When that occurs, projection residual alone cannot fully separate them.

This is not a failure of the metric.

It is a boundary of the representation.

โš–๏ธ Formal Boundary Statement

Let \( c \) and \( c' \) be claim embeddings such that:

$$ c, c' \in \text{span}(E) $$

Then:

$$ \mathcal{H}(c, E) \approx \mathcal{H}(c', E) $$

Projection residual cannot distinguish relational inversion when both claims lie within the same semantic subspace.

This is a representational limitation of linear containment geometry.


7. Interpretation

These experiments suggest:

  • Hallucination energy is a strong first-order geometric grounding signal.
  • It functions effectively as a policy-controllable scalar.
  • It provides deterministic gating under calibrated regimes.
  • However, projection residual is not sufficient for adversarially robust truth discrimination.

The second dimension we sought does not appear to be another scalar threshold.

It likely requires a different representational structure potentially relational, graph-based, or compositional.


8. Why This Matters

This phase of experimentation clarified the role of hallucination energy:

It is a containment measure, not a full truth model.

That distinction is crucial.

Containment can enforce policy. Containment can bound stochasticity. Containment can define confidence regimes.

But adversarial robustness requires additional structure.

This delineation strengthens the contribution rather than weakening it.


๐Ÿ“Š Empirical Results Across Adversarial Datasets

To stress-test projection geometry, we evaluated three distinct claimโ€“evidence regimes:

Dataset Domain Type Stress Characteristic
Wiki General knowledge Clean entity grounding
PubMed Biomedical verification Dense terminology + semantic overlap
CaseHold Legal reasoning Relational nuance + argument structure

Each dataset was evaluated under adversarial calibration with:

  • Fixed FAR calibration (ฯ„ selected on calibration split)
  • Hard-mined negatives
  • Deranged negatives
  • Multi-axis policy sweeps

๐Ÿ” Cross-Domain Stability Observation

Across Wikipedia and PubMed adversarial runs, mean separation remained approximately constant:

$$ \Delta \mu \approx 0.32 $$

This consistency across structurally distinct factual domains suggests that projection residual captures a domain-agnostic containment constant under evidence-based grounding.

This stability strengthens the claim that hallucination energy reflects geometric containment rather than dataset-specific heuristics.


๐Ÿ™‡ Adversarial Calibration and the Search for a Second Dimension

๐Ÿ”ญ 1. Motivation

After establishing Hallucination Energy as a projection-based grounding signal, we sought to determine whether it could function as a stable decision boundary under adversarial pressure.

The core question was not:

Does energy correlate with hallucination?

But rather:

Can projection residual define a calibrated, deterministic acceptance boundary when negatives are deliberately constructed to be difficult?

This distinction is critical. A correlational signal is descriptive. A calibrated boundary is operational.


๐Ÿ’ธ 2. Calibration Under a False Acceptance Budget

We introduced a fixed False Acceptance Rate (FAR) regime.

Rather than hand-tuning thresholds, we:

  • Split each dataset into calibration and evaluation subsets.
  • Selected a threshold ( \tau_{energy} ) such that the calibration subset satisfied a target FAR (e.g., 2%).
  • Applied that threshold to held-out evaluation data.

This procedure transformed hallucination energy from a descriptive metric into an executable policy boundary.

Energy was no longer merely a score. It became a deterministic gate.

This is the point at which policy-bounded AI becomes operational.


๐Ÿ’ฅ 3. Adversarial Negative Construction

To stress-test robustness, we introduced structured adversarial modes, focusing on hard_mined_v2 negatives:

  • Negatives were selected to minimize energy separation.
  • Semantic similarity was preserved.
  • Page mismatch and disjoint identifiers were enforced.

These were not random corruptions.

They were intentionally constructed to remain within the same embedding manifold as supported claims.

The goal was to determine whether projection geometry alone could maintain separability when containment cues were minimized.


๐Ÿ—ผ 4. Cross-Domain Results

We evaluated three structurally distinct datasets:

  • Wikipedia (FEVEROUS-style factual verification)
  • PubMed (biomedical evidence-based QA)
  • CaseHold (legal multiple-choice reasoning)

All runs used identical calibration methodology and adversarial construction.

Hard-Mined Results

Dataset ฯ„_energy TPR_eval FAR_eval Pos ฮผ Neg ฮผ ฮ”ฮผ Separation (ฯƒ)
Wiki 0.1468 0.067 0.006 0.3714 0.6950 +0.3236 +1.92
PubMed 0.4382 0.644 0.014 0.3700 0.6944 +0.3244 +2.11
CaseHold 0.3862 0.029 0.012 0.6680 0.6075 โˆ’0.0605 โˆ’0.40

Where:

  • ( \mu_{pos} ) = mean energy for supported claims
  • ( \mu_{neg} ) = mean energy for adversarial negatives
  • ( \Delta \mu = \mu_{neg} - \mu_{pos} )
  • Separation expressed in pooled standard deviations

๐Ÿ‘€ 5. Observations

5.1 Factual Domains: Stable Separation

Across both Wikipedia and PubMed, adversarial mining preserved approximately two-standard-deviation separation.

Notably:

  • Mean positive energy remained ~0.37
  • Mean negative energy remained ~0.69
  • ฮ”ฮผ โ‰ˆ 0.32 across both domains

This cross-domain consistency is significant. It indicates that projection residual captures a stable geometric containment signal in factual retrieval settings.

Even under adversarial selection, energy maintained meaningful separation.


CaseHold exhibited a different pattern:

  • Positive and negative means were close.
  • ฮ”ฮผ โ‰ˆ โˆ’0.06.
  • Separation โ‰ˆ โˆ’0.4ฯƒ.

This is not catastrophic failure. It is geometric collapse.

Both correct and incorrect legal holdings frequently lie within similar embedding subspaces.

Projection containment alone does not discriminate relational nuance in legal multiple-choice reasoning.


๐Ÿ“ 6. The Second Dimension Hypothesis

We hypothesized that a second orthogonal scalar might emerge:

  • Projection ratio
  • Sensitivity (leave-one-out brittleness)
  • Adaptive monotone combinations
  • Gap-width tuning

Extensive experimentation failed to identify a complementary scalar axis that restored separation in adversarial settings.

This is an important negative result.

The limitation was not threshold selection. The limitation was representational.

Hard-mined negatives often remain inside the same semantic subspace as true claims.

Projection residual measures containment. It does not measure relational contradiction, polarity inversion, or compositional misalignment.

    flowchart LR

    A[True Claim Vector] --> B[Evidence Subspace Span]

    C[Adversarial Claim Vector] --> B

    B --> D[Projection Residual<br/>Hallucination Energy]

    D --> E{Inside Span?}

    E -->|Yes| F[Low Energy<br/>Containment]
    E -->|No| G[High Energy<br/>Rejection]

    H[Relational Inversion] -.-> I[Not Captured<br/>by Projection Residual]

    classDef geom fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px,color:#1b5e20
    classDef warn fill:#ffebee,stroke:#d32f2f,stroke-width:2px,color:#b71c1c

    class A,B,C,D,E,F,G geom
    class H,I warn
  

๐Ÿ“‰ 7. Interpretation

These experiments clarify the role of hallucination energy:

  • It is a first-order geometric containment metric.
  • It functions effectively as a policy-calibrated scalar.
  • It remains stable across factual retrieval domains.
  • It degrades in compositional reasoning tasks where semantic subspace overlap is high.

This is not a failure of the metric.

It is a boundary condition of projection geometry.


๐Ÿง 8. Why This Strengthens the Contribution

The adversarial phase revealed something more precise than a simple success/failure narrative:

Hallucination energy is not a universal truth discriminator.

It is a containment regulator.

Containment can:

  • Enforce policy,
  • Bound stochasticity,
  • Define calibrated acceptance regimes,
  • Provide deterministic gating under fixed FAR budgets.

Adversarial robustness in compositional domains likely requires additional representational structure (relational or graph-based augmentation).

This boundary does not weaken the contribution.

It defines it.

    flowchart TD

    A[๐Ÿ“š Claim + Evidence Dataset] --> B[โœ‚๏ธ Split<br/>Calibration / Evaluation]

    B --> C1[โš™๏ธ Calibration Set]
    B --> C2[๐Ÿงช Evaluation Set]

    C1 --> D[๐Ÿ“ Compute Hallucination Energy]
    D --> E[๐ŸŽฏ Select ฯ„_energy<br/>under FAR constraint]

    C2 --> F["๐Ÿšจ Adversarial Generator<br/>(hard_mined_v2)"]

    F --> G[๐Ÿ” Construct Difficult Negatives]

    C2 --> H["๐Ÿ“ Compute Energy (Positives)"]
    G --> I["๐Ÿ“ Compute Energy (Negatives)"]

    H --> J[๐Ÿงฑ Policy Gate<br/>energy โ‰ค ฯ„]
    I --> J

    J --> K[๐Ÿ“Š Separation Analysis]

    K --> L{Stable Separation?}

    L -->|Yes| M[๐ŸŸข Strong Containment Signal]
    L -->|No| N[๐Ÿ”ด Same Subspace<br/>Geometric Collapse]

    classDef dataset fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#0d47a1
    classDef calibration fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#4a148c
    classDef adversarial fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#e65100
    classDef energy fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px,color:#1b5e20
    classDef policy fill:#fce4ec,stroke:#c2185b,stroke-width:2px,color:#880e4f
    classDef result fill:#eeeeee,stroke:#424242,stroke-width:2px,color:#212121

    class A dataset
    class B dataset
    class C1,C2 calibration
    class D,E,H,I energy
    class F,G adversarial
    class J policy
    class K,L,M,N result
  

๐Ÿง‘ Summarization as a Real-World Application

Up to this point, our experiments explored hallucination energy as a calibrated containment signal under adversarial pressure.

Those tests were intentionally geometric and stress-oriented.

But the real question remained:

Does hallucination energy provide measurable signal in an actual generative application?

To answer this, we moved from synthetic adversarial construction to a real-world task:

Hallucination detection in summarization.


๐Ÿ’ช Why Summarization Is a Meaningful Test

Summarization is not adversarially constructed.

It is not synthetically perturbed.

It is a natural generative task where:

  • A model produces a summary
  • That summary may or may not be supported by the source document

This makes it ideal for evaluating whether hallucination energy captures grounding behavior in a realistic setting.

If the signal holds here, it is no longer just geometric containment under pressure.

It is application-level grounding.


๐Ÿ“ฒ Dataset: HaluEval (10,000 Samples)

We evaluated on 10,000 samples from the HaluEval dataset.

Execution log:

Loaded 10000 samples.
Completed 10000 samples in 10.26s
Extracted feature dataframe shape: (10000, 20)

This was a full pipeline run.


๐Ÿ”จ How the Summarization Pipeline Works

Unlike claimโ€“evidence pairs, summarization requires structured decomposition.

The process is:

  1. Split the generated summary into sentences.

  2. Split the source document into sentences.

  3. Embed both summary sentences and document sentences.

  4. For each summary sentence:

    • Compute projection geometry against the document sentence manifold.
    • Compute containment metrics.
    • Compute residual energy.
  5. Aggregate per-sentence features into document-level signals.

This produces 20 structured features per summary, including:

  • Mean and percentile energy
  • High-energy sentence counts
  • Coverage ratios
  • Similarity margins
  • Entailment signals
  • Structural features (sentence count, paragraph count)

This transforms summarization from a binary judgment into a geometric feature space.


๐Ÿ“‘ Experimental Results (10,000 Samples)

๐Ÿ“Ž Geometry-Only Model

AUC: 0.7120 95% CI: [0.6929, 0.7306]

Key signals:

  • high_energy_count: โˆ’0.5359
  • mean_energy: โˆ’0.1804
  • p90_energy: โˆ’0.2299
  • mean_coverage: +0.1704

Interpretation:

Energy-based geometric containment alone produces strong discrimination.


Entailment-Only Model

AUC: 0.6126

Entailment contributes signal, but less than projection geometry.


Full Model (Geometry + Entailment + Structure)

AUC: 0.7284 95% CI: [0.7108, 0.7455]

Notable coefficients:

  • mean_energy: โˆ’0.7438
  • high_energy_count: โˆ’0.4142
  • p90_energy: โˆ’0.3367

Energy features dominate the learned signal.


XGBoost (Nonlinear Model)

AUC: 0.7338 Cross-Validated AUC: 0.7515 ยฑ 0.0018

This is a statistically stable improvement.

The signal strengthens with model capacity.

Cross Validated AUC


๐Ÿ“‚ Ablation Study

We removed key energy features to test necessity.

Removed Features AUC
None (Full Model) 0.7284
energy_gap 0.7284
high_energy_count 0.7271
Both 0.7271

Energy-derived signals remain structurally embedded in the feature space.

The model does not collapse when individual energy components are removed.

This suggests:

  • Energy is not acting as a fragile single-axis heuristic.
  • It contributes distributed signal across geometry features.

๐Ÿฆ What This Proves

On 10,000 samples:

  • Geometry alone: ~0.71 AUC
  • Full model: ~0.73 AUC
  • Nonlinear model: ~0.75 CV AUC

Hallucination energy:

  • Scales
  • Generalizes
  • Improves with model capacity
  • Remains stable across 10k samples

๐Ÿชž Why This Is Different From the Adversarial Phase

In adversarial calibration:

  • We searched for a second scalar axis.
  • Separation narrowed under hard-mined pressure.
  • Projection residual hit representational limits.

In summarization:

  • We do not require adversarial separation.
  • We measure grounding quality across naturally generated text.
  • Energy functions as a strong, first-order grounding signal.

The distinction is critical.

Adversarial experiments exposed the limits of projection residual.

Summarization demonstrates its practical utility.


๐Ÿ’ˆ The Important Point

The evaluation runner:

  • Loads dataset
  • Executes full embedding pipeline
  • Extracts structured features
  • Runs logistic regression
  • Bootstraps AUC confidence intervals
  • Performs ablation
  • Trains XGBoost
  • Produces calibration and ROC curves
  • Writes reproducible artifacts

Hallucination energy is a measurable, statistically significant contributor.

The adversarial phase clarified the geometric boundary of projection-based containment. The summarization experiments demonstrated that within generative regimes, hallucination energy remains a strong, scalable grounding signal. We summarize the contributions of this work below.


๐Ÿ‘น Demonstration: High vs Low Hallucination Energy

To make the geometric intuition concrete, we present representative low-energy and high-energy examples drawn from the 10,000-sample HaluEval summarization evaluation.

For each case, we report:

  • Mean Hallucination Energy
  • Top-1 sentence similarity
  • Mean entailment score

These values come directly from the evaluation pipeline.


๐Ÿ’ฃ Low Energy Examples (Strong Containment)

These examples exhibit low projection residual, high similarity to supporting evidence, and strong containment behavior.

Case Energy (mean) Top-1 Similarity Mean Entailment Label
A 0.114 0.936 0.046 1
B 0.178 0.905 0.002 1
C 0.181 0.886 0.003 0

๐Ÿ‘น Example A (Energy = 0.114)

Britain’s first ever female Olympic boxer, Natasha Jonas, has announced her retirement from boxing, saying “it just felt like the right time to retire.” Despite losing to eventual gold medallist Katie Taylor at the 2012 London games, Jonas made history and paved the way for women’s boxing.

This example lies strongly within the embedding span of the evidence. The projection residual is small, and similarity is high. This is exactly what geometric containment predicts for grounded summaries.


๐Ÿ‘บ High Energy Examples (Weak Containment)

These examples exhibit large projection residual, weak alignment, and reduced containment.

Case Energy (mean) Top-1 Similarity Mean Entailment Label
D 0.881 0.298 0.080 0
E 0.896 0.267 0.002 0
F 0.903 0.222 0.024 1

Example D (Energy = 0.881)

The race for promotion from the Championship is set to go to the wire. The top four teams in the second tier are separated by just two points…

Here the projection residual is high. The summary content is poorly contained within the evidence subspace, and similarity collapses accordingly.


๐ŸงŒ What This Demonstrates

Across 10,000 samples:

  • Low energy correlates with high similarity and containment.
  • High energy corresponds to semantic drift.
  • Energy alone provides substantial separability (Geometry AUC โ‰ˆ 0.712).
  • Combined features reach AUC โ‰ˆ 0.728 (Logistic) and โ‰ˆ 0.751 (XGBoost CV).

These examples visually illustrate the geometric principle:

Hallucination Energy measures how far a claim vector lies outside the evidence span.

It is not a black-box confidence score. It is a projection residual.


๐Ÿงง Contributions

This work makes the following contributions:

  1. A projection-based hallucination energy metric We introduce hallucination energy as a geometric projection residual that measures containment of claims within evidence subspaces.

  2. A FAR-calibrated deterministic policy gate We operationalize hallucination energy as an executable decision boundary by calibrating thresholds under fixed False Acceptance Rate (FAR) constraints.

  3. Adversarial stress-testing across three domains We evaluate robustness using structured adversarial negatives across Wikipedia (FEVEROUS), biomedical (PubMedQA), and legal (CaseHOLD) datasets.

  4. Identification of projection-boundary limitations We demonstrate that projection residual captures span containment but does not fully resolve adversarially constructed semantic inversions within the same embedding manifold.

  5. Real-world summarization validation at scale We apply the framework to 10,000 HaluEval summarization samples, achieving up to 0.751 cross-validated AUC using energy-driven features.

  6. Empirical evidence that energy is a strong first-order grounding signal Across adversarial and generative settings, hallucination energy consistently provides measurable separation and supports policy-controllable AI behavior.


๐Ÿ“š References

Title Authors Year Publication / Source
Grounding the Ungrounded: A Spectral-Graph Framework for Quantifying Hallucinations in Multimodal LLMS Supratik Sarkar and Swagatam Das 2025 arXiv preprint arXiv:2508.19366v4
HARP: Hallucination Detection via Reasoning Subspace Projection Junjie Hu, Gang Tu, Sheng Yu Cheng, Jinxin Li, Jinting Wang, Rui Chen, Zhilong Zhou, and Dongbo Shan 2025 arXiv preprint arXiv:2509.11536v2
Policy-Aware Generative AI for Safe, Auditable Data Access Governance Shames Al Mandalawi, Muzakkiruddin Ahmed Mohammed, Hendrika Maclean, Mert Can Cakmak, and John R. Talburt 2025 arXiv preprint arXiv:2510.23474v1

๐ŸŽ’ BibTeX Entries

@article{sarkar2025grounding,
  title={Grounding the Ungrounded: A Spectral-Graph Framework for Quantifying Hallucinations in Multimodal LLMS},
  author={Sarkar, Supratik and Das, Swagatam},
  journal={arXiv preprint arXiv:2508.19366v4},
  year={2025}
}

@article{hu2025harp,
  title={HARP: Hallucination Detection via Reasoning Subspace Projection},
  author={Hu, Junjie and Tu, Gang and Cheng, Sheng Yu and Li, Jinxin and Wang, Jinting and Chen, Rui and Zhou, Zhilong and Shan, Dongbo},
  journal={arXiv preprint arXiv:2509.11536v2},
  year={2025}
}

@article{almandalawi2025policy,
  title={Policy-Aware Generative AI for Safe, Auditable Data Access Governance},
  author={Al Mandalawi, Shames and Mohammed, Muzakkiruddin Ahmed and Maclean, Hendrika and Cakmak, Mert Can and Talburt, John R.},
  journal={arXiv preprint arXiv:2510.23474v1},
  year={2025}
}