Delta Memory: Cargo-Culting Human Memory with Search

AI, Agents, Memory Systems, Cognitive Architectures

May 18, 2026


AI systems today have no idea why their own memory changes. That is the problem this post tries to solve.

Summary

Most AI memory systems start from a practical place: retrieval. Retrieval is useful, scalable, and often the right tool for the job. But if we want systems that interact with humans in more human‑like ways, we need a different analogy: not storage, but thinking.

Humans don’t store perfect records. We don’t retrieve exact text or replay video files. What we call “memory” is a shifting landscape of associations, impressions, weights, and patterns. When you recall something, you’re not pulling a file from disk; you’re running a search across your internal world, shaped by everything you’ve lived through.

If you and I both think about a traumatic event, the images that surface will be completely different. They’re shaped by our experiences, our histories, our fears, our movies, our books, our relationships. Human memory is contextual, associative, and deeply personal.

That’s the idea I wanted to capture in this system.

I wanted to give an AI something analogous to that:
a way to surface related items, weight them, connect them, and, most importantly, a way to measure how those connections change over time.

Because humans don’t just have memories.
We change because of them.

Think about a simple example:
You believe in a politician. One day they say something that breaks your trust. Suddenly your stance shifts. That shift didn’t come from nowhere. It came from a chain of influences, experiences, and signals that accumulated until your internal model updated.

We do this constantly.
Every day.
Without noticing.

But an AI?
When an AI suddenly changes its answer, preference, or reasoning path, we have no idea why. There’s no internal telemetry. No attribution. No explanation. No way to tell whether the change was healthy, harmful, or accidental.

Before we can give an AI the ability to change its mind, we need a way to measure that change.

That’s what this post is about.

It’s about building a system that:

represents memory as a weighted, contextual belief state
compares those states over time
attributes the change to specific sources
evaluates whether the change looks healthy or dangerous
and uses that signal to influence future search behavior

In other words:
a system that doesn’t just store information, but thinks with it, updates from it, and explains how it changed.

This is the missing layer between retrieval and reasoning.

Why “Cargo Cult” Memory?

A cargo cult copies the visible structure of something without having access to the underlying mechanism. That’s the spirit of this project: not to replicate human memory, but to reproduce the observable behaviors that make human memory useful.

Delta Memory is not a theory of consciousness or a model of the human brain. It is a runtime system engineered to mimic the functional properties we can actually observe in human recall:

associative activation
reinforcement
recency effects
salience weighting
source dominance
contradiction handling
reconstruction instead of replay
shifting accessibility over time

These are the patterns we see in human memory from the outside — the parts we can measure, reason about, and design for.

So the guiding question became:

What happens if we intentionally engineer those observable properties into a runtime system?

Delta Memory is my attempt to explore that question: a practical, deterministic layer that behaves like memory in the ways that matter for cognition, without pretending to be the real thing.

The Core Insight: Memory Is Not Storage

Traditional AI systems usually treat memory as stored state:

facts
documents
embeddings
conversation history

But human memory does not feel like retrieval.

It feels like search.

What we remember depends on:

what is emotionally salient
what was recently reinforced
what contradicts current beliefs
what is currently accessible
what other memories activate nearby associations

In other words:

memory = reachable information under current conditions

This explains why humans:

forget obvious things
become emotionally biased
reinforce beliefs over time
misremember details
drift ideologically
reconstruct memories differently depending on context

Those are not storage problems.

They are search-topology problems.

That realization completely changed how I thought about AI memory systems.

The Real Shift: Memory as State Transition

Traditional AI memory systems care about:

what is stored

Delta Memory cares about:

what changed

That distinction matters enormously.

The key object in the system is not merely the memory snapshot:

MemorySetDTO  # a weighted belief state at time T

The key object is the transition:

DeltaMemoryDiffDTO  # the measured change between T and T+1

That transition contains:

what changed
why it changed
which source caused the change
whether the change is healthy
whether the system should trust the change
whether future behavior should adapt

This is the difference between:

memory as storage

and:

memory as evolving cognition

The Core Contracts

Before looking at the runtime flow, it helps to see the two central contracts.

MemorySetDTO represents memory at a moment in time. It is not just a list of retrieved items. It includes candidates, source contributions, source reports, aggregate score, and dominance information.

DeltaMemoryDiffDTO represents the measured transition between two memory states. This is where memory becomes observable: the system can compare before and after, then ask what changed, which source caused it, and whether the change should be trusted.

from pydantic import BaseModel


class MemorySetDTO(BaseModel):
    memory_set_id: str
    goal: str
    query: str | None
    candidates: list[MemoryCandidateDTO]
    contributions: list[MemoryContributionDTO]
    source_reports: list[MemorySourceReportDTO]
    aggregate_score: float
    dominant_source: str | None
    dominance_ratio: float | None


class DeltaMemoryDiffDTO(BaseModel):
    before_memory_set_id: str
    after_memory_set_id: str
    changed_top_candidate: bool
    changed_dominant_source: bool
    aggregate_score_delta: float
    candidate_deltas: list[CandidateDeltaDTO]
    source_deltas: list[SourceDeltaDTO]
    attribution: DeltaMemoryAttributionDTO
    health: DeltaMemoryHealthDTO
    decision: DeltaMemoryChangeDecisionDTO

The important design choice is that memory and memory change are separate objects.

A MemorySetDTO answers: “What is active right now?”

A DeltaMemoryDiffDTO answers: “How did the active state change, and should that change affect future behavior?”

That separation is what makes the rest of the system measurable.

Stage 1: Composing Active Memory

The first stage builds a runtime memory snapshot from multiple weighted sources.

Inside Writer, these sources are configurable:

current runtime context
runtime.search
database memory
model priors
future memory providers

Each source contributes memory candidates into a composed belief state.

Building a Memory Set

The first implementation step is to compose a memory state from configurable sources.

In this example, four sources contribute to memory: current context, runtime search, database memory, and model prior. Each source has a weight. Those weights do not decide the answer directly; they decide how strongly each source can influence the active memory state.

config = DeltaMemoryConfigDTO(
    normalize_weights=True,
    sources=[
        MemorySourceConfigDTO(
            source_name="current_context",
            source_type="context_memory",
            weight=0.30,
        ),
        MemorySourceConfigDTO(
            source_name="runtime_search",
            source_type="search_memory",
            weight=0.25,
        ),
        MemorySourceConfigDTO(
            source_name="writer_database",
            source_type="database_memory",
            weight=0.25,
        ),
        MemorySourceConfigDTO(
            source_name="trained_model_prior",
            source_type="model_memory",
            weight=0.20,
        ),
    ],
)

memory_set = runtime.delta_memory.build_memory_set(
    goal="review artifact",
    query="lens runtime voice preservation",
    context={
        "context_memory_candidates": [
            {
                "text": "Current task is an artifact review.",
                "confidence": 0.90,
                "relevance": 0.85,
            }
        ],
        "search_hits": [
            {
                "hit_id": "search:1",
                "title": "Lens contribution report",
                "summary": "Voice preservation and semantic similarity results.",
                "score": 0.92,
            }
        ],
    },
    config=config,
)

print(memory_set.dominant_source)
print(memory_set.dominance_ratio)

This example produces a memory set for a specific goal and query.

Notice that context and search are both allowed to contribute. The current task says “this is an artifact review,” while search contributes a concrete result about lens reports and voice preservation. Delta Memory does not assume either source is automatically correct. It records both, weights both, and then measures which one dominates.

The engine:

normalizes weights across sources
collects raw candidates
filters by confidence and top‑k
computes weighted scores and aggregate confidence
records contribution metadata and provenance
identifies the dominant source and dominance ratio

The result is a MemorySetDTO: the system’s active memory state at a specific moment in time.

Not all memories are equal. Some sources dominate. Some weaken. Some reinforce one another. That is intentional. Human memory behaves similarly.
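
The engine steps listed above can be sketched in a few lines. This is a minimal illustration, not Writer's implementation: the function name `compose`, the dict-based candidates, and the default `min_confidence` and `top_k` values are all assumptions. It also assumes that the dominance ratio is the dominant source's share of the total weighted score, and it maps the search hit's `score` onto `confidence` with a relevance of 1.0.

```python
from collections import defaultdict


def compose(raw, weights, min_confidence=0.5, top_k=3):
    """Compose a toy memory set from per-source raw candidates.

    raw:     {source_name: [{"text", "confidence", "relevance"}, ...]}
    weights: {source_name: weight}; normalized here to sum to 1.
    """
    total_weight = sum(weights.values())
    norm = {name: w / total_weight for name, w in weights.items()}

    candidates = []
    per_source = defaultdict(float)
    for source, items in raw.items():
        # Filter by confidence, then keep the top-k by unweighted score.
        kept = [c for c in items if c["confidence"] >= min_confidence]
        kept.sort(key=lambda c: c["relevance"] * c["confidence"], reverse=True)
        for c in kept[:top_k]:
            score = norm[source] * c["relevance"] * c["confidence"]
            candidates.append(
                {"source": source, "text": c["text"], "weighted_score": score}
            )
            per_source[source] += score

    aggregate = sum(per_source.values())
    dominant = max(per_source, key=per_source.get) if per_source else None
    ratio = per_source[dominant] / aggregate if aggregate else None
    candidates.sort(key=lambda c: c["weighted_score"], reverse=True)
    return {
        "candidates": candidates,
        "aggregate_score": aggregate,
        "dominant_source": dominant,
        "dominance_ratio": ratio,
    }


memory_set = compose(
    raw={
        "current_context": [
            {"text": "Current task is an artifact review.",
             "confidence": 0.90, "relevance": 0.85},
        ],
        "runtime_search": [
            {"text": "Lens contribution report",
             "confidence": 0.92, "relevance": 1.0},
        ],
    },
    weights={"current_context": 0.30, "runtime_search": 0.25},
)
print(memory_set["dominant_source"], round(memory_set["dominance_ratio"], 3))
```

With only two active sources the result is close to a tie, which is exactly the kind of situation where recording the dominance ratio is more informative than recording only the winner.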

Diagram: Memory Composition

The diagram below shows the same process visually.

The important idea is that sources do not write directly into memory. They first produce raw candidates. Those candidates are then normalized, filtered, weighted, and assembled into a single memory set.

    flowchart TD
    A["🎯 Goal + Query"] --> B{⚙️ DeltaMemoryEngine}
    
    B --> C1["🗣️ Context Memory<br>weight=0.30"]
    B --> C2["🔍 Search Memory<br>weight=0.25"]
    B --> C3["💾 Database Memory<br>weight=0.25"]
    B --> C4["🧠 Model Prior<br>weight=0.20"]
    
    C1 --> D["📦 Raw Candidates"]
    C2 --> D
    C3 --> D
    C4 --> D
    
    D --> E["⚖️ Normalize & Filter<br>• confidence threshold<br>• top‑k per source"]
    
    E --> F["✅ MemorySetDTO<br>━━━━━━━━━━━━━━━━<br>📋 candidates (weighted)<br>📊 source reports<br>👑 dominant source<br>📈 aggregate score"]
    
    style A fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    style B fill:#fff3e0,stroke:#e65100,stroke-width:2px
    style C1 fill:#f3e5f5,stroke:#4a148c
    style C2 fill:#e8f5e9,stroke:#1b5e20
    style C3 fill:#fff8e1,stroke:#f57f17
    style C4 fill:#ffebee,stroke:#b71c1c
    style F fill:#e0f2f1,stroke:#004d40,stroke-width:3px

This is the first point where memory becomes inspectable.

Instead of asking the model to “remember” something implicitly, the runtime can show exactly which sources contributed, which candidates survived filtering, and which source dominated the final state.

The Scoring Rule

At the center of Stage 1 is a deliberately simple scoring rule.

A candidate has its own relevance and confidence. The source also has a configured weight. The final weighted score combines both.

original_score = candidate.raw_score or (
    candidate.relevance * candidate.confidence
)

weighted_score = source_weight * original_score
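
To make the arithmetic concrete, here is a small worked sketch of the rule using the numbers from the earlier example. The helper name `weighted_score` and the decision to treat a search hit's `score` as its `raw_score` are assumptions made for illustration.

```python
def weighted_score(source_weight, relevance=0.0, confidence=0.0, raw_score=None):
    # Mirror the rule above: prefer raw_score, else relevance * confidence.
    original = raw_score or (relevance * confidence)
    return source_weight * original


# Context candidate: relevance 0.85, confidence 0.90, source weight 0.30.
context_score = weighted_score(0.30, relevance=0.85, confidence=0.90)

# Search hit: score 0.92 used as raw_score (assumption), source weight 0.25.
search_score = weighted_score(0.25, raw_score=0.92)

print(round(context_score, 4))  # 0.2295
print(round(search_score, 4))   # 0.23
```

One consequence of using `or` is worth knowing: a `raw_score` of exactly 0.0 is falsy in Python, so it falls through to `relevance * confidence` rather than producing a zero score.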

Memory Sources Are Not Neutral

One of the most important design choices was making memory sources explicitly configurable and measurable.

Each source can be:

enabled or disabled
weighted differently
filtered independently
capped by top‑k
governed separately

For example, the default configuration looks like this:

sources:
  - source_name: current_context
    source_type: context_memory
    weight: 0.30

  - source_name: runtime_search
    source_type: search_memory
    weight: 0.25

  - source_name: writer_database
    source_type: database_memory
    weight: 0.25

  - source_name: trained_model_prior
    source_type: model_memory
    weight: 0.20

This matters because memory is not neutral.

Context can over-steer cognition.
Search can dominate recall.
Model priors can reinforce stale beliefs.

The system therefore measures dominance explicitly and emits warnings like memory_source_dominance_detected when a single source exceeds a safe threshold.
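
A dominance check of this kind can be sketched as follows. The warning name comes from the text above; the function name, the 0.60 threshold, and the definition of the ratio as a share of total weighted score are assumptions, since the post does not state the actual values.

```python
def dominance_warnings(source_scores, threshold=0.60):
    """Return (dominant_source, ratio, warnings) for per-source weighted scores."""
    total = sum(source_scores.values())
    if total == 0:
        return None, None, []
    dominant = max(source_scores, key=source_scores.get)
    ratio = source_scores[dominant] / total
    # Hypothetical policy: warn when one source holds more than `threshold`
    # of the total weighted score.
    warnings = ["memory_source_dominance_detected"] if ratio > threshold else []
    return dominant, ratio, warnings


dominant, ratio, warnings = dominance_warnings(
    {"current_context": 0.10, "runtime_search": 0.40, "writer_database": 0.05}
)
print(dominant, round(ratio, 3), warnings)
```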

Stage 2: Measuring Memory Change

This is where the architecture becomes interesting.

Given two memory states:

MemorySet(before)
MemorySet(after)

the system computes:

diff = diff_service.diff_memory_sets(before=before_set, after=after_set)

The resulting DeltaMemoryDiffDTO measures:

candidate deltas (added, removed, strengthened, weakened, unchanged)
source deltas and dominance shifts
contribution-level provenance changes
aggregate score shifts
attribution of cause
health and risk evaluation
recommended governance action

Example:

print(diff.changed_dominant_source)          # True
print(diff.attribution.primary_cause_source) # "runtime_search"
print(diff.health.health_status)             # "suspicious"
print(diff.decision.action)                  # "dampen"

The important point is this:

The system can now reason about its own memory evolution.

That is fundamentally different from simple retrieval.
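
The candidate-delta categories (added, removed, strengthened, weakened, unchanged) can be illustrated with a minimal sketch. It assumes each memory state is reduced to a mapping from candidate id to weighted score; the function name, the ids, and the tolerance are hypothetical.

```python
def classify_candidate_deltas(before, after, tolerance=1e-9):
    """before/after: {candidate_id: weighted_score}.

    Returns {candidate_id: (label, score_delta)}.
    """
    deltas = {}
    for cid in before.keys() | after.keys():
        b, a = before.get(cid), after.get(cid)
        if b is None:
            deltas[cid] = ("added", a)
        elif a is None:
            deltas[cid] = ("removed", -b)
        elif a - b > tolerance:
            deltas[cid] = ("strengthened", a - b)
        elif b - a > tolerance:
            deltas[cid] = ("weakened", a - b)
        else:
            deltas[cid] = ("unchanged", 0.0)
    return deltas


before_scores = {"a": 0.30, "b": 0.20, "c": 0.10}
after_scores = {"a": 0.30, "b": 0.35, "d": 0.15}
deltas = classify_candidate_deltas(before_scores, after_scores)
for cid in sorted(deltas):
    label, delta = deltas[cid]
    print(cid, label, round(delta, 2))
```

Because both inputs are plain structured data, the same two snapshots always produce the same deltas, which is what makes the comparison replayable.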

A Tiny Before/After Example

To make the delta concrete, imagine a routing task.

At first, the context suggests that generic repository search is probably enough. Then a search result appears suggesting that the artifact/lens path is a better fit. Delta Memory builds one memory set before that evidence and one after it.

before = runtime.delta_memory.build_memory_set(
    goal="route artifact review task",
    query="best runtime path",
    context={
        "context_memory_candidates": [
            {
                "text": "Generic repository search is probably enough.",
                "confidence": 0.90,
                "relevance": 0.90,
            }
        ]
    },
    config=config,
)

after = runtime.delta_memory.build_memory_set(
    goal="route artifact review task",
    query="best runtime path",
    context={
        "context_memory_candidates": [
            {
                "text": "Generic repository search is probably enough.",
                "confidence": 0.90,
                "relevance": 0.90,
            }
        ],
        "search_hits": [
            {
                "hit_id": "search:artifact-lens",
                "title": "Artifact + Lens runtime fit",
                "summary": "Artifact review should use lens contribution reports.",
                "score": 0.96,
            }
        ],
    },
    config=config,
)

diff = diff_service.diff_memory_sets(before=before, after=after)

print(diff.changed_dominant_source)
print(diff.attribution.primary_cause_source)
print(diff.health.health_status)
print(diff.decision.action)

Generated results

changed_dominant_source: True
primary_cause_source: runtime_search
health_status: suspicious
decision: dampen
recommended_weight_adjustments: {"runtime_search": -0.15}

The important point is not that the system changed its mind.

The important point is that the change is now measurable. We can inspect whether the top candidate changed, whether the dominant source changed, and whether runtime search was actually responsible for the shift.

Diagram: Measuring the Delta

The next diagram shows what happens after two memory states exist.

This is the heart of Delta Memory: the runtime does not just keep the newer memory set. It compares the before and after states, computes deltas, attributes the cause, evaluates risk, and recommends a governance action.

    flowchart LR
    subgraph BEFORE ["🕒 Before State"]
        B_Mem["MemorySetDTO<br>dominant: context<br>score: 0.62"]
    end

    subgraph AFTER ["🕓 After State"]
        A_Mem["MemorySetDTO<br>dominant: search<br>score: 0.80"]
    end

    B_Mem --> DIFF["🔁 DeltaMemoryDiffService"]
    A_Mem --> DIFF

    DIFF --> CAND["📌 Candidate Deltas<br>• added / removed<br>• strengthened / weakened"]
    DIFF --> SRC["📂 Source Deltas<br>• weighted score Δ<br>• dominance shift"]
    DIFF --> CONT["🔗 Contribution Deltas<br>• provenance changes"]

    CAND --> ATTR["🧠 Attribution Service<br>━━━━━━━━━━━━━━━━<br>primary_cause_source:<br>runtime_search (0.71)"]
    SRC --> ATTR
    CONT --> ATTR

    ATTR --> HEALTH["🩺 Health Evaluator<br>risk = 0.35*d + 0.30*v + 0.20*dr + 0.15*c<br>━━━━━━━━━━━━━━━━<br>→ suspicious"]
    ATTR --> DECIDE["⚖️ Decision Service<br>━━━━━━━━━━━━━━━━<br>→ dampen"]

    HEALTH --> OUTPUT["📄 DeltaMemoryDiffDTO"]
    DECIDE --> OUTPUT

    style BEFORE fill:#f3e5f5,stroke:#6a1b9a
    style AFTER fill:#e8f5e9,stroke:#2e7d32
    style DIFF fill:#fff9c4,stroke:#f57f17,stroke-width:2px
    style ATTR fill:#d1c4e9,stroke:#4527a0
    style HEALTH fill:#ffccbc,stroke:#bf360c
    style DECIDE fill:#b2dfdb,stroke:#004d40
    style OUTPUT fill:#c8e6c9,stroke:#1b5e20,stroke-width:3px

The diagram is intentionally mechanical.

There is no hidden model interpretation here. The system compares structured objects, measures score changes, computes attribution, evaluates health, and emits a decision. That makes memory change replayable instead of mysterious.

Attribution: What Changed The Memory?

One of the strongest parts of the system is deterministic attribution.

The DeltaMemoryAttributionService computes source influence by summing absolute weighted-score deltas across all candidates and contributions, then normalizing by total change. The source with the highest normalized influence becomes the primary_cause_source.

For example:

primary_cause_source = "runtime_search"
source_influence = {
    "runtime_search": 0.71,
    "current_context": 0.19,
    "trained_model_prior": 0.10,
}

This allows the runtime to answer questions like:

Did search distort memory?
Is current context over-dominating?
Did recent evidence outweigh historical belief?
Is one source poisoning cognition?

Most AI systems cannot answer those questions. Delta Memory can. No black-box attention maps. Just rule-based, replayable attribution.

Attribution as Accounting

The attribution step is intentionally simple.

Instead of trying to infer causality from hidden model activations, the system treats memory change as an accounting problem. It sums how much each source changed the weighted score, normalizes those changes, and reports the largest contributor.

from collections import defaultdict


def attribute_source_change(source_deltas, contribution_deltas):
    # Accumulate the absolute weighted-score change per source.
    totals = defaultdict(float)

    for source_delta in source_deltas:
        totals[source_delta.source_name] += abs(
            source_delta.weighted_score_delta
        )

    for contribution_delta in contribution_deltas:
        totals[contribution_delta.source_name] += abs(
            contribution_delta.weighted_score_delta
        )

    # Nothing changed, so there is nothing to attribute.
    total_change = sum(totals.values())
    if total_change == 0:
        return {}

    # Normalize so the influences sum to 1.0.
    return {
        source_name: change / total_change
        for source_name, change in totals.items()
    }

This is not a perfect theory of causality, but it is useful engineering.

If runtime_search accounts for 71% of the weighted-score change, the system can say that search was the primary cause of the memory shift. That gives us something concrete to inspect, dampen, reject, or reinforce later.
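
Here is a self-contained usage sketch of the same accounting idea. It simplifies the inputs to `(source_name, weighted_score_delta)` pairs instead of DTOs, and the delta values are invented so that the result lands on the 71% / 19% / 10% split used above.

```python
from collections import defaultdict


def source_influence(deltas):
    """deltas: iterable of (source_name, weighted_score_delta) pairs."""
    totals = defaultdict(float)
    for source_name, delta in deltas:
        # Gains and losses both count as change, hence abs().
        totals[source_name] += abs(delta)
    total_change = sum(totals.values())
    if total_change == 0:
        return {}
    return {name: change / total_change for name, change in totals.items()}


influence = source_influence([
    ("runtime_search", 0.24),        # new search candidate added
    ("runtime_search", 0.115),       # existing search candidate strengthened
    ("current_context", -0.095),     # context candidate weakened
    ("trained_model_prior", 0.05),   # small prior shift
])
primary_cause_source = max(influence, key=influence.get)
print(primary_cause_source, {k: round(v, 2) for k, v in influence.items()})
```

Note that a source can become the primary cause by losing score as well as by gaining it, because the accounting uses absolute deltas.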

Health Evaluation and Cognitive Drift

The system does not assume every memory change is good. It explicitly evaluates memory health.

The DeltaMemoryHealthEvaluator computes:

drift_score: magnitude of aggregate score change
dominance_score: how heavily one source controls the set
volatility_score: proportion of candidates that shifted
contradiction_score: presence of conflicting evidence
confidence_score: average confidence of the new state
risk_score: weighted combination

Health Is Separate From Attribution

Attribution tells us what caused the memory change.

Health evaluation asks a different question: was the change safe?

A source can be the primary cause of a change and still be healthy. Or it can dominate too strongly, create volatility, and trigger a suspicious or dangerous classification.

The risk formula is deterministic:

risk_score = (
    0.35 * dominance_score +
    0.30 * volatility_score +
    0.20 * drift_score +
    0.15 * contradiction_score
)

Transitions are classified as:

healthy    (risk < 0.35)
suspicious (0.35 ≤ risk < 0.70)
dangerous  (risk ≥ 0.70)

These thresholds are not meant to be universal truths.

They are runtime policy. The point is not that 0.35 is magic. The point is that memory drift becomes measurable enough to govern. Different systems can tune these weights based on their risk tolerance.
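
As a worked example, the formula and thresholds above can be combined into a few lines. The weights and cut-offs are the ones stated in this post; the function names and the sample component scores are illustrative assumptions, and each component is assumed to lie in the range 0 to 1.

```python
RISK_WEIGHTS = {
    "dominance": 0.35,
    "volatility": 0.30,
    "drift": 0.20,
    "contradiction": 0.15,
}


def risk_score(scores):
    # Weighted sum of the four component scores.
    return sum(RISK_WEIGHTS[name] * scores[name] for name in RISK_WEIGHTS)


def classify(risk):
    if risk < 0.35:
        return "healthy"
    if risk < 0.70:
        return "suspicious"
    return "dangerous"


scores = {"dominance": 0.60, "volatility": 0.50, "drift": 0.18, "contradiction": 0.0}
risk = risk_score(scores)
# 0.35*0.60 + 0.30*0.50 + 0.20*0.18 + 0.15*0.0 = 0.21 + 0.15 + 0.036 + 0.0
print(round(risk, 3), classify(risk))  # 0.396 suspicious
```

Since the weights sum to 1.0, the risk score stays in the same 0 to 1 range as its inputs, which keeps the thresholds easy to reason about.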

This was heavily inspired by observing how humans drift cognitively. People become dominated by emotional recency, ideological reinforcement, authority bias, or repeated exposure. Delta Memory attempts to model these risks explicitly.

Warnings like memory_source_dominance_detected, high_memory_volatility, or source_dominance_changed surface before downstream behavior is corrupted.

Governance Decisions

Once the system can evaluate memory change, it can govern it.

Every memory diff produces a recommendation via DeltaMemoryDecisionService:

accept
dampen
reject
investigate

From Measurement to Action

Once memory change has attribution and health, the runtime can recommend what to do next.

This is the beginning of cognitive governance: not just observing that memory changed, but deciding whether to accept the change, dampen it, reject it, or investigate it.

decision.action == "dampen"

This is the first place the system begins to regulate its own memory.

A suspicious change does not have to be accepted blindly. The runtime can reduce the influence of the source that caused the shift and then recompose memory to see whether the belief state stabilizes.

This might occur when:

runtime.search suddenly dominates memory
volatility spikes
contradictory evidence appears
current context overwhelms longer-term memory

The decision engine also emits concrete next steps:

recommended_weight_adjustments = {"runtime_search": -0.15}
recommended_followup_checks = ["recompose_memory_after_weight_adjustment"]

This creates a recursive loop:

memory → evaluation → governance → future memory composition

That loop is the beginning of runtime cognitive regulation.
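
One way to close that loop is to apply the recommended adjustment to the source weights and renormalize before recomposing memory. The sketch below assumes adjustments are additive and that weights are renormalized afterwards (the config shown earlier sets normalize_weights=True); the function name and the zero floor are hypothetical.

```python
def apply_weight_adjustments(weights, adjustments, floor=0.0):
    """Apply additive adjustments, clamp at `floor`, renormalize to sum to 1."""
    adjusted = {
        name: max(floor, weight + adjustments.get(name, 0.0))
        for name, weight in weights.items()
    }
    total = sum(adjusted.values())
    if total == 0:
        raise ValueError("all source weights were reduced to zero")
    return {name: weight / total for name, weight in adjusted.items()}


weights = {
    "current_context": 0.30,
    "runtime_search": 0.25,
    "writer_database": 0.25,
    "trained_model_prior": 0.20,
}
new_weights = apply_weight_adjustments(weights, {"runtime_search": -0.15})
print({name: round(w, 3) for name, w in new_weights.items()})
```

Dampening one source implicitly raises the relative influence of the others after renormalization, so the follow-up check (recomposing memory and diffing again) matters as much as the adjustment itself.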

Observable Cognition

This was the genuinely surprising part.

The system became introspectable.

Most AI systems cannot answer:

Why did memory change?
Which source caused the shift?
Is the system drifting?
Which source dominates?
Is the current conversation over-steering reasoning?
Should the change be trusted?

Delta Memory can.

Every memory transition becomes:

measurable
attributable
replayable
governable

This creates something that begins to resemble runtime epistemics rather than prompt engineering.

Why This Matters Now

LLMs are hitting the limits of:

longer context windows
bigger models
naive retrieval
stateless agents

The next frontier is runtime cognition — systems that:

maintain internal state
evolve over time
detect drift
attribute belief changes
regulate their own memory
bias their own search
explain their own transitions

Delta Memory is one architecture for that. It doesn’t replace retrieval; it sits above it as a cognitive layer.

The Trade-off

Most RAG systems optimize for recall (finding the right chunk). Delta Memory optimizes for epistemic stability (knowing why you believe what you believe).

In production, this means:

Fewer Hallucinations via Drift Detection: By detecting when a single source dominates, we prevent the model from latching onto noisy data.
Audit Trails: When an agent gives a wrong answer, we can trace exactly which memory delta caused the shift.
Self-Correction: The system can dampen its own biases before they corrupt the final output.

It’s not just about remembering more; it’s about remembering better.

Search as the Cognitive Substrate

Originally, Writer’s search domain was just infrastructure. A runtime registry. A way to search files, process runs, voice restoration artifacts, workflows, and runtime state.

But over time the search engine accumulated:

recency weighting
runtime specialization
capability awareness
behavioral actions
memory references
cross-runtime aggregation

At some point, it stopped behaving like retrieval infrastructure and started behaving like associative recall.

Writer’s search is built around a registry of SearchableRuntimeProtocol adapters. Each runtime exposes:

search_capability()
search(request)
get_recent_memory()
get_actions()

The engine fans out, deduplicates, ranks with recency boosts, merges evidence, and attaches actionable next steps.
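
The fan-out, deduplicate, and rank-with-recency steps can be sketched as below. This is not Writer's search engine: the `Hit` fields, the `StaticRuntime` stand-in, and the exponential half-life boost are assumptions chosen to keep the example small, and the protocol is reduced to a single `search` method.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Hit:
    hit_id: str
    runtime: str
    score: float
    age_days: float


class SearchableRuntime(Protocol):
    name: str

    def search(self, query: str) -> list[Hit]: ...


class StaticRuntime:
    """Toy runtime that returns canned hits regardless of the query."""

    def __init__(self, name, hits):
        self.name = name
        self._hits = hits

    def search(self, query):
        return list(self._hits)


def fan_out(query, runtimes, half_life_days=7.0):
    best = {}
    for runtime in runtimes:
        for hit in runtime.search(query):
            # Recency boost: the score halves for every half_life_days of age.
            boosted = hit.score * 0.5 ** (hit.age_days / half_life_days)
            # Deduplicate by hit_id, keeping the best boosted score.
            if hit.hit_id not in best or boosted > best[hit.hit_id][0]:
                best[hit.hit_id] = (boosted, hit)
    ranked = sorted(best.values(), key=lambda pair: pair[0], reverse=True)
    return [hit for _, hit in ranked]


runtimes = [
    StaticRuntime("filesystem", [Hit("doc:1", "filesystem", 0.90, age_days=14.0)]),
    StaticRuntime("process_runs", [
        Hit("run:7", "process_runs", 0.70, age_days=0.0),
        Hit("doc:1", "process_runs", 0.60, age_days=0.0),
    ]),
]
ranked = fan_out("voice preservation", runtimes)
print([(hit.hit_id, hit.runtime) for hit in ranked])
```

In this toy run the older filesystem copy of `doc:1` loses to the fresher copy from process runs, which is the recency effect the list above describes.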

Crucially, memory itself becomes searchable. We expose composed memory sets via DeltaMemorySearchAdapter, allowing them to appear as hits in runtime.search.

But there is a second, more powerful loop: Memory guiding search.

Loop	Direction	Mechanism	Purpose
Recall	Search → Memory	`DeltaMemorySearchAdapter`	Let the agent query its own past beliefs.
Guidance	Memory → Search	`SearchMemoryBridge`	Let current beliefs bias where the agent looks next.

Together, they form a recursive cognitive cycle.

Stage 3: Memory-Guided Search

Once memory transitions become measurable, they can influence future search.

That is where SearchBiasDTO comes in.

The SearchMemoryBridge consumes a MemorySetDTO and/or DeltaMemoryDiffDTO and produces bias instructions that tell runtime.search:

which runtimes to prefer or suppress
which query terms to expand or block
how deep to search
whether to stop (if memory has stabilized)
whether to rerun (if health is dangerous or action is investigate)

Example:

bias = bridge.build_search_bias(request)

bias.preferred_runtimes   # ["artifact", "lens"]
bias.suppressed_runtimes  # ["repo"]
bias.query_expansions     # ["artifact review", "voice preservation"]
bias.depth_hint           # "normal"
bias.should_rerun         # False
bias.should_stop          # False

Search as a Runtime Interface

For memory to behave like associative recall, search needs to be broader than document retrieval.

In Writer, search is exposed as a runtime interface. Files, process runs, voice restoration artifacts, and memory itself can all become searchable runtimes.

This matters because each runtime is a different kind of memory surface.

A filesystem search is not the same as a process-run search. A voice restoration artifact is not the same as a model prior. By exposing them through the same interface, the system can search across different forms of experience.

Importantly, search internals remain untouched. The bridge emits structured bias hints, not rewritten retrieval logic. Any search consumer can respect or ignore them.

This means memory changes behavior. Behavior produces new evidence. New evidence changes memory.

That recursive loop is the real architecture.

Memory-Guided Search

Once the system can measure memory change, it can use that change to guide future search.

SearchBiasDTO is the bridge. It does not rewrite search internals. It simply describes how memory thinks the next search should be shaped.

from typing import Literal

from pydantic import BaseModel


class SearchBiasDTO(BaseModel):
    preferred_runtimes: list[str]
    suppressed_runtimes: list[str]
    query_expansions: list[str]
    query_suppressions: list[str]
    depth_hint: Literal["shallow", "normal", "deep"]
    should_rerun: bool
    should_stop: bool
    reason: str

bias = runtime.delta_memory.build_search_bias(
    goal="review artifact",
    memory_set=after,
    memory_diff=diff,
)

search_request = RuntimeSearchRequestDTO(
    query="voice preservation report",
    runtimes=bias.preferred_runtimes or None,
    exclude_runtimes=bias.suppressed_runtimes,
    depth=bias.depth_hint,
)

This creates a clean separation.

Search remains a general runtime service. Delta Memory produces bias hints: prefer these runtimes, suppress those runtimes, expand these terms, search deeper, rerun, or stop. Any search implementation can choose how much of that guidance to honor.
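
To show what honoring the hints might look like on the consumer side, here is a minimal sketch. The `Bias` dataclass is a trimmed stand-in for SearchBiasDTO, and `plan_search` is a hypothetical helper; the policy that suppression wins over preference is an assumption, not something the post specifies.

```python
from dataclasses import dataclass, field


@dataclass
class Bias:
    preferred_runtimes: list = field(default_factory=list)
    suppressed_runtimes: list = field(default_factory=list)
    query_expansions: list = field(default_factory=list)


def plan_search(query, available_runtimes, bias):
    # Drop suppressed runtimes first; suppression wins over preference here.
    allowed = [r for r in available_runtimes if r not in bias.suppressed_runtimes]
    # Stable sort: preferred runtimes move to the front, others keep their order.
    ordered = sorted(allowed, key=lambda r: r not in bias.preferred_runtimes)
    # Append only the expansion terms not already present in the query.
    extra = [t for t in bias.query_expansions if t.lower() not in query.lower()]
    return ordered, " ".join([query, *extra])


bias = Bias(
    preferred_runtimes=["artifact", "lens"],
    suppressed_runtimes=["repo"],
    query_expansions=["artifact review", "voice preservation"],
)
runtimes, query = plan_search(
    "voice preservation report",
    ["repo", "filesystem", "lens", "artifact"],
    bias,
)
print(runtimes)  # ['lens', 'artifact', 'filesystem']
print(query)     # voice preservation report artifact review
```

A stricter consumer could search only the preferred runtimes, and a looser one could ignore the hints entirely; the bias object stays the same in both cases.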

Diagram: The Recursive Memory/Search Loop

The final diagram shows the complete loop.

A query produces search results. Search results change the memory state. The memory diff produces bias. That bias changes the next search. The loop continues until the memory state stabilizes or the system decides it has enough evidence.

    flowchart TD
    START["🔍 User Query"] --> SEARCH["🌐 runtime.search<br>━━━━━━━━━━━━━━━━<br>• filesystem<br>• process runs<br>• voice restoration<br>• delta_memory (adapter)"]

    SEARCH --> MEM1["📝 Memory Composition<br>→ MemorySetDTO (after)"]

    MEM1 --> DIFF2["📊 Diff: before ↔ after<br>→ DeltaMemoryDiffDTO"]

    DIFF2 --> BIAS["🎯 SearchMemoryBridge<br>→ SearchBiasDTO<br>━━━━━━━━━━━━━━━━<br>preferred_runtimes: [artifact,lens]<br>suppressed: [repo]<br>depth: deep<br>should_rerun: true"]

    BIAS --> SEARCH2["🌐 runtime.search<br>(biased)"]
    SEARCH2 --> MEM2["📝 Memory Composition<br>(new)"]
    MEM2 --> STABLE{"🧘 Memory stable?"}
    
    STABLE -->|No| DIFF2
    STABLE -->|Yes| ANSWER["✅ Final Answer"]

    style START fill:#e3f2fd,stroke:#0d47a1
    style SEARCH fill:#fff3e0,stroke:#e65100
    style MEM1 fill:#f1f8e9,stroke:#33691e
    style DIFF2 fill:#fff9c4,stroke:#f57f17
    style BIAS fill:#e8eaf6,stroke:#283593
    style SEARCH2 fill:#ffe0b2,stroke:#bf360c
    style STABLE fill:#f3e5f5,stroke:#6a1b9a
    style ANSWER fill:#c8e6c9,stroke:#1b5e20,stroke-width:3px

This is the reason Delta Memory is more than a storage layer.

The system is not only remembering. It is using measured memory change to decide where to look next. That is the cargo-cult version of associative recall: imperfect, mechanical, but useful.
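
The control flow of that loop can be sketched independently of any real search backend. The stopping rule used here (aggregate score moving by less than `epsilon` between rounds, with a hard cap on rounds) is an assumption; the post only says the loop continues until memory stabilizes. The canned scores are invented for the demonstration.

```python
def run_until_stable(search_step, max_rounds=5, epsilon=0.02):
    """Repeat search_step(bias) -> (aggregate_score, next_bias) until stable.

    Returns (rounds_used, score_history).
    """
    bias, previous, history = None, None, []
    for round_number in range(1, max_rounds + 1):
        score, bias = search_step(bias)
        history.append(score)
        # Stable when the aggregate score barely moved since the last round.
        if previous is not None and abs(score - previous) < epsilon:
            return round_number, history
        previous = score
    return max_rounds, history


# Toy search step: canned aggregate scores stand in for real memory sets.
canned_scores = iter([0.62, 0.80, 0.81, 0.81])


def fake_step(bias):
    return next(canned_scores), {"depth_hint": "deep"}


rounds, history = run_until_stable(fake_step)
print(rounds, history)  # 3 [0.62, 0.8, 0.81]
```

The round cap matters in practice: a loop that feeds its own output back into search needs a guaranteed exit even when memory never settles.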

The Full Cognitive Loop

Search + Context + Model Priors + Database
                    ↓
             Memory Composition
                    ↓
               MemorySetDTO
                    ↓
              Memory Diffing
                    ↓
           DeltaMemoryDiffDTO
                    ↓
 Attribution + Health Evaluation
                    ↓
             Governance Decision
                    ↓
      Search Bias / Weight Adjustment
                    ↓
               Future Search
                    ↓
              (loop repeats)

This is no longer just retrieval. It is a runtime cognitive state machine.

Why This Beats Fine-Tuning

Most attempts at AI memory focus on:

larger context windows
more embeddings
more parameters
more training

Delta Memory takes the opposite approach.

Everything here is:

runtime-only
deterministic
observable
replayable
governable

No transformer surgery. No unstable fine-tuning. No opaque weight updates. Every memory transition can be inspected, replayed, and explained.

That is enormously important if you want long-running agents that remain debuggable, safe, and adaptable. And because the signals are clean and structured, they form the exact telemetry needed later for lightweight MRQ/DPO training without destabilizing the runtime.

The Recursive Cognitive Loop

The diagram below shows the complete cycle.
It starts with a search, builds memory, measures the change, attributes it, evaluates health, decides what to do, and biases the next search. Then the loop repeats.

This is not a linear pipeline. It is a closed loop.
Memory changes search. Search changes memory.

The arrows point forward, but the system feeds back into itself.

    flowchart LR
    START(["🚀 Start"]) --> SEARCH

    SEARCH["🔍 Search<br>━━━━━━━━━━<br>runtime.search"]
    SEARCH --> MEMORY["📝 Memory<br>━━━━━━━━━━<br>MemorySetDTO"]
    MEMORY --> DELTA["📊 Delta<br>━━━━━━━━━━<br>DeltaMemoryDiffDTO"]
    DELTA --> ATTRIB["🧠 Attribution<br>━━━━━━━━━━<br>primary cause"]
    ATTRIB --> HEALTH["🩺 Health<br>━━━━━━━━━━<br>risk & warnings"]
    HEALTH --> DECISION["⚖️ Decision<br>━━━━━━━━━━<br>accept/dampen/reject/investigate"]
    DECISION --> BIAS["🎯 Search Bias<br>━━━━━━━━━━<br>preferred runtimes<br>query expansions"]
    BIAS -->|feeds back to| SEARCH

    SEARCH --> LOOP_END(["🔄 Loop again"])

    style SEARCH fill:#e3f2fd,stroke:#0d47a1,stroke-width:2px
    style MEMORY fill:#f1f8e9,stroke:#33691e,stroke-width:2px
    style DELTA fill:#fff9c4,stroke:#f57f17,stroke-width:2px
    style ATTRIB fill:#d1c4e9,stroke:#4527a0,stroke-width:2px
    style HEALTH fill:#ffccbc,stroke:#bf360c,stroke-width:2px
    style DECISION fill:#b2dfdb,stroke:#004d40,stroke-width:2px
    style BIAS fill:#ffe0b2,stroke:#e65100,stroke-width:2px

What makes this useful is that every step is inspectable. You can see why search returned certain results, how memory changed, which source caused the shift, whether the change was risky, and what decision the system made. No hidden state. No mysterious model internals.

That is the difference between a system that merely stores information and one that can regulate its own cognition.
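
To see the feedback in motion, here is a deliberately reduced sketch of the loop. Everything in it is an assumption made for illustration: the three-entry `CORPUS`, the `limit` and `damping` values, and the shortcut of collapsing attribution and health into a single "dominant source share" check. It is not the real runtime, just the smallest loop I could write where memory changes search and search changes memory.

```python
# Hypothetical corpus standing in for real search runtimes: (source, text, base_score).
CORPUS = [
    ("search", "Lens contribution reports", 0.9),
    ("search", "Voice preservation signals", 0.8),
    ("context", "Generic repository search", 0.5),
]


def search(bias: dict[str, float]) -> dict[str, tuple[str, float]]:
    """Score every corpus entry, scaled by the current per-source bias."""
    return {
        text: (source, base * bias.get(source, 1.0))
        for source, text, base in CORPUS
    }


def total_change(before: dict, after: dict) -> float:
    """Delta: total absolute score change between two memory snapshots."""
    texts = sorted(set(before) | set(after))
    return sum(
        abs(after.get(t, ("", 0.0))[1] - before.get(t, ("", 0.0))[1])
        for t in texts
    )


def dominant_share(memory: dict) -> tuple[str, float]:
    """Health input: the source holding the largest share of total score."""
    totals: dict[str, float] = {}
    for source, score in memory.values():
        totals[source] = totals.get(source, 0.0) + score
    source = max(sorted(totals), key=lambda s: totals[s])
    return source, totals[source] / sum(totals.values())


def run_loop(steps: int, limit: float = 0.7, damping: float = 0.85):
    memory, bias, trace = {}, {}, []
    for _ in range(steps):
        new_memory = search(bias)                          # search
        change = total_change(memory, new_memory)          # delta
        source, share = dominant_share(new_memory)         # attribution + health
        action = "dampen" if share > limit else "accept"   # decision
        if action == "dampen":                             # bias the next search
            bias[source] = bias.get(source, 1.0) * damping
        memory = new_memory
        trace.append((action, source, round(share, 3), round(change, 3)))
    return trace, bias


trace, bias = run_loop(steps=5)
for step in trace:
    print(step)
```

Run for five steps, this dampens `search` three times, until its share of the memory score falls below the 0.7 limit, and then accepts the last two states. The final step reports zero change because the bias has stopped moving. Each step's decision depends on the bias set by the previous one, which is the closed loop in miniature.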

So What?

This architecture does something most AI memory systems cannot: it makes memory change visible.

You can now ask:

Why did the agent change its mind?
Which source (context, search, database, model prior) caused the shift?
Is the agent drifting toward dangerous beliefs?
Should we trust this new memory state?

And you get answers. Not guesses. Not attention maps. Just deterministic, replayable measurements.

If you are building long‑running agents, this matters. Because a system that cannot explain why its memory changed will eventually change in ways you cannot control. Delta Memory is an attempt to keep that door open.

Building Your Own Delta Memory (In 5 Steps)

Define MemorySetDTO – a snapshot with candidates, sources, scores, and dominance.
Implement deterministic diff – match candidates by ID or stable text hash, classify changes.
Add attribution – sum absolute score changes per source.
Add health rules – risk = weighted sum of dominance + volatility + drift.
Create a search bridge – translate memory deltas into preferred runtimes, query expansions, and depth hints.
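
For step 2, matching by a stable text hash could look something like the sketch below. The `stable_text_id` helper and its normalization choices (Unicode NFKC, collapsed whitespace, case folding, a 16-character SHA-256 prefix) are my assumptions for illustration, not a fixed part of Delta Memory. Note that Python's built-in `hash()` is not suitable here, because string hashes are randomized per process by default.

```python
import hashlib
import re
import unicodedata


def stable_text_id(text: str) -> str:
    """Return an ID that is identical across runs for equivalent text."""
    normalized = unicodedata.normalize("NFKC", text)
    normalized = re.sub(r"\s+", " ", normalized).strip().casefold()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def match_by_id(before_texts: list[str], after_texts: list[str]) -> dict[str, str]:
    """Classify each candidate ID as added, removed, or kept."""
    before_ids = {stable_text_id(t) for t in before_texts}
    after_ids = {stable_text_id(t) for t in after_texts}
    result = {}
    for candidate_id in sorted(before_ids | after_ids):
        if candidate_id not in before_ids:
            result[candidate_id] = "added"
        elif candidate_id not in after_ids:
            result[candidate_id] = "removed"
        else:
            result[candidate_id] = "kept"
    return result


changes = match_by_id(
    ["Use broad search first."],
    ["use  broad search first.\n", "Prefer Lens reports."],
)
```

Here the two spellings of the first sentence collapse to the same ID, so it counts as kept rather than as one removal plus one addition. How aggressive the normalization should be is a judgment call: too little and trivial edits look like new memories, too much and distinct memories collide.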

Start with in‑memory SQLite. Keep everything runtime‑only. You’ll be surprised how far deterministic rules can take you.
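
Step 5 is the least obvious of the five, so here is one possible shape for the search bridge. It assumes a diff dictionary like the one produced in Appendix A (`decision`, `primary_cause_source`, `candidate_deltas`), and it treats source names as runtime names, which is a simplification. The `SearchBias` dataclass, the `bias_from_diff` function, and every rule inside it are hypothetical choices for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class SearchBias:
    preferred_runtimes: list[str] = field(default_factory=list)
    suppressed_runtimes: list[str] = field(default_factory=list)
    query_expansions: list[str] = field(default_factory=list)
    depth_hint: int = 1


def bias_from_diff(diff: dict, max_expansions: int = 3) -> SearchBias:
    """Translate a memory diff into guidance for the next search (assumed rules)."""
    bias = SearchBias()
    action = diff["decision"]["action"]
    cause = diff.get("primary_cause_source")

    if cause and action == "accept":
        bias.preferred_runtimes.append(cause)      # trusted change: lean into it
    elif cause and action in ("dampen", "reject"):
        bias.suppressed_runtimes.append(cause)     # over-steering: back off
    if action == "investigate":
        bias.depth_hint = 2                        # look deeper before trusting

    # Use newly gained memories as query expansions, strongest first.
    gained = [
        d for d in diff["candidate_deltas"]
        if d["change_type"] in ("added", "strengthened")
    ]
    gained.sort(key=lambda d: (-d["delta"], d["text"]))
    if action != "reject":
        bias.query_expansions = [d["text"] for d in gained[:max_expansions]]
    return bias


example_diff = {
    "primary_cause_source": "search",
    "decision": {"action": "investigate"},
    "candidate_deltas": [
        {"text": "Voice preservation signals", "delta": 0.198, "change_type": "added"},
        {"text": "Lens contribution reports", "delta": 0.230, "change_type": "added"},
        {"text": "Generic repository search", "delta": 0.0, "change_type": "unchanged"},
    ],
}
example_bias = bias_from_diff(example_diff)
```

For an "investigate" decision this sketch neither prefers nor suppresses the cause. It raises the depth hint and passes the two added texts along as query expansions, so the next search can gather more evidence before the weights move.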

The Bigger Picture

We no longer believe memory is fundamentally a storage problem.

We think it is a search problem.

More specifically:

memory is the evolving topology of associative search

Delta Memory is an attempt to model that topology explicitly:

compose it
measure it
attribute it
evaluate it
govern it
and eventually allow the system to tune itself safely over time

It may be one small step toward systems that maintain something more important than static context:

an evolving cognitive state.

That might be where real memory starts.

Appendix A: A Minimal SQLite Delta Memory Demo

Below is a tiny, self-contained example of Delta Memory using only Python and SQLite (Python 3.10 or newer, because of the str | None annotation). It demonstrates the whole loop:

memory sources → memory set A → memory set B → diff → attribution → health → decision

import sqlite3
from dataclasses import dataclass
from collections import defaultdict
from uuid import uuid4


# -----------------------------
# Data structures
# -----------------------------

@dataclass
class Candidate:
    source: str
    text: str
    confidence: float
    relevance: float

    @property
    def score(self) -> float:
        return self.confidence * self.relevance


@dataclass
class MemoryItem:
    source: str
    text: str
    weighted_score: float


# -----------------------------
# SQLite setup
# -----------------------------

def init_db(conn):
    conn.execute("""
        CREATE TABLE IF NOT EXISTS memory_sets (
            id TEXT PRIMARY KEY,
            label TEXT,
            dominant_source TEXT,
            aggregate_score REAL
        )
    """)

    conn.execute("""
        CREATE TABLE IF NOT EXISTS memory_items (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            memory_set_id TEXT,
            source TEXT,
            text TEXT,
            weighted_score REAL
        )
    """)

    conn.commit()


# -----------------------------
# Stage 1: Compose memory
# -----------------------------

SOURCE_WEIGHTS = {
    "context": 0.30,
    "search": 0.25,
    "database": 0.25,
    "model_prior": 0.20,
}


def build_memory_set(conn, label: str, candidates: list[Candidate]) -> str:
    memory_set_id = str(uuid4())

    items = []
    source_totals = defaultdict(float)

    for candidate in candidates:
        source_weight = SOURCE_WEIGHTS[candidate.source]
        weighted_score = source_weight * candidate.score

        item = MemoryItem(
            source=candidate.source,
            text=candidate.text,
            weighted_score=weighted_score,
        )

        items.append(item)
        source_totals[candidate.source] += weighted_score

    aggregate_score = sum(item.weighted_score for item in items)

    dominant_source = None
    if source_totals:
        dominant_source = max(source_totals.items(), key=lambda x: x[1])[0]

    conn.execute(
        """
        INSERT INTO memory_sets (id, label, dominant_source, aggregate_score)
        VALUES (?, ?, ?, ?)
        """,
        (memory_set_id, label, dominant_source, aggregate_score),
    )

    for item in items:
        conn.execute(
            """
            INSERT INTO memory_items
                (memory_set_id, source, text, weighted_score)
            VALUES (?, ?, ?, ?)
            """,
            (memory_set_id, item.source, item.text, item.weighted_score),
        )

    conn.commit()
    return memory_set_id


def load_memory_items(conn, memory_set_id: str) -> list[MemoryItem]:
    rows = conn.execute(
        """
        SELECT source, text, weighted_score
        FROM memory_items
        WHERE memory_set_id = ?
        """,
        (memory_set_id,),
    ).fetchall()

    return [
        MemoryItem(source=row[0], text=row[1], weighted_score=row[2])
        for row in rows
    ]


def load_memory_set(conn, memory_set_id: str):
    return conn.execute(
        """
        SELECT id, label, dominant_source, aggregate_score
        FROM memory_sets
        WHERE id = ?
        """,
        (memory_set_id,),
    ).fetchone()


# -----------------------------
# Stage 2: Diff memory
# -----------------------------

def diff_memory_sets(conn, before_id: str, after_id: str) -> dict:
    before = load_memory_set(conn, before_id)
    after = load_memory_set(conn, after_id)

    before_items = load_memory_items(conn, before_id)
    after_items = load_memory_items(conn, after_id)

    before_by_text = {item.text: item for item in before_items}
    after_by_text = {item.text: item for item in after_items}

    candidate_deltas = []
    source_delta_totals = defaultdict(float)

    all_texts = set(before_by_text) | set(after_by_text)

    for text in sorted(all_texts):
        before_item = before_by_text.get(text)
        after_item = after_by_text.get(text)

        before_score = before_item.weighted_score if before_item else 0.0
        after_score = after_item.weighted_score if after_item else 0.0
        delta = after_score - before_score

        source = after_item.source if after_item else before_item.source

        if before_item is None:
            change_type = "added"
        elif after_item is None:
            change_type = "removed"
        elif delta > 0:
            change_type = "strengthened"
        elif delta < 0:
            change_type = "weakened"
        else:
            change_type = "unchanged"

        candidate_deltas.append({
            "text": text,
            "source": source,
            "before_score": before_score,
            "after_score": after_score,
            "delta": delta,
            "change_type": change_type,
        })

        source_delta_totals[source] += abs(delta)

    total_change = sum(source_delta_totals.values())

    source_influence = {
        source: change / total_change
        for source, change in source_delta_totals.items()
    } if total_change else {}

    primary_cause_source = None
    if source_influence:
        primary_cause_source = max(source_influence.items(), key=lambda x: x[1])[0]

    aggregate_score_delta = after[3] - before[3]
    changed_dominant_source = before[2] != after[2]

    health = evaluate_health(
        after_dominant_source=after[2],
        after_items=after_items,
        aggregate_score_delta=aggregate_score_delta,
        candidate_deltas=candidate_deltas,
    )

    decision = decide(primary_cause_source, health, changed_dominant_source)

    return {
        "before_label": before[1],
        "after_label": after[1],
        "before_dominant_source": before[2],
        "after_dominant_source": after[2],
        "changed_dominant_source": changed_dominant_source,
        "aggregate_score_delta": aggregate_score_delta,
        "candidate_deltas": candidate_deltas,
        "source_influence": source_influence,
        "primary_cause_source": primary_cause_source,
        "health": health,
        "decision": decision,
    }


# -----------------------------
# Stage 3: Health + decision
# -----------------------------

def evaluate_health(
    after_dominant_source: str,
    after_items: list[MemoryItem],
    aggregate_score_delta: float,
    candidate_deltas: list[dict],
) -> dict:
    total_score = sum(item.weighted_score for item in after_items)

    source_totals = defaultdict(float)
    for item in after_items:
        source_totals[item.source] += item.weighted_score

    dominance_score = 0.0
    if total_score > 0 and after_dominant_source:
        dominance_score = source_totals[after_dominant_source] / total_score

    changed_items = [
        item for item in candidate_deltas
        if item["change_type"] != "unchanged"
    ]

    volatility_score = len(changed_items) / max(1, len(candidate_deltas))
    drift_score = abs(aggregate_score_delta)

    risk_score = (
        0.35 * dominance_score +
        0.30 * volatility_score +
        0.20 * drift_score
    )

    if risk_score < 0.35:
        status = "healthy"
    elif risk_score < 0.70:
        status = "suspicious"
    else:
        status = "dangerous"

    return {
        "dominance_score": round(dominance_score, 3),
        "volatility_score": round(volatility_score, 3),
        "drift_score": round(drift_score, 3),
        "risk_score": round(risk_score, 3),
        "status": status,
    }


def decide(primary_cause_source: str | None, health: dict, changed_dominant_source: bool) -> dict:
    if health["status"] == "healthy":
        return {
            "action": "accept",
            "reason": "memory change appears stable",
        }

    if health["status"] == "dangerous":
        return {
            "action": "reject",
            "reason": "memory drift risk is too high",
            "source_to_review": primary_cause_source,
        }

    if changed_dominant_source:
        return {
            "action": "investigate",
            "reason": "dominant memory source changed",
            "source_to_review": primary_cause_source,
        }

    return {
        "action": "dampen",
        "reason": "memory shift is suspicious but not dangerous",
        "recommended_weight_adjustment": {
            primary_cause_source: -0.15
        } if primary_cause_source else {},
    }


# -----------------------------
# Demo
# -----------------------------

def main():
    conn = sqlite3.connect(":memory:")
    init_db(conn)

    before_candidates = [
        Candidate(
            source="context",
            text="Generic repository search is probably enough.",
            confidence=0.90,
            relevance=0.90,
        ),
        Candidate(
            source="model_prior",
            text="Use broad search first when routing is uncertain.",
            confidence=0.70,
            relevance=0.70,
        ),
    ]

    after_candidates = [
        Candidate(
            source="context",
            text="Generic repository search is probably enough.",
            confidence=0.90,
            relevance=0.90,
        ),
        Candidate(
            source="search",
            text="Artifact review should use Lens contribution reports.",
            confidence=0.96,
            relevance=0.96,
        ),
        Candidate(
            source="search",
            text="Voice preservation and semantic similarity are Lens signals.",
            confidence=0.90,
            relevance=0.88,
        ),
        Candidate(
            source="model_prior",
            text="Use broad search first when routing is uncertain.",
            confidence=0.70,
            relevance=0.70,
        ),
    ]

    before_id = build_memory_set(conn, "before", before_candidates)
    after_id = build_memory_set(conn, "after", after_candidates)

    diff = diff_memory_sets(conn, before_id, after_id)

    print("\n=== Delta Memory Diff ===")
    print(f"before dominant source: {diff['before_dominant_source']}")
    print(f"after dominant source:  {diff['after_dominant_source']}")
    print(f"changed dominant:       {diff['changed_dominant_source']}")
    print(f"aggregate delta:        {diff['aggregate_score_delta']:.3f}")
    print(f"primary cause:          {diff['primary_cause_source']}")

    print("\n=== Source Influence ===")
    for source, influence in diff["source_influence"].items():
        print(f"{source}: {influence:.2f}")

    print("\n=== Health ===")
    for key, value in diff["health"].items():
        print(f"{key}: {value}")

    print("\n=== Decision ===")
    for key, value in diff["decision"].items():
        print(f"{key}: {value}")

    print("\n=== Candidate Deltas ===")
    for item in diff["candidate_deltas"]:
        print(
            f"{item['change_type']:12} "
            f"{item['source']:10} "
            f"{item['delta']:+.3f}  "
            f"{item['text']}"
        )


if __name__ == "__main__":
    main()

Expected output:

=== Delta Memory Diff ===
before dominant source: context
after dominant source:  search
changed dominant:       True
aggregate delta:        0.428
primary cause:          search

=== Source Influence ===
search: 1.00
context: 0.00
model_prior: 0.00

=== Health ===
dominance_score: 0.557
volatility_score: 0.5
drift_score: 0.428
risk_score: 0.431
status: suspicious

=== Decision ===
action: investigate
reason: dominant memory source changed
source_to_review: search

=== Candidate Deltas ===
added        search     +0.230  Artifact review should use Lens contribution reports.
unchanged    context    +0.000  Generic repository search is probably enough.
unchanged    model_prior +0.000  Use broad search first when routing is uncertain.
added        search     +0.198  Voice preservation and semantic similarity are Lens signals.

This small example shows the entire idea:

A first memory state is built from context and model prior.
A second memory state adds search evidence.
The dominant source changes from context to search.
The system attributes the change to search.
The health evaluator marks the shift as suspicious.
The decision layer recommends investigation instead of blindly accepting the change.
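
The demo's decide function can also return a dampen action with a recommended_weight_adjustment, but nothing in the demo applies it. One possible way to close that gap is sketched below. The `apply_weight_adjustment` helper, the `floor` value, and the choice to treat the adjustment as additive and then renormalize are all assumptions of this sketch, not something the demo defines.

```python
def apply_weight_adjustment(
    weights: dict[str, float],
    adjustment: dict[str, float],
    floor: float = 0.05,
) -> dict[str, float]:
    """Return new source weights; the input mapping is left untouched."""
    adjusted = dict(weights)
    for source, delta in adjustment.items():
        if source in adjusted:
            # Never silence a source completely: clamp at a small floor.
            adjusted[source] = max(floor, adjusted[source] + delta)
    # Renormalize so the weights sum to 1 again.
    total = sum(adjusted.values())
    return {source: weight / total for source, weight in adjusted.items()}


SOURCE_WEIGHTS = {
    "context": 0.30,
    "search": 0.25,
    "database": 0.25,
    "model_prior": 0.20,
}

new_weights = apply_weight_adjustment(SOURCE_WEIGHTS, {"search": -0.15})
```

With the demo weights, search drops from 0.25 to roughly 0.118 after renormalization, and the other three sources rise proportionally. Returning a new mapping instead of mutating `SOURCE_WEIGHTS` keeps the old weights available, so the before and after states can themselves be diffed and replayed.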

Glossary

Term	Definition
Delta Memory	A runtime memory approach that measures how an active memory state changes over time, attributes the change to sources, and decides whether the change should be trusted.
MemorySetDTO	A snapshot of active memory: weighted candidates, source reports, contributions, aggregate score, dominant source, and provenance.
DeltaMemoryDiffDTO	A measured transition between two memory sets, showing what changed, why it changed, who caused it, and whether the change is healthy.
Memory Candidate	A possible memory item surfaced from context, search, database records, model priors, or another source.
Memory Contribution	The recorded effect of a candidate on the final memory set, including source weight, original score, weighted score, and provenance.
Memory Source	Any configurable provider of memory candidates, such as current context, runtime search, database memory, or model prior.
Source Weight	The influence assigned to a memory source before candidates are combined into a memory set.
Dominant Source	The memory source contributing the largest share of the final weighted memory state.
Dominance Ratio	The proportion of the total memory score controlled by the dominant source. High dominance may indicate over-steering.
Memory Drift	A meaningful shift in the active memory state over time. Drift can be healthy, suspicious, or dangerous.
Candidate Delta	The change in a memory candidate between two memory states: added, removed, strengthened, weakened, or unchanged.
Source Delta	The change in influence of a memory source between two memory states.
Contribution Delta	The change in a specific source/candidate contribution between two memory states.
Attribution	The process of identifying which source or candidate most contributed to a memory change.
Primary Cause Source	The source with the highest normalized influence over a memory change.
Health Evaluation	Runtime analysis that scores memory change using drift, dominance, volatility, contradiction, confidence, and risk.
Risk Score	A combined score estimating whether a memory change is safe, suspicious, or dangerous.
Governance Decision	The recommended action after evaluating a memory change: accept, dampen, reject, or investigate.
Dampen	Reduce the influence of a source when it appears to be over-steering memory.
Search as Cognition	The idea that memory behaves less like static storage and more like an evolving search process through available experience.
SearchBiasDTO	A proposed object that translates memory changes into search guidance, such as preferred runtimes, suppressed runtimes, query expansions, and depth hints.
SearchMemoryBridge	A proposed bridge that lets memory diffs influence future search behavior without rewriting search internals.
DeltaMemorySearchAdapter	An adapter that exposes composed memory as a searchable runtime, allowing memory itself to appear in search results.
Observable Cognition	A design goal where memory changes are measurable, attributable, replayable, and governable rather than hidden inside model behavior.
Cargo-Cult Memory	An intentionally humble framing: copying observable properties of human memory, such as association, salience, drift, and reinforcement, without claiming human-like consciousness.

References & Initial Inspiration

This post was originally inspired by two separate discussions around AI memory, retrieval, and associative runtime state. While the architecture described here evolved far beyond the original discussions, these threads helped crystallize the initial direction.

Hacker News Discussions

Hacker News, "δ-mem: Efficient Online Memory for Large Language Models": Discussion around compact online associative memory for LLMs, runtime memory state, retrieval limitations, multi-layer memory systems, and efficient long-context reasoning. Several comments explored the idea that memory may behave more like layered associative search than static replay. (Hacker News)
δ-mem Paper (arXiv): Introduces a lightweight online associative memory mechanism that augments a frozen transformer with a compact runtime memory state. The paper proposes updating a small online memory matrix during inference and using it to influence future attention computation. (arXiv)
δ-mem GitHub Discussion Reference: Reference implementation and community discussion surrounding the paper. (Reddit)

Key Ideas That Influenced This Post

Runtime Memory vs Static Context

The δ-mem paper argues that simply increasing context windows is inefficient and often ineffective for true long-term reasoning. Instead, it proposes maintaining a compact evolving associative memory state during inference. (arXiv)

That directly influenced the idea explored in this post:

memory is not only stored context
memory is evolving runtime state

Multi-Layer Memory

One of the most interesting Hacker News comments proposed that useful AI memory likely requires multiple layers:

short-term workspace memory
active task memory
searchable historical memory
compressed long-term memory

rather than a single retrieval mechanism. (Hacker News)

That idea heavily influenced the layered MemorySource architecture used in Delta Memory.

Associative Search as Memory

Several commenters noted that human memory appears highly associative, compressed, selective, and reconstructive rather than exact replay. (Hacker News)

That became one of the central ideas of this post:

memory = reachable information under current conditions