Delta Memory: Cargo-Culting Human Memory with Search
AI systems today have no idea why their own memory changes. That’s the problem this post tries to solve.
Summary
Most AI memory systems start from a practical place: retrieval. Retrieval is useful, scalable, and often the right tool for the job. But if we want systems that interact with humans in more human‑like ways, we need a different analogy: not storage, but thinking.
Humans don’t store perfect records. We don’t retrieve exact text or replay video files. What we call “memory” is a shifting landscape of associations, impressions, weights, and patterns. When you recall something, you’re not pulling a file from disk; you’re running a search across your internal world, shaped by everything you’ve lived through.
If you and I both think about a traumatic event, the images that surface will be completely different. They’re shaped by our experiences, our histories, our fears, our movies, our books, our relationships. Human memory is contextual, associative, and deeply personal.
That’s the idea I wanted to capture in this system.
I wanted to give an AI something analogous to that:
a way to surface related items, weight them, connect them, and, most importantly, a way to measure how those connections change over time.
Because humans don’t just have memories.
We change because of them.
Think about a simple example:
You believe in a politician. One day they say something that breaks your trust. Suddenly your stance shifts. That shift didn’t come from nowhere; it came from a chain of influences, experiences, and signals that accumulated until your internal model updated.
We do this constantly.
Every day.
Without noticing.
But an AI?
When an AI suddenly changes its answer, preference, or reasoning path, we have no idea why. There’s no internal telemetry. No attribution. No explanation. No way to tell whether the change was healthy, harmful, or accidental.
Before we can give an AI the ability to change its mind, we need a way to measure that change.
That’s what this post is about.
It’s about building a system that:
- represents memory as a weighted, contextual belief state
- compares those states over time
- attributes the change to specific sources
- evaluates whether the change looks healthy or dangerous
- and uses that signal to influence future search behavior
In other words:
a system that doesn’t just store information but thinks with it, updates from it, and explains how it changed.
This is the missing layer between retrieval and reasoning.
Why “Cargo Cult” Memory?
A cargo cult copies the visible structure of something without having access to the underlying mechanism. That’s the spirit of this project: not to replicate human memory, but to reproduce the observable behaviors that make human memory useful.
Delta Memory is not a theory of consciousness or a model of the human brain. It is a runtime system engineered to mimic the functional properties we can actually observe in human recall:
- associative activation
- reinforcement
- recency effects
- salience weighting
- source dominance
- contradiction handling
- reconstruction instead of replay
- shifting accessibility over time
These are the patterns we see in human memory from the outside — the parts we can measure, reason about, and design for.
So the guiding question became:
What happens if we intentionally engineer those observable properties into a runtime system?
Delta Memory is my attempt to explore that question: a practical, deterministic layer that behaves like memory in the ways that matter for cognition, without pretending to be the real thing.
The Core Insight: Memory Is Not Storage
Traditional AI systems usually treat memory as stored state:
- facts
- documents
- embeddings
- conversation history
But human memory does not feel like retrieval.
It feels like search.
What we remember depends on:
- what is emotionally salient
- what was recently reinforced
- what contradicts current beliefs
- what is currently accessible
- what other memories activate nearby associations
In other words:
memory = reachable information under current conditions
This explains why humans:
- forget obvious things
- become emotionally biased
- reinforce beliefs over time
- misremember details
- drift ideologically
- reconstruct memories differently depending on context
Those are not storage problems.
They are search-topology problems.
That realization completely changed how I thought about AI memory systems.
The Real Shift: Memory as State Transition
Traditional AI memory systems care about:
what is stored
Delta Memory cares about:
what changed
That distinction matters enormously.
The key object in the system is not merely the memory snapshot:
MemorySetDTO # a weighted belief state at time T
The key object is the transition:
DeltaMemoryDiffDTO # the measured change between T and T+1
That transition contains:
- what changed
- why it changed
- which source caused the change
- whether the change is healthy
- whether the system should trust the change
- whether future behavior should adapt
This is the difference between:
memory as storage
and:
memory as evolving cognition
The Core Contracts
Before looking at the runtime flow, it helps to see the two central contracts.
MemorySetDTO represents memory at a moment in time. It is not just a list of retrieved items. It includes candidates, source contributions, source reports, aggregate score, and dominance information.
DeltaMemoryDiffDTO represents the measured transition between two memory states. This is where memory becomes observable: the system can compare before and after, then ask what changed, which source caused it, and whether the change should be trusted.
```python
from __future__ import annotations

from pydantic import BaseModel

# The nested DTO types (MemoryCandidateDTO, CandidateDeltaDTO, ...) are not shown here.


class MemorySetDTO(BaseModel):
    memory_set_id: str
    goal: str
    query: str | None
    candidates: list[MemoryCandidateDTO]
    contributions: list[MemoryContributionDTO]
    source_reports: list[MemorySourceReportDTO]
    aggregate_score: float
    dominant_source: str | None
    dominance_ratio: float | None


class DeltaMemoryDiffDTO(BaseModel):
    before_memory_set_id: str
    after_memory_set_id: str
    changed_top_candidate: bool
    changed_dominant_source: bool
    aggregate_score_delta: float
    candidate_deltas: list[CandidateDeltaDTO]
    source_deltas: list[SourceDeltaDTO]
    attribution: DeltaMemoryAttributionDTO
    health: DeltaMemoryHealthDTO
    decision: DeltaMemoryChangeDecisionDTO
```
The important design choice is that memory and memory change are separate objects.
A MemorySetDTO answers: “What is active right now?”
A DeltaMemoryDiffDTO answers: “How did the active state change, and should that change affect future behavior?”
That separation is what makes the rest of the system measurable.
Stage 1: Composing Active Memory
The first stage builds a runtime memory snapshot from multiple weighted sources.
Inside Writer, these sources are configurable:
- current runtime context
- runtime.search
- database memory
- model priors
- future memory providers
Each source contributes memory candidates into a composed belief state.
Building a Memory Set
The first implementation step is to compose a memory state from configurable sources.
In this example, four sources contribute to memory: current context, runtime search, database memory, and model prior. Each source has a weight. Those weights do not decide the answer directly; they decide how strongly each source can influence the active memory state.
```python
config = DeltaMemoryConfigDTO(
    normalize_weights=True,
    sources=[
        MemorySourceConfigDTO(
            source_name="current_context",
            source_type="context_memory",
            weight=0.30,
        ),
        MemorySourceConfigDTO(
            source_name="runtime_search",
            source_type="search_memory",
            weight=0.25,
        ),
        MemorySourceConfigDTO(
            source_name="writer_database",
            source_type="database_memory",
            weight=0.25,
        ),
        MemorySourceConfigDTO(
            source_name="trained_model_prior",
            source_type="model_memory",
            weight=0.20,
        ),
    ],
)

memory_set = runtime.delta_memory.build_memory_set(
    goal="review artifact",
    query="lens runtime voice preservation",
    context={
        "context_memory_candidates": [
            {
                "text": "Current task is an artifact review.",
                "confidence": 0.90,
                "relevance": 0.85,
            }
        ],
        "search_hits": [
            {
                "hit_id": "search:1",
                "title": "Lens contribution report",
                "summary": "Voice preservation and semantic similarity results.",
                "score": 0.92,
            }
        ],
    },
    config=config,
)

print(memory_set.dominant_source)
print(memory_set.dominance_ratio)
```
This example produces a memory set for a specific goal and query.
Notice that context and search are both allowed to contribute. The current task says “this is an artifact review,” while search contributes a concrete result about lens reports and voice preservation. Delta Memory does not assume either source is automatically correct. It records both, weights both, and then measures which one dominates.
The engine:
- normalizes weights across sources
- collects raw candidates
- filters by confidence and top‑k
- computes weighted scores and aggregate confidence
- records contribution metadata and provenance
- identifies the dominant source and dominance ratio
The result is a MemorySetDTO: the system’s active memory state at a specific moment in time.
Not all memories are equal. Some sources dominate. Some weaken. Some reinforce one another. That is intentional. Human memory behaves similarly.
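The engine steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Writer's implementation: the RawCandidate type, the min_confidence and top_k defaults, and the use of the mean weighted score as the aggregate are all choices made for this example, and dominance detection is left out to keep it short.

```python
from dataclasses import dataclass


@dataclass
class RawCandidate:
    # Hypothetical raw candidate as emitted by a single memory source.
    source: str
    text: str
    confidence: float
    relevance: float


def compose_memory_set(candidates, weights, min_confidence=0.5, top_k=3):
    """Normalize weights, filter, cap per source, and score candidates."""
    # Normalize source weights so they sum to 1.0.
    total_weight = sum(weights.values())
    norm = {name: w / total_weight for name, w in weights.items()}

    # Drop unknown sources and candidates below the confidence threshold.
    kept = [
        c for c in candidates
        if c.source in norm and c.confidence >= min_confidence
    ]

    # Keep the top-k candidates per source, ranked by relevance * confidence.
    by_source = {}
    for c in kept:
        by_source.setdefault(c.source, []).append(c)
    selected = []
    for group in by_source.values():
        group.sort(key=lambda c: c.relevance * c.confidence, reverse=True)
        selected.extend(group[:top_k])

    # Weighted score = normalized source weight * relevance * confidence.
    scored = sorted(
        ((c, norm[c.source] * c.relevance * c.confidence) for c in selected),
        key=lambda pair: pair[1],
        reverse=True,
    )

    # Aggregate: mean weighted score (one possible choice for this sketch).
    aggregate = sum(s for _, s in scored) / len(scored) if scored else 0.0
    return scored, aggregate


weights = {"current_context": 0.30, "runtime_search": 0.25,
           "writer_database": 0.25, "trained_model_prior": 0.20}
items, aggregate = compose_memory_set(
    [
        RawCandidate("current_context", "Current task is an artifact review.", 0.90, 0.85),
        RawCandidate("runtime_search", "Lens contribution report", 0.92, 1.0),
        RawCandidate("trained_model_prior", "A weak prior guess", 0.30, 0.90),
    ],
    weights,
)
```

With these illustrative values the low-confidence prior is filtered out, and the search candidate (0.25 × 1.0 × 0.92 = 0.23) narrowly outranks the context candidate (0.30 × 0.85 × 0.90 = 0.2295).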
Diagram: Memory Composition
The diagram below shows the same process visually.
The important idea is that sources do not write directly into memory. They first produce raw candidates. Those candidates are then normalized, filtered, weighted, and assembled into a single memory set.
```mermaid
flowchart TD
    A["🎯 Goal + Query"] --> B{⚙️ DeltaMemoryEngine}
    B --> C1["🗣️ Context Memory<br>weight=0.30"]
    B --> C2["🔍 Search Memory<br>weight=0.25"]
    B --> C3["💾 Database Memory<br>weight=0.25"]
    B --> C4["🧠 Model Prior<br>weight=0.20"]
    C1 --> D["📦 Raw Candidates"]
    C2 --> D
    C3 --> D
    C4 --> D
    D --> E["⚖️ Normalize & Filter<br>• confidence threshold<br>• top‑k per source"]
    E --> F["✅ MemorySetDTO<br>━━━━━━━━━━━━━━━━<br>📋 candidates (weighted)<br>📊 source reports<br>👑 dominant source<br>📈 aggregate score"]
    style A fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    style B fill:#fff3e0,stroke:#e65100,stroke-width:2px
    style C1 fill:#f3e5f5,stroke:#4a148c
    style C2 fill:#e8f5e9,stroke:#1b5e20
    style C3 fill:#fff8e1,stroke:#f57f17
    style C4 fill:#ffebee,stroke:#b71c1c
    style F fill:#e0f2f1,stroke:#004d40,stroke-width:3px
```
This is the first point where memory becomes inspectable.
Instead of asking the model to “remember” something implicitly, the runtime can show exactly which sources contributed, which candidates survived filtering, and which source dominated the final state.
The Scoring Rule
At the center of Stage 1 is a deliberately simple scoring rule.
A candidate has its own relevance and confidence. The source also has a configured weight. The final weighted score combines both.
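```python
# Use the candidate's own score when one is provided (even if it is 0.0);
# otherwise combine relevance and confidence.
original_score = (
    candidate.raw_score
    if candidate.raw_score is not None
    else candidate.relevance * candidate.confidence
)
weighted_score = source_weight * original_score
```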
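Written out as a standalone function, the rule looks like this. The ScoredCandidate type is a hypothetical stand-in; the example assumes that a search hit carries its own score (as raw_score) while a context candidate supplies relevance and confidence, and that the source weights are already normalized.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScoredCandidate:
    # Hypothetical candidate shape used only for this illustration.
    relevance: float = 0.0
    confidence: float = 0.0
    raw_score: Optional[float] = None  # e.g. a search hit's own score


def weighted_score(candidate: ScoredCandidate, source_weight: float) -> float:
    # Prefer an explicit raw score; fall back to relevance * confidence.
    if candidate.raw_score is not None:
        original = candidate.raw_score
    else:
        original = candidate.relevance * candidate.confidence
    return source_weight * original


context_score = weighted_score(ScoredCandidate(relevance=0.85, confidence=0.90), 0.30)
search_score = weighted_score(ScoredCandidate(raw_score=0.92), 0.25)
print(round(context_score, 4), round(search_score, 4))  # 0.2295 0.23
```

Even though the context source carries the larger weight, a strong search hit can end up with a comparable weighted score, which is exactly why dominance has to be measured rather than assumed.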
Memory Sources Are Not Neutral
One of the most important design choices was making memory sources explicitly configurable and measurable.
Each source can be:
- enabled or disabled
- weighted differently
- filtered independently
- capped by top‑k
- governed separately
For example, the default configuration looks like this:
```yaml
sources:
  - source_name: current_context
    source_type: context_memory
    weight: 0.30
  - source_name: runtime_search
    source_type: search_memory
    weight: 0.25
  - source_name: writer_database
    source_type: database_memory
    weight: 0.25
  - source_name: trained_model_prior
    source_type: model_memory
    weight: 0.20
```
This matters because memory is not neutral.
Context can over-steer cognition.
Search can dominate recall.
Model priors can reinforce stale beliefs.
The system therefore measures dominance explicitly and emits warnings like memory_source_dominance_detected when a single source exceeds a safe threshold.
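A dominance check can be as small as the sketch below. The post does not say how dominance_ratio is computed or what the safe threshold is, so both the definition used here (the top source's share of the total weighted score) and the 0.60 threshold are assumptions.

```python
from collections import defaultdict

# Hypothetical policy threshold; the post does not state Writer's value.
DOMINANCE_THRESHOLD = 0.60


def source_dominance(weighted_items, threshold=DOMINANCE_THRESHOLD):
    """Return (dominant_source, dominance_ratio, warnings).

    weighted_items is an iterable of (source_name, weighted_score) pairs.
    """
    totals = defaultdict(float)
    for source, score in weighted_items:
        totals[source] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return None, None, []
    # Dominance ratio: the top source's share of all weighted score (assumed).
    dominant = max(totals, key=totals.get)
    ratio = totals[dominant] / grand_total
    warnings = []
    if ratio > threshold:
        warnings.append("memory_source_dominance_detected")
    return dominant, ratio, warnings


dominant, ratio, warnings = source_dominance([
    ("runtime_search", 0.23),
    ("runtime_search", 0.20),
    ("current_context", 0.10),
])
```

Here runtime_search holds 0.43 of 0.53 total weighted score (about 81%), so the warning fires.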
Stage 2: Measuring Memory Change
This is where the architecture becomes interesting.
Given two memory states:
MemorySet(before)
MemorySet(after)
the system computes:
diff = diff_service.diff_memory_sets(before=before_set, after=after_set)
The resulting DeltaMemoryDiffDTO measures:
- candidate deltas (added, removed, strengthened, weakened, unchanged)
- source deltas and dominance shifts
- contribution-level provenance changes
- aggregate score shifts
- attribution of cause
- health and risk evaluation
- recommended governance action
Example:
print(diff.changed_dominant_source) # True
print(diff.attribution.primary_cause_source) # "runtime_search"
print(diff.health.health_status) # "suspicious"
print(diff.decision.action) # "dampen"
The important point is this:
The system can now reason about its own memory evolution.
That is fundamentally different from simple retrieval.
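Candidate deltas are the foundation of the diff. Here is a minimal sketch that matches candidates by a stable text hash and classifies each one. The dictionary input format (candidate text mapped to weighted score) and the 0.01 tolerance for "unchanged" are assumptions for illustration; the scores anticipate the routing example in the next section (0.30 × 0.90 × 0.90 = 0.243 for the context candidate, 0.25 × 0.96 = 0.24 for the search hit), plus a hypothetical prior candidate that disappears.

```python
import hashlib

# Hypothetical tolerance below which a score change counts as "unchanged".
EPSILON = 0.01


def candidate_key(text: str) -> str:
    # Stable identity for a candidate: a hash of its normalized text.
    return hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()


def diff_candidates(before: dict, after: dict, epsilon: float = EPSILON) -> dict:
    """Label each candidate added, removed, strengthened, weakened, or unchanged."""
    b = {candidate_key(t): (t, s) for t, s in before.items()}
    a = {candidate_key(t): (t, s) for t, s in after.items()}
    result = {}
    for key in b.keys() | a.keys():
        if key not in b:
            result[a[key][0]] = "added"
        elif key not in a:
            result[b[key][0]] = "removed"
        else:
            text, old_score = b[key]
            delta = a[key][1] - old_score
            if delta > epsilon:
                result[text] = "strengthened"
            elif delta < -epsilon:
                result[text] = "weakened"
            else:
                result[text] = "unchanged"
    return result


changes = diff_candidates(
    {
        "Generic repository search is probably enough.": 0.243,
        "Old model prior hint": 0.05,
    },
    {
        "Generic repository search is probably enough.": 0.243,
        "Artifact review should use lens contribution reports.": 0.24,
    },
)
```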
A Tiny Before/After Example
To make the delta concrete, imagine a routing task.
At first, the context suggests that generic repository search is probably enough. Then a search result appears suggesting that the artifact/lens path is a better fit. Delta Memory builds one memory set before that evidence and one after it.
```python
before = runtime.delta_memory.build_memory_set(
    goal="route artifact review task",
    query="best runtime path",
    context={
        "context_memory_candidates": [
            {
                "text": "Generic repository search is probably enough.",
                "confidence": 0.90,
                "relevance": 0.90,
            }
        ]
    },
    config=config,
)

after = runtime.delta_memory.build_memory_set(
    goal="route artifact review task",
    query="best runtime path",
    context={
        "context_memory_candidates": [
            {
                "text": "Generic repository search is probably enough.",
                "confidence": 0.90,
                "relevance": 0.90,
            }
        ],
        "search_hits": [
            {
                "hit_id": "search:artifact-lens",
                "title": "Artifact + Lens runtime fit",
                "summary": "Artifact review should use lens contribution reports.",
                "score": 0.96,
            }
        ],
    },
    config=config,
)

# Compare the two memory states.
diff = diff_service.diff_memory_sets(before=before, after=after)

print(diff.changed_dominant_source)
print(diff.attribution.primary_cause_source)
print(diff.health.health_status)
print(diff.decision.action)
```
Generated results
changed_dominant_source: True
primary_cause_source: runtime_search
health_status: suspicious
decision: dampen
recommended_weight_adjustments: {"runtime_search": -0.15}
The important point is not that the system changed its mind.
The important point is that the change is now measurable. We can inspect whether the top candidate changed, whether the dominant source changed, and whether runtime search was actually responsible for the shift.
Diagram: Measuring the Delta
The next diagram shows what happens after two memory states exist.
This is the heart of Delta Memory: the runtime does not just keep the newer memory set. It compares the before and after states, computes deltas, attributes the cause, evaluates risk, and recommends a governance action.
```mermaid
flowchart LR
    subgraph BEFORE ["🕒 Before State"]
        B_Mem["MemorySetDTO<br>dominant: context<br>score: 0.62"]
    end
    subgraph AFTER ["🕓 After State"]
        A_Mem["MemorySetDTO<br>dominant: search<br>score: 0.80"]
    end
    B_Mem --> DIFF["🔁 DeltaMemoryDiffService"]
    A_Mem --> DIFF
    DIFF --> CAND["📌 Candidate Deltas<br>• added / removed<br>• strengthened / weakened"]
    DIFF --> SRC["📂 Source Deltas<br>• weighted score Δ<br>• dominance shift"]
    DIFF --> CONT["🔗 Contribution Deltas<br>• provenance changes"]
    CAND --> ATTR["🧠 Attribution Service<br>━━━━━━━━━━━━━━━━<br>primary_cause_source:<br>runtime_search (0.71)"]
    SRC --> ATTR
    CONT --> ATTR
    ATTR --> HEALTH["🩺 Health Evaluator<br>risk = 0.35*d + 0.30*v + 0.20*dr + 0.15*c<br>━━━━━━━━━━━━━━━━<br>→ suspicious"]
    ATTR --> DECIDE["⚖️ Decision Service<br>━━━━━━━━━━━━━━━━<br>→ dampen"]
    HEALTH --> OUTPUT["📄 DeltaMemoryDiffDTO"]
    DECIDE --> OUTPUT
    style BEFORE fill:#f3e5f5,stroke:#6a1b9a
    style AFTER fill:#e8f5e9,stroke:#2e7d32
    style DIFF fill:#fff9c4,stroke:#f57f17,stroke-width:2px
    style ATTR fill:#d1c4e9,stroke:#4527a0
    style HEALTH fill:#ffccbc,stroke:#bf360c
    style DECIDE fill:#b2dfdb,stroke:#004d40
    style OUTPUT fill:#c8e6c9,stroke:#1b5e20,stroke-width:3px
```
The diagram is intentionally mechanical.
There is no hidden model interpretation here. The system compares structured objects, measures score changes, computes attribution, evaluates health, and emits a decision. That makes memory change replayable instead of mysterious.
Attribution: What Changed The Memory?
One of the strongest parts of the system is deterministic attribution.
The DeltaMemoryAttributionService computes source influence by summing absolute weighted-score deltas across all candidates and contributions, then normalizing by total change. The source with the highest normalized influence becomes the primary_cause_source.
For example:
```python
primary_cause_source = "runtime_search"
source_influence = {
    "runtime_search": 0.71,
    "current_context": 0.19,
    "trained_model_prior": 0.10,
}
```
This allows the runtime to answer questions like:
- Did search distort memory?
- Is current context over-dominating?
- Did recent evidence outweigh historical belief?
- Is one source poisoning cognition?
Most AI systems cannot answer those questions. Delta Memory can. No black-box attention maps. Just rule-based, replayable attribution.
Attribution as Accounting
The attribution step is intentionally simple.
Instead of trying to infer causality from hidden model activations, the system treats memory change as an accounting problem. It sums how much each source changed the weighted score, normalizes those changes, and reports the largest contributor.
```python
from collections import defaultdict


def attribute_source_change(source_deltas, contribution_deltas):
    """Return each source's normalized share of the total weighted-score change."""
    totals = defaultdict(float)
    for source_delta in source_deltas:
        totals[source_delta.source_name] += abs(
            source_delta.weighted_score_delta
        )
    for contribution_delta in contribution_deltas:
        totals[contribution_delta.source_name] += abs(
            contribution_delta.weighted_score_delta
        )
    total_change = sum(totals.values())
    if total_change == 0:
        # Nothing changed, so no source can be blamed.
        return {}
    return {
        source_name: change / total_change
        for source_name, change in totals.items()
    }
```
This is not a perfect theory of causality, but it is useful engineering.
If runtime_search accounts for 71% of the weighted-score change, the system can say that search was the primary cause of the memory shift. That gives us something concrete to inspect, dampen, reject, or reinforce later.
Health Evaluation and Cognitive Drift
The system does not assume every memory change is good. It explicitly evaluates memory health.
The DeltaMemoryHealthEvaluator computes:
- drift_score: magnitude of aggregate score change
- dominance_score: how heavily one source controls the set
- volatility_score: proportion of candidates that shifted
- contradiction_score: presence of conflicting evidence
- confidence_score: average confidence of the new state
- risk_score: weighted combination
Health Is Separate From Attribution
Attribution tells us what caused the memory change.
Health evaluation asks a different question: was the change safe?
A source can be the primary cause of a change while the change itself is still healthy. Or the source can dominate too strongly, create volatility, and trigger a suspicious or dangerous classification.
The risk formula is deterministic:
```python
risk_score = (
    0.35 * dominance_score
    + 0.30 * volatility_score
    + 0.20 * drift_score
    + 0.15 * contradiction_score
)
```
Transitions are classified as:
- healthy (risk < 0.35)
- suspicious (0.35 ≤ risk < 0.70)
- dangerous (risk ≥ 0.70)
These thresholds are not meant to be universal truths.
They are runtime policy. The point is not that 0.35 is magic. The point is that memory drift becomes measurable enough to govern. Different systems can tune these weights based on their risk tolerance.
This was heavily inspired by observing how humans drift cognitively. People become dominated by emotional recency, ideological reinforcement, authority bias, or repeated exposure. Delta Memory attempts to model these risks explicitly.
Warnings like memory_source_dominance_detected, high_memory_volatility, or source_dominance_changed surface before they corrupt downstream behavior.
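The health computation can be sketched directly from the risk formula and thresholds. The volatility helper follows the description (proportion of candidates that shifted); treating drift as the absolute aggregate-score change capped at 1.0, and assuming all four component scores are already scaled to [0, 1], are assumptions. The inputs in the example are illustrative, with the aggregate scores taken from the delta diagram (0.62 before, 0.80 after).

```python
def volatility_score(change_labels):
    # Share of candidates whose label is anything other than "unchanged".
    if not change_labels:
        return 0.0
    shifted = sum(1 for label in change_labels if label != "unchanged")
    return shifted / len(change_labels)


def evaluate_health(dominance, volatility, drift, contradiction):
    """Apply the weighted risk formula, then the policy thresholds."""
    risk = (
        0.35 * dominance
        + 0.30 * volatility
        + 0.20 * drift
        + 0.15 * contradiction
    )
    if risk < 0.35:
        status = "healthy"
    elif risk < 0.70:
        status = "suspicious"
    else:
        status = "dangerous"
    return risk, status


volatility = volatility_score(["unchanged", "added"])
drift = min(1.0, abs(0.80 - 0.62))  # aggregate score delta, capped (assumed)
risk, status = evaluate_health(
    dominance=0.71, volatility=volatility, drift=drift, contradiction=0.0
)
```

The arithmetic is 0.35 × 0.71 + 0.30 × 0.5 + 0.20 × 0.18 + 0.15 × 0 ≈ 0.43, which lands in the suspicious band.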
Governance Decisions
Once the system can evaluate memory change, it can govern it.
Every memory diff produces a recommendation via DeltaMemoryDecisionService:
- accept
- dampen
- reject
- investigate
From Measurement to Action
Once memory change has attribution and health, the runtime can recommend what to do next.
This is the beginning of cognitive governance: not just observing that memory changed, but deciding whether to accept the change, dampen it, reject it, or investigate it.
decision.action == "dampen"
This is the first place the system begins to regulate its own memory.
A suspicious change does not have to be accepted blindly. The runtime can reduce the influence of the source that caused the shift and then recompose memory to see whether the belief state stabilizes.
This might occur when:
- runtime.search suddenly dominates memory
- volatility spikes
- contradictory evidence appears
- current context overwhelms longer-term memory
The decision engine also emits concrete next steps:
recommended_weight_adjustments = {"runtime_search": -0.15}
recommended_followup_checks = ["recompose_memory_after_weight_adjustment"]
This creates a recursive loop:
memory → evaluation → governance → future memory composition
That loop is the beginning of runtime cognitive regulation.
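A decision rule and weight adjustment could look like the following sketch. The mapping from health status to action, the 0.5 contradiction cutoff for investigate, and renormalizing weights after a dampen are hypothetical; only the action names and the -0.15 adjustment for runtime_search come from the post.

```python
def decide(health_status, primary_cause_source, contradiction_score=0.0,
           damping=0.15):
    """Hypothetical mapping from a health evaluation to a governance action."""
    if contradiction_score >= 0.5:
        return "investigate", {}
    if health_status == "healthy":
        return "accept", {}
    if health_status == "suspicious":
        # Reduce the influence of whichever source caused the shift.
        return "dampen", {primary_cause_source: -damping}
    return "reject", {}


def apply_adjustments(weights, adjustments, floor=0.0):
    """Apply weight deltas, clamp at a floor, then renormalize to sum to 1."""
    adjusted = {
        name: max(floor, weight + adjustments.get(name, 0.0))
        for name, weight in weights.items()
    }
    total = sum(adjusted.values())
    if total == 0:
        raise ValueError("all source weights were reduced to zero")
    return {name: weight / total for name, weight in adjusted.items()}


action, adjustments = decide("suspicious", "runtime_search")
new_weights = apply_adjustments(
    {"current_context": 0.30, "runtime_search": 0.25,
     "writer_database": 0.25, "trained_model_prior": 0.20},
    adjustments,
)
```

After the adjustment and renormalization, runtime_search holds about 0.12 of the total weight instead of 0.25, and memory can then be recomposed with the new weights.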
Observable Cognition
This was the genuinely surprising part.
The system became introspectable.
Most AI systems cannot answer:
- Why did memory change?
- Which source caused the shift?
- Is the system drifting?
- Which source dominates?
- Is the current conversation over-steering reasoning?
- Should the change be trusted?
Delta Memory can.
Every memory transition becomes:
- measurable
- attributable
- replayable
- governable
This creates something that begins to resemble runtime epistemics rather than prompt engineering.
Why This Matters Now
LLMs are hitting the limits of:
- longer context windows
- bigger models
- naive retrieval
- stateless agents
The next frontier is runtime cognition — systems that:
- maintain internal state
- evolve over time
- detect drift
- attribute belief changes
- regulate their own memory
- bias their own search
- explain their own transitions
Delta Memory is one architecture for that. It doesn’t replace retrieval; it sits above it as a cognitive layer.
The Trade-off
Most RAG systems optimize for recall (finding the right chunk). Delta Memory optimizes for epistemic stability (knowing why you believe what you believe).
In production, this means:
- Fewer Hallucinations via Drift Detection: By detecting when a single source dominates, the system can keep the model from latching onto noisy data.
- Audit Trails: When an agent gives a wrong answer, we can trace exactly which memory delta caused the shift.
- Self-Correction: The system can dampen its own biases before they corrupt the final output.
It’s not just about remembering more; it’s about remembering better.
Search as the Cognitive Substrate
Originally, Writer’s search domain was just infrastructure. A runtime registry. A way to search files, process runs, voice restoration artifacts, workflows, and runtime state.
But over time the search engine accumulated:
- recency weighting
- runtime specialization
- capability awareness
- behavioral actions
- memory references
- cross-runtime aggregation
At some point, it stopped behaving like retrieval infrastructure and started behaving like associative recall.
Writer’s search is built around a registry of SearchableRuntimeProtocol adapters. Each runtime exposes:
- search_capability()
- search(request)
- get_recent_memory()
- get_actions()
The engine fans out, deduplicates, ranks with recency boosts, merges evidence, and attaches actionable next steps.
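The fan-out step can be sketched with a typing.Protocol. This simplified version is an assumption throughout: it models only search(), omits search_capability(), get_recent_memory(), and get_actions(), and uses a hypothetical age_seconds field with an exponential recency boost. Deduplication keeps the highest-scoring copy of each hit_id.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class SearchHit:
    hit_id: str
    runtime: str
    score: float
    age_seconds: float = 0.0  # hypothetical field used for the recency boost


class SearchableRuntime(Protocol):
    # Simplified stand-in for SearchableRuntimeProtocol: only search() is shown.
    def search(self, query: str) -> list[SearchHit]: ...


def fan_out_search(runtimes, query, half_life=3600.0, limit=10):
    """Query every runtime, deduplicate by hit_id, rank with a recency boost."""
    best = {}
    for runtime in runtimes:
        for hit in runtime.search(query):
            # A brand-new hit's score is doubled; the extra boost halves
            # every half_life seconds.
            boosted = hit.score * (1.0 + 0.5 ** (hit.age_seconds / half_life))
            if hit.hit_id not in best or boosted > best[hit.hit_id][0]:
                best[hit.hit_id] = (boosted, hit)
    ranked = sorted(best.values(), key=lambda pair: pair[0], reverse=True)
    return [hit for _, hit in ranked[:limit]]


class StaticRuntime:
    """Toy runtime that returns fixed hits regardless of the query."""

    def __init__(self, hits):
        self.hits = hits

    def search(self, query):
        return list(self.hits)


results = fan_out_search(
    [
        StaticRuntime([SearchHit("lens:1", "lens", 0.6)]),
        StaticRuntime([
            SearchHit("repo:1", "repo", 0.9, age_seconds=36000),
            SearchHit("lens:1", "lens", 0.5),
        ]),
    ],
    "voice preservation report",
)
```

The stale repository hit has the highest raw score, but the fresh lens hit outranks it once recency is applied, and the duplicate lens hit collapses into one result.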
Crucially, memory itself becomes searchable. We expose composed memory sets via DeltaMemorySearchAdapter, allowing them to appear as hits in runtime.search.
But there is a second, more powerful loop: Memory guiding search.
| Loop | Direction | Mechanism | Purpose |
|---|---|---|---|
| Recall | Search → Memory | DeltaMemorySearchAdapter | Let the agent query its own past beliefs. |
| Guidance | Memory → Search | SearchMemoryBridge | Let current beliefs bias where the agent looks next. |
Together, they form a recursive cognitive cycle.
Stage 3: Memory-Guided Search
Once memory transitions become measurable, they can influence future search.
That is where SearchBiasDTO comes in.
The SearchMemoryBridge consumes a MemorySetDTO and/or DeltaMemoryDiffDTO and produces bias instructions that tell runtime.search:
- which runtimes to prefer or suppress
- which query terms to expand or block
- how deep to search
- whether to stop (if memory has stabilized)
- whether to rerun (if health is dangerous or action is investigate)
Example:
```python
bias = bridge.build_search_bias(request)
bias.preferred_runtimes    # ["artifact", "lens"]
bias.suppressed_runtimes   # ["repo"]
bias.query_expansions      # ["artifact review", "voice preservation"]
bias.depth_hint            # "normal"
bias.should_rerun          # False
bias.should_stop           # False
```
Importantly, search internals remain untouched. The bridge emits structured bias hints, not rewritten retrieval logic. Any search consumer can respect or ignore them.
This means memory changes behavior. Behavior produces new evidence. New evidence changes memory.
That recursive loop is the real architecture.
Search as a Runtime Interface
For memory to behave like associative recall, search needs to be broader than document retrieval.
In Writer, search is exposed as a runtime interface. Files, process runs, voice restoration artifacts, and memory itself can all become searchable runtimes.
This matters because each runtime is a different kind of memory surface.
A filesystem search is not the same as a process-run search. A voice restoration artifact is not the same as a model prior. By exposing them through the same interface, the system can search across different forms of experience.
Memory-Guided Search
Once the system can measure memory change, it can use that change to guide future search.
SearchBiasDTO is the bridge. It does not rewrite search internals. It simply describes how memory thinks the next search should be shaped.
```python
from typing import Literal

from pydantic import BaseModel


class SearchBiasDTO(BaseModel):
    preferred_runtimes: list[str]
    suppressed_runtimes: list[str]
    query_expansions: list[str]
    query_suppressions: list[str]
    depth_hint: Literal["shallow", "normal", "deep"]
    should_rerun: bool
    should_stop: bool
    reason: str
```
```python
bias = runtime.delta_memory.build_search_bias(
    goal="review artifact",
    memory_set=after,
    memory_diff=diff,
)

search_request = RuntimeSearchRequestDTO(
    query="voice preservation report",
    runtimes=bias.preferred_runtimes or None,
    exclude_runtimes=bias.suppressed_runtimes,
    depth=bias.depth_hint,
)
```
This creates a clean separation.
Search remains a general runtime service. Delta Memory produces bias hints: prefer these runtimes, suppress those runtimes, expand these terms, search deeper, rerun, or stop. Any search implementation can choose how much of that guidance to honor.
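One way a bridge might derive those hints is sketched below. The rules, the runtime_influence input (a per-runtime change in weighted score), and the 0.02 stability tolerance are hypothetical; the stop and rerun conditions follow the Stage 3 description (stop when memory has stabilized, rerun when health is dangerous or the action is investigate). A plain dataclass stands in for SearchBiasDTO so the example needs no third-party packages.

```python
from dataclasses import dataclass, field
from typing import Literal


@dataclass
class SearchBias:
    # Plain-dataclass mirror of the SearchBiasDTO fields.
    preferred_runtimes: list = field(default_factory=list)
    suppressed_runtimes: list = field(default_factory=list)
    query_expansions: list = field(default_factory=list)
    query_suppressions: list = field(default_factory=list)
    depth_hint: Literal["shallow", "normal", "deep"] = "normal"
    should_rerun: bool = False
    should_stop: bool = False
    reason: str = ""


def build_search_bias(health_status, action, changed_top_candidate,
                      aggregate_delta, runtime_influence, stable_epsilon=0.02):
    """Hypothetical rules for turning a memory diff into search hints."""
    bias = SearchBias()
    # Prefer runtimes that gained influence; suppress those that lost it.
    bias.preferred_runtimes = sorted(r for r, d in runtime_influence.items() if d > 0)
    bias.suppressed_runtimes = sorted(r for r, d in runtime_influence.items() if d < 0)
    if health_status == "dangerous" or action == "investigate":
        bias.depth_hint = "deep"
        bias.should_rerun = True
        bias.reason = "unhealthy or contested memory change"
    elif not changed_top_candidate and abs(aggregate_delta) < stable_epsilon:
        bias.depth_hint = "shallow"
        bias.should_stop = True
        bias.reason = "memory has stabilized"
    else:
        bias.reason = "memory still moving; continue normal search"
    return bias


bias = build_search_bias(
    "suspicious", "dampen", True, 0.18,
    {"artifact": 0.4, "lens": 0.3, "repo": -0.2},
)
```

With these inputs the sketch reproduces the hints from the earlier example: prefer artifact and lens, suppress repo, normal depth, neither rerun nor stop.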
Diagram: The Recursive Memory/Search Loop
The final diagram shows the complete loop.
A query produces search results. Search results change the memory state. The memory diff produces bias. That bias changes the next search. The loop continues until the memory state stabilizes or the system decides it has enough evidence.
```mermaid
flowchart TD
    START["🔍 User Query"] --> SEARCH["🌐 runtime.search<br>━━━━━━━━━━━━━━━━<br>• filesystem<br>• process runs<br>• voice restoration<br>• delta_memory (adapter)"]
    SEARCH --> MEM1["📝 Memory Composition<br>→ MemorySetDTO (after)"]
    MEM1 --> DIFF2["📊 Diff: before ↔ after<br>→ DeltaMemoryDiffDTO"]
    DIFF2 --> BIAS["🎯 SearchMemoryBridge<br>→ SearchBiasDTO<br>━━━━━━━━━━━━━━━━<br>preferred_runtimes: [artifact,lens]<br>suppressed: [repo]<br>depth: deep<br>should_rerun: true"]
    BIAS --> SEARCH2["🌐 runtime.search<br>(biased)"]
    SEARCH2 --> MEM2["📝 Memory Composition<br>(new)"]
    MEM2 --> STABLE{"🧘 Memory stable?"}
    STABLE -->|No| DIFF2
    STABLE -->|Yes| ANSWER["✅ Final Answer"]
    style START fill:#e3f2fd,stroke:#0d47a1
    style SEARCH fill:#fff3e0,stroke:#e65100
    style MEM1 fill:#f1f8e9,stroke:#33691e
    style DIFF2 fill:#fff9c4,stroke:#f57f17
    style BIAS fill:#e8eaf6,stroke:#283593
    style SEARCH2 fill:#ffe0b2,stroke:#bf360c
    style STABLE fill:#f3e5f5,stroke:#6a1b9a
    style ANSWER fill:#c8e6c9,stroke:#1b5e20,stroke-width:3px
```
This is the reason Delta Memory is more than a storage layer.
The system is not only remembering. It is using measured memory change to decide where to look next. That is the cargo-cult version of associative recall: imperfect, mechanical, but useful.
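The loop itself is short once the pieces exist. In this sketch search, compose, diff, and build_bias are hypothetical callables standing in for runtime.search, memory composition, DeltaMemoryDiffService, and SearchMemoryBridge; "stable" is reduced to an aggregate-score change below 0.01, and max_rounds caps the loop in case memory never settles.

```python
def run_memory_search_loop(search, compose, diff, build_bias, query,
                           max_rounds=5):
    """Search -> compose -> diff -> bias, until the bias says stop."""
    bias = None
    previous = None
    for round_number in range(1, max_rounds + 1):
        # The latest bias (None on the first round) shapes the next search.
        current = compose(search(query, bias))
        if previous is not None:
            bias = build_bias(diff(previous, current))
            if bias["should_stop"]:
                return current, round_number
        previous = current
    return previous, max_rounds


# Toy stand-ins: each search yields the next aggregate score in a fixed sequence.
scores = iter([0.62, 0.80, 0.805, 0.806])
state, rounds = run_memory_search_loop(
    search=lambda query, bias: next(scores),
    compose=lambda aggregate: aggregate,
    diff=lambda before, after: abs(after - before),
    build_bias=lambda delta: {"should_stop": delta < 0.01},
    query="best runtime path",
)
```

Here the toy aggregate moves 0.62, then 0.80, then 0.805, so the loop stops on the third round.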
The Full Cognitive Loop
Search + Context + Model Priors + Database
↓
Memory Composition
↓
MemorySetDTO
↓
Memory Diffing
↓
DeltaMemoryDiffDTO
↓
Attribution + Health Evaluation
↓
Governance Decision
↓
Search Bias / Weight Adjustment
↓
Future Search
↓
(loop repeats)
This is no longer just retrieval. It is a runtime cognitive state machine.
Why This Beats Fine-Tuning
Most attempts at AI memory focus on:
- larger context windows
- more embeddings
- more parameters
- more training
Delta Memory takes the opposite approach.
Everything here is:
- runtime-only
- deterministic
- observable
- replayable
- governable
No transformer surgery. No unstable fine-tuning. No opaque weight updates. Every memory transition can be inspected, replayed, and explained.
That is enormously important if you want long-running agents that remain debuggable, safe, and adaptable. And because the signals are clean and structured, they form the exact telemetry needed later for lightweight MRQ/DPO training without destabilizing the runtime.
The Recursive Cognitive Loop
The diagram below shows the complete cycle.
It starts with a search, builds memory, measures the change, attributes it, evaluates health, decides what to do, and biases the next search. Then the loop repeats.
This is not a linear pipeline. It is a closed loop.
Memory changes search. Search changes memory.
The arrows point forward, but the system feeds back into itself.
```mermaid
flowchart LR
    START(["🚀 Start"]) --> SEARCH
    SEARCH["🔍 Search<br>━━━━━━━━━━<br>runtime.search"]
    SEARCH --> MEMORY["📝 Memory<br>━━━━━━━━━━<br>MemorySetDTO"]
    MEMORY --> DELTA["📊 Delta<br>━━━━━━━━━━<br>DeltaMemoryDiffDTO"]
    DELTA --> ATTRIB["🧠 Attribution<br>━━━━━━━━━━<br>primary cause"]
    ATTRIB --> HEALTH["🩺 Health<br>━━━━━━━━━━<br>risk & warnings"]
    HEALTH --> DECISION["⚖️ Decision<br>━━━━━━━━━━<br>accept/dampen/reject"]
    DECISION --> BIAS["🎯 Search Bias<br>━━━━━━━━━━<br>preferred runtimes<br>query expansions"]
    BIAS -->|feeds back to| SEARCH
    SEARCH --> LOOP_END(["🔄 Loop again"])
    style SEARCH fill:#e3f2fd,stroke:#0d47a1,stroke-width:2px
    style MEMORY fill:#f1f8e9,stroke:#33691e,stroke-width:2px
    style DELTA fill:#fff9c4,stroke:#f57f17,stroke-width:2px
    style ATTRIB fill:#d1c4e9,stroke:#4527a0,stroke-width:2px
    style HEALTH fill:#ffccbc,stroke:#bf360c,stroke-width:2px
    style DECISION fill:#b2dfdb,stroke:#004d40,stroke-width:2px
    style BIAS fill:#ffe0b2,stroke:#e65100,stroke-width:2px
```
What makes this useful is that every step is inspectable. You can see why search returned certain results, how memory changed, which source caused the shift, whether the change was risky, and what decision the system made. No hidden state. No mysterious model internals.
That is the difference between a system that merely stores information and one that can regulate its own cognition.
So What?
This architecture does something most AI memory systems cannot: it makes memory change visible.
You can now ask:
- Why did the agent change its mind?
- Which source (context, search, database, model prior) caused the shift?
- Is the agent drifting toward dangerous beliefs?
- Should we trust this new memory state?
And you get answers. Not guesses. Not attention maps. Just deterministic, replayable measurements.
If you are building long‑running agents, this matters. Because a system that cannot explain why its memory changed will eventually change in ways you cannot control. Delta Memory is an attempt to keep that door open.
Building Your Own Delta Memory (In 5 Steps)
- Define `MemorySetDTO`: a snapshot with candidates, sources, scores, and dominance.
- Implement a deterministic diff: match candidates by ID or stable text hash, then classify changes.
- Add attribution: sum absolute score changes per source, then normalize.
- Add health rules: risk = weighted sum of dominance + volatility + drift.
- Create a search bridge: translate memory deltas into preferred runtimes, query expansions, and depth hints.
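Steps 2 and 3 above fit in a few lines. This minimal sketch assumes each snapshot is a plain dict from a stable key to a `(source, weighted_score)` pair. `stable_key`, `classify`, `diff_snapshots`, and the lowercase/whitespace normalization rule are hypothetical, not the post's implementation. The appendix demo matches on raw text. Hashing normalized text instead treats two candidates that differ only in case or spacing as the same memory.

```python
import hashlib
from collections import defaultdict
from typing import Optional


def stable_key(text: str, candidate_id: Optional[str] = None) -> str:
    """Prefer an explicit candidate ID; otherwise hash case- and whitespace-normalized text."""
    if candidate_id:
        return candidate_id
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def classify(before: Optional[float], after: Optional[float]) -> str:
    """The same five change types used by the appendix demo."""
    if before is None:
        return "added"
    if after is None:
        return "removed"
    if after > before:
        return "strengthened"
    if after < before:
        return "weakened"
    return "unchanged"


def diff_snapshots(before: dict, after: dict) -> tuple[dict, dict]:
    """Snapshots map stable_key -> (source, weighted_score).

    Returns (change type per key, normalized influence per source).
    """
    changes = {}
    per_source = defaultdict(float)
    for key in sorted(set(before) | set(after)):  # sorted keys => deterministic order
        b, a = before.get(key), after.get(key)
        source = (a or b)[0]
        b_score = b[1] if b else None
        a_score = a[1] if a else None
        changes[key] = classify(b_score, a_score)
        # Attribution: accumulate the absolute score change per source.
        per_source[source] += abs((a_score or 0.0) - (b_score or 0.0))
    total = sum(per_source.values())
    influence = {s: v / total for s, v in per_source.items()} if total else {}
    return changes, influence


before = {stable_key("Use broad search first."): ("model_prior", 0.10)}
after = {
    stable_key("use  broad search first."): ("model_prior", 0.10),  # same key after normalization
    stable_key("Artifact review uses Lens reports."): ("search", 0.23),
}
changes, influence = diff_snapshots(before, after)
print(sorted(changes.values()), influence)
```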
Start with in‑memory SQLite. Keep everything runtime‑only. You’ll be surprised how far deterministic rules can take you.
The Bigger Picture
We no longer believe memory is fundamentally a storage problem.
We think it is a search problem.
More specifically:
memory is the evolving topology of associative search
Delta Memory is an attempt to model that topology explicitly:
- compose it
- measure it
- attribute it
- evaluate it
- govern it
- and eventually allow the system to tune itself safely over time
It may be one small step toward systems that maintain something more important than static context:
an evolving cognitive state.
That might be where real memory starts.
Appendix A: A Minimal SQLite Delta Memory Demo
Below is a tiny, self-contained example of Delta Memory using only Python and SQLite. It demonstrates the whole loop:
memory sources → memory set A → memory set B → diff → attribution → health → decision
```python
import sqlite3
from dataclasses import dataclass
from collections import defaultdict
from uuid import uuid4

# -----------------------------
# Data structures
# -----------------------------

@dataclass
class Candidate:
    source: str
    text: str
    confidence: float
    relevance: float

    @property
    def score(self) -> float:
        return self.confidence * self.relevance


@dataclass
class MemoryItem:
    source: str
    text: str
    weighted_score: float


# -----------------------------
# SQLite setup
# -----------------------------

def init_db(conn):
    conn.execute("""
        CREATE TABLE IF NOT EXISTS memory_sets (
            id TEXT PRIMARY KEY,
            label TEXT,
            dominant_source TEXT,
            aggregate_score REAL
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS memory_items (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            memory_set_id TEXT,
            source TEXT,
            text TEXT,
            weighted_score REAL
        )
    """)
    conn.commit()


# -----------------------------
# Stage 1: Compose memory
# -----------------------------

SOURCE_WEIGHTS = {
    "context": 0.30,
    "search": 0.25,
    "database": 0.25,
    "model_prior": 0.20,
}


def build_memory_set(conn, label: str, candidates: list[Candidate]) -> str:
    memory_set_id = str(uuid4())
    items = []
    source_totals = defaultdict(float)

    for candidate in candidates:
        source_weight = SOURCE_WEIGHTS[candidate.source]
        weighted_score = source_weight * candidate.score
        item = MemoryItem(
            source=candidate.source,
            text=candidate.text,
            weighted_score=weighted_score,
        )
        items.append(item)
        source_totals[candidate.source] += weighted_score

    aggregate_score = sum(item.weighted_score for item in items)
    dominant_source = None
    if source_totals:
        dominant_source = max(source_totals.items(), key=lambda x: x[1])[0]

    conn.execute(
        """
        INSERT INTO memory_sets (id, label, dominant_source, aggregate_score)
        VALUES (?, ?, ?, ?)
        """,
        (memory_set_id, label, dominant_source, aggregate_score),
    )
    for item in items:
        conn.execute(
            """
            INSERT INTO memory_items
            (memory_set_id, source, text, weighted_score)
            VALUES (?, ?, ?, ?)
            """,
            (memory_set_id, item.source, item.text, item.weighted_score),
        )
    conn.commit()
    return memory_set_id


def load_memory_items(conn, memory_set_id: str) -> list[MemoryItem]:
    rows = conn.execute(
        """
        SELECT source, text, weighted_score
        FROM memory_items
        WHERE memory_set_id = ?
        """,
        (memory_set_id,),
    ).fetchall()
    return [
        MemoryItem(source=row[0], text=row[1], weighted_score=row[2])
        for row in rows
    ]


def load_memory_set(conn, memory_set_id: str):
    return conn.execute(
        """
        SELECT id, label, dominant_source, aggregate_score
        FROM memory_sets
        WHERE id = ?
        """,
        (memory_set_id,),
    ).fetchone()


# -----------------------------
# Stage 2: Diff memory
# -----------------------------

def diff_memory_sets(conn, before_id: str, after_id: str) -> dict:
    before = load_memory_set(conn, before_id)
    after = load_memory_set(conn, after_id)
    before_items = load_memory_items(conn, before_id)
    after_items = load_memory_items(conn, after_id)

    before_by_text = {item.text: item for item in before_items}
    after_by_text = {item.text: item for item in after_items}

    candidate_deltas = []
    source_delta_totals = defaultdict(float)
    all_texts = set(before_by_text) | set(after_by_text)

    for text in sorted(all_texts):
        before_item = before_by_text.get(text)
        after_item = after_by_text.get(text)
        before_score = before_item.weighted_score if before_item else 0.0
        after_score = after_item.weighted_score if after_item else 0.0
        delta = after_score - before_score
        source = after_item.source if after_item else before_item.source

        if before_item is None:
            change_type = "added"
        elif after_item is None:
            change_type = "removed"
        elif delta > 0:
            change_type = "strengthened"
        elif delta < 0:
            change_type = "weakened"
        else:
            change_type = "unchanged"

        candidate_deltas.append({
            "text": text,
            "source": source,
            "before_score": before_score,
            "after_score": after_score,
            "delta": delta,
            "change_type": change_type,
        })
        source_delta_totals[source] += abs(delta)

    total_change = sum(source_delta_totals.values())
    source_influence = {
        source: change / total_change
        for source, change in source_delta_totals.items()
    } if total_change else {}

    primary_cause_source = None
    if source_influence:
        primary_cause_source = max(source_influence.items(), key=lambda x: x[1])[0]

    aggregate_score_delta = after[3] - before[3]
    changed_dominant_source = before[2] != after[2]

    health = evaluate_health(
        after_dominant_source=after[2],
        after_items=after_items,
        aggregate_score_delta=aggregate_score_delta,
        candidate_deltas=candidate_deltas,
    )
    decision = decide(primary_cause_source, health, changed_dominant_source)

    return {
        "before_label": before[1],
        "after_label": after[1],
        "before_dominant_source": before[2],
        "after_dominant_source": after[2],
        "changed_dominant_source": changed_dominant_source,
        "aggregate_score_delta": aggregate_score_delta,
        "candidate_deltas": candidate_deltas,
        "source_influence": source_influence,
        "primary_cause_source": primary_cause_source,
        "health": health,
        "decision": decision,
    }


# -----------------------------
# Stage 3: Health + decision
# -----------------------------

def evaluate_health(
    after_dominant_source: str,
    after_items: list[MemoryItem],
    aggregate_score_delta: float,
    candidate_deltas: list[dict],
) -> dict:
    total_score = sum(item.weighted_score for item in after_items)
    source_totals = defaultdict(float)
    for item in after_items:
        source_totals[item.source] += item.weighted_score

    dominance_score = 0.0
    if total_score > 0 and after_dominant_source:
        dominance_score = source_totals[after_dominant_source] / total_score

    changed_items = [
        item for item in candidate_deltas
        if item["change_type"] != "unchanged"
    ]
    volatility_score = len(changed_items) / max(1, len(candidate_deltas))
    drift_score = abs(aggregate_score_delta)

    risk_score = (
        0.35 * dominance_score +
        0.30 * volatility_score +
        0.20 * drift_score
    )

    if risk_score < 0.35:
        status = "healthy"
    elif risk_score < 0.70:
        status = "suspicious"
    else:
        status = "dangerous"

    return {
        "dominance_score": round(dominance_score, 3),
        "volatility_score": round(volatility_score, 3),
        "drift_score": round(drift_score, 3),
        "risk_score": round(risk_score, 3),
        "status": status,
    }


def decide(primary_cause_source: str | None, health: dict, changed_dominant_source: bool) -> dict:
    if health["status"] == "healthy":
        return {
            "action": "accept",
            "reason": "memory change appears stable",
        }
    if health["status"] == "dangerous":
        return {
            "action": "reject",
            "reason": "memory drift risk is too high",
            "source_to_review": primary_cause_source,
        }
    if changed_dominant_source:
        return {
            "action": "investigate",
            "reason": "dominant memory source changed",
            "source_to_review": primary_cause_source,
        }
    return {
        "action": "dampen",
        "reason": "memory shift is suspicious but not dangerous",
        "recommended_weight_adjustment": {
            primary_cause_source: -0.15
        } if primary_cause_source else {},
    }


# -----------------------------
# Demo
# -----------------------------

def main():
    conn = sqlite3.connect(":memory:")
    init_db(conn)

    before_candidates = [
        Candidate(
            source="context",
            text="Generic repository search is probably enough.",
            confidence=0.90,
            relevance=0.90,
        ),
        Candidate(
            source="model_prior",
            text="Use broad search first when routing is uncertain.",
            confidence=0.70,
            relevance=0.70,
        ),
    ]

    after_candidates = [
        Candidate(
            source="context",
            text="Generic repository search is probably enough.",
            confidence=0.90,
            relevance=0.90,
        ),
        Candidate(
            source="search",
            text="Artifact review should use Lens contribution reports.",
            confidence=0.96,
            relevance=0.96,
        ),
        Candidate(
            source="search",
            text="Voice preservation and semantic similarity are Lens signals.",
            confidence=0.90,
            relevance=0.88,
        ),
        Candidate(
            source="model_prior",
            text="Use broad search first when routing is uncertain.",
            confidence=0.70,
            relevance=0.70,
        ),
    ]

    before_id = build_memory_set(conn, "before", before_candidates)
    after_id = build_memory_set(conn, "after", after_candidates)
    diff = diff_memory_sets(conn, before_id, after_id)

    print("\n=== Delta Memory Diff ===")
    print(f"before dominant source: {diff['before_dominant_source']}")
    print(f"after dominant source: {diff['after_dominant_source']}")
    print(f"changed dominant: {diff['changed_dominant_source']}")
    print(f"aggregate delta: {diff['aggregate_score_delta']:.3f}")
    print(f"primary cause: {diff['primary_cause_source']}")

    print("\n=== Source Influence ===")
    for source, influence in diff["source_influence"].items():
        print(f"{source}: {influence:.2f}")

    print("\n=== Health ===")
    for key, value in diff["health"].items():
        print(f"{key}: {value}")

    print("\n=== Decision ===")
    for key, value in diff["decision"].items():
        print(f"{key}: {value}")

    print("\n=== Candidate Deltas ===")
    for item in diff["candidate_deltas"]:
        print(
            f"{item['change_type']:12} "
            f"{item['source']:10} "
            f"{item['delta']:+.3f} "
            f"{item['text']}"
        )


if __name__ == "__main__":
    main()
```
Expected output:

```
=== Delta Memory Diff ===
before dominant source: context
after dominant source: search
changed dominant: True
aggregate delta: 0.428
primary cause: search

=== Source Influence ===
search: 1.00
context: 0.00
model_prior: 0.00

=== Health ===
dominance_score: 0.557
volatility_score: 0.5
drift_score: 0.428
risk_score: 0.431
status: suspicious

=== Decision ===
action: investigate
reason: dominant memory source changed
source_to_review: search

=== Candidate Deltas ===
added        search     +0.230 Artifact review should use Lens contribution reports.
unchanged    context    +0.000 Generic repository search is probably enough.
unchanged    model_prior +0.000 Use broad search first when routing is uncertain.
added        search     +0.198 Voice preservation and semantic similarity are Lens signals.
```
This small example shows the entire idea:
- A first memory state is built from context and model prior.
- A second memory state adds search evidence.
- The dominant source changes from `context` to `search`.
- The system attributes the change to `search`.
- The health evaluator marks the shift as `suspicious`.
- The decision layer recommends investigation instead of blindly accepting the change.
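The demo lands on `investigate`. The `dampen` branch of `decide` returns a `recommended_weight_adjustment` such as `{"search": -0.15}`, but the post does not say how to apply it. Below is a minimal sketch of one option. It assumes the adjustment is additive. Each weight is clamped to a floor so that no source is silenced entirely, and the weights are then renormalized to sum to 1. `apply_dampening` and `min_weight` are hypothetical.

```python
def apply_dampening(
    weights: dict[str, float],
    adjustment: dict[str, float],
    min_weight: float = 0.05,
) -> dict[str, float]:
    """Apply additive weight adjustments, clamp each to a floor, renormalize to sum to 1."""
    adjusted = {
        source: max(min_weight, weight + adjustment.get(source, 0.0))
        for source, weight in weights.items()
    }
    total = sum(adjusted.values())
    return {source: weight / total for source, weight in adjusted.items()}


SOURCE_WEIGHTS = {"context": 0.30, "search": 0.25, "database": 0.25, "model_prior": 0.20}
new_weights = apply_dampening(SOURCE_WEIGHTS, {"search": -0.15})
for source, weight in new_weights.items():
    # search drops from 0.25 to roughly 0.118 after renormalizing
    print(f"{source}: {weight:.3f}")
```

Renormalizing keeps the weights comparable across runs, but it also slightly raises every other source. Skipping that step is equally valid when absolute weights matter more than relative ones.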
Glossary
| Term | Definition |
|---|---|
| Delta Memory | A runtime memory approach that measures how an active memory state changes over time, attributes the change to sources, and decides whether the change should be trusted. |
| MemorySetDTO | A snapshot of active memory: weighted candidates, source reports, contributions, aggregate score, dominant source, and provenance. |
| DeltaMemoryDiffDTO | A measured transition between two memory sets, showing what changed, why it changed, who caused it, and whether the change is healthy. |
| Memory Candidate | A possible memory item surfaced from context, search, database records, model priors, or another source. |
| Memory Contribution | The recorded effect of a candidate on the final memory set, including source weight, original score, weighted score, and provenance. |
| Memory Source | Any configurable provider of memory candidates, such as current context, runtime search, database memory, or model prior. |
| Source Weight | The influence assigned to a memory source before candidates are combined into a memory set. |
| Dominant Source | The memory source contributing the largest share of the final weighted memory state. |
| Dominance Ratio | The proportion of the total memory score controlled by the dominant source. High dominance may indicate over-steering. |
| Memory Drift | A meaningful shift in the active memory state over time. Drift can be healthy, suspicious, or dangerous. |
| Candidate Delta | The change in a memory candidate between two memory states: added, removed, strengthened, weakened, or unchanged. |
| Source Delta | The change in influence of a memory source between two memory states. |
| Contribution Delta | The change in a specific source/candidate contribution between two memory states. |
| Attribution | The process of identifying which source or candidate most contributed to a memory change. |
| Primary Cause Source | The source with the highest normalized influence over a memory change. |
| Health Evaluation | Runtime analysis that scores memory change using drift, dominance, volatility, contradiction, confidence, and risk. |
| Risk Score | A combined score estimating whether a memory change is safe, suspicious, or dangerous. |
| Governance Decision | The recommended action after evaluating a memory change: accept, dampen, reject, or investigate. |
| Dampen | Reduce the influence of a source when it appears to be over-steering memory. |
| Search as Cognition | The idea that memory behaves less like static storage and more like an evolving search process through available experience. |
| SearchBiasDTO | A proposed object that translates memory changes into search guidance, such as preferred runtimes, suppressed runtimes, query expansions, and depth hints. |
| SearchMemoryBridge | A proposed bridge that lets memory diffs influence future search behavior without rewriting search internals. |
| DeltaMemorySearchAdapter | An adapter that exposes composed memory as a searchable runtime, allowing memory itself to appear in search results. |
| Observable Cognition | A design goal where memory changes are measurable, attributable, replayable, and governable rather than hidden inside model behavior. |
| Cargo-Cult Memory | An intentionally humble framing: copying observable properties of human memory, such as association, salience, drift, and reinforcement, without claiming human-like consciousness. |
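The glossary separates attribution (which source caused a change) from Source Delta (how each source's share of memory moved). The sketch below covers the latter. It assumes a source's influence is its share of the total weighted score, which is the same ratio the appendix demo uses for `dominance_score`. `source_shares` and `source_deltas` are hypothetical helpers. The inputs are the weighted scores produced by the demo's candidates (source weight × confidence × relevance).

```python
from collections import defaultdict


def source_shares(items: list[tuple[str, float]]) -> dict[str, float]:
    """Share of the total weighted score held by each source."""
    totals = defaultdict(float)
    for source, weighted_score in items:
        totals[source] += weighted_score
    grand_total = sum(totals.values())
    return {s: v / grand_total for s, v in totals.items()} if grand_total else {}


def source_deltas(
    before: list[tuple[str, float]], after: list[tuple[str, float]]
) -> dict[str, float]:
    """Change in each source's share; a source missing on one side counts as 0."""
    b, a = source_shares(before), source_shares(after)
    return {s: a.get(s, 0.0) - b.get(s, 0.0) for s in sorted(set(a) | set(b))}


# Weighted scores from the appendix demo's before/after memory sets.
before = [("context", 0.243), ("model_prior", 0.098)]
after = [("context", 0.243), ("search", 0.2304), ("search", 0.198), ("model_prior", 0.098)]
for source, delta in source_deltas(before, after).items():
    print(f"{source:12} {delta:+.3f}")
```

Because shares always total 1, the deltas sum to zero. Search's gain is exactly what context and model_prior lose, which attribution by absolute change alone does not show.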
References & Initial Inspiration
This post was originally inspired by two separate discussions around AI memory, retrieval, and associative runtime state. While the architecture described here evolved far beyond the original discussions, these threads helped crystallize the initial direction.
Hacker News Discussions
- Hacker News, δ-mem: Efficient Online Memory for Large Language Models. Discussion around compact online associative memory for LLMs, runtime memory state, retrieval limitations, multi-layer memory systems, and efficient long-context reasoning. Several comments explored the idea that memory may behave more like layered associative search than static replay. (Hacker News)
- δ-mem Paper (arXiv). Introduces a lightweight online associative memory mechanism that augments a frozen transformer with a compact runtime memory state. The paper proposes updating a small online memory matrix during inference and using it to influence future attention computation. (arXiv)
- δ-mem GitHub Discussion Reference. Reference implementation and community discussion surrounding the paper. (Reddit)
Key Ideas That Influenced This Post
Runtime Memory vs Static Context
The δ-mem paper argues that simply increasing context windows is inefficient and often ineffective for true long-term reasoning. Instead, it proposes maintaining a compact evolving associative memory state during inference. (arXiv)
That directly influenced the idea explored in this post:
memory is not only stored context
memory is evolving runtime state
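To make "a compact evolving associative memory state" concrete, here is a toy linear associative memory updated with the classic delta (Widrow-Hoff) rule, M ← M + η(v − Mk)kᵀ. This is a generic textbook technique chosen for illustration, not the δ-mem paper's mechanism. `DeltaRuleMemory`, the dimensions, and the learning rate are assumptions. The state is a fixed-size matrix no matter how many associations are written. Each write corrects only the current recall error, so rewriting a key updates the existing association instead of appending a new record.

```python
# Toy delta-rule associative memory: the matrix m maps key vectors to value vectors.
# Generic illustration only; not the δ-mem update.


def matvec(m: list[list[float]], x: list[float]) -> list[float]:
    return [sum(row[j] * x[j] for j in range(len(x))) for row in m]


class DeltaRuleMemory:
    def __init__(self, key_dim: int, value_dim: int, lr: float = 1.0):
        self.m = [[0.0] * key_dim for _ in range(value_dim)]  # fixed-size state
        self.lr = lr

    def read(self, key: list[float]) -> list[float]:
        return matvec(self.m, key)

    def write(self, key: list[float], value: list[float]) -> None:
        """m <- m + lr * (value - m @ key) @ key^T: move only as far as the recall error."""
        error = [v - r for v, r in zip(value, self.read(key))]
        for i, e in enumerate(error):
            for j, k in enumerate(key):
                self.m[i][j] += self.lr * e * k


mem = DeltaRuleMemory(key_dim=2, value_dim=2)
mem.write([1.0, 0.0], [0.0, 1.0])
mem.write([0.0, 1.0], [1.0, 0.0])
print(mem.read([1.0, 0.0]))  # [0.0, 1.0]
mem.write([1.0, 0.0], [1.0, 1.0])  # rewrite the first association
print(mem.read([1.0, 0.0]))  # [1.0, 1.0]
```

With unit-length, orthogonal keys and `lr=1.0`, a write followed by a read returns the stored value exactly. With overlapping keys, associations interfere with each other, which is the price of a compact state.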
Multi-Layer Memory
One of the most interesting Hacker News comments proposed that useful AI memory likely requires multiple layers:
- short-term workspace memory
- active task memory
- searchable historical memory
- compressed long-term memory
rather than a single retrieval mechanism. (Hacker News)
That idea heavily influenced the layered MemorySource architecture used in Delta Memory.
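The four layers above can be sketched as weighted memory sources that feed one composed memory set. In this minimal illustration, the layer names come from the list. `MemoryLayer`, `compose`, the weights, the placeholder candidates, and the scoring rule (layer weight × candidate score, keep the top k) are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryLayer:
    name: str
    weight: float
    # (text, score) candidates this layer currently surfaces.
    candidates: list[tuple[str, float]] = field(default_factory=list)


def compose(layers: list[MemoryLayer], top_k: int = 3) -> list[tuple[str, str, float]]:
    """Merge candidates from every layer, weight them by layer, keep the strongest."""
    merged = [
        (layer.name, text, layer.weight * score)
        for layer in layers
        for text, score in layer.candidates
    ]
    return sorted(merged, key=lambda row: row[2], reverse=True)[:top_k]


layers = [
    MemoryLayer("workspace", 0.40, [("current question", 0.9)]),
    MemoryLayer("active_task", 0.30, [("open task goal", 0.8)]),
    MemoryLayer("historical", 0.20, [("similar past session", 0.7)]),
    MemoryLayer("long_term", 0.10, [("long-standing preference", 0.6)]),
]
for name, text, weighted in compose(layers):
    print(f"{name:12} {weighted:.2f} {text}")
```

Each layer's weight plays the same role as a source weight in Delta Memory, so a shift between layers can be diffed and attributed like any other source change.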
Associative Search as Memory
Several commenters noted that human memory appears highly associative, compressed, selective, and reconstructive rather than exact replay. (Hacker News)
That became one of the central ideas of this post:
memory = reachable information under current conditions
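One way to read "reachable information under current conditions" is spreading activation over an association graph. A cue activates its neighbors, activation decays along each link, and only items that stay above a threshold are recalled. A minimal sketch follows. The graph, `decay`, `threshold`, and `reachable` are hypothetical and not part of Delta Memory.

```python
import heapq


def reachable(
    graph: dict[str, dict[str, float]],
    cue: str,
    decay: float = 0.5,
    threshold: float = 0.2,
) -> dict[str, float]:
    """Return every item whose activation from the cue stays at or above threshold.

    Activation of a neighbor = current activation * link strength * decay.
    Each item keeps its strongest activation (best-first search over the graph).
    """
    activation = {cue: 1.0}
    frontier = [(-1.0, cue)]  # max-heap via negated activation
    while frontier:
        neg_act, node = heapq.heappop(frontier)
        act = -neg_act
        if act < activation.get(node, 0.0):
            continue  # stale entry: a stronger path to this node was already found
        for neighbor, strength in graph.get(node, {}).items():
            new_act = act * strength * decay
            if new_act >= threshold and new_act > activation.get(neighbor, 0.0):
                activation[neighbor] = new_act
                heapq.heappush(frontier, (-new_act, neighbor))
    return activation


graph = {
    "politician": {"speech": 0.9, "election": 0.8},
    "speech": {"broken promise": 0.9},
    "broken promise": {"distrust": 0.9},
    "election": {"ballot": 0.5},
}
print(reachable(graph, "politician"))
```

Changing the conditions changes what is reachable. With the same graph, raising `decay` to 0.7 also brings "distrust" above the threshold, while "ballot" stays out of reach.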
Further Reading
For readers interested in exploring related ideas, these papers and projects are excellent next steps.
Associative / Runtime Memory
- δ-mem: Efficient Online Memory for Large Language Models. Compact online associative memory coupled directly to transformer attention. (arXiv)
- LoCoMo Benchmark. Long-term conversational memory benchmark used to evaluate memory-heavy reasoning systems.
- MemoryAgentBench. Benchmark for evaluating long-term memory and reasoning in autonomous agents.
Retrieval, Context, and Cognitive Search
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. The foundational RAG paper. Essential reading for understanding retrieval-based memory systems.
- MemGPT: Towards LLMs as Operating Systems. Explores hierarchical memory management and paging strategies for long-running agents.
- Generative Agents: Interactive Simulacra of Human Behavior. Introduces memory streams, reflection, and planning mechanisms for persistent AI agents.
Cognitive Architectures & Memory Systems
- ACT-R Cognitive Architecture. One of the classic cognitive architectures modeling human memory and reasoning.
- Soar Cognitive Architecture. Long-running symbolic cognitive architecture research focused on learning and memory.
- Hierarchical Temporal Memory (HTM). Biological inspiration for sequence memory and temporal prediction systems.
Runtime Adaptation & Self-Tuning Systems
- LoRA: Low-Rank Adaptation of Large Language Models. Important background for understanding low-rank runtime adaptation approaches referenced indirectly by δ-mem.
- DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines. Interesting work around structured, self-improving AI pipelines.
- Reflexion: Language Agents with Verbal Reinforcement Learning. Explores feedback-driven behavioral adaptation in autonomous agents.