MEMWRIGHT — A MEMORY JOURNAL FOR AGENTIC SYSTEMS · VOL. 02 · REV. 0.1 · EST. 2026 · NEW YORK · MIT

Your agents don’t forget. They never knew. Today that changes.

The memory layer for agent teams. Self‑hosted. Deterministic retrieval. No LLM in the critical path.

One tier your whole pipeline shares — planner writes, executor reads, reviewer sees both. Namespaces, RBAC, provenance, temporal correctness, ranked retrieval, token budgets — built in, not bolted on. Python library, REST API, or containerized service. No SaaS middleman. No per‑seat fees. No vendor lock‑in. It’s yours.

poetry add memwright
Architecture & deploy guide → GitHub · PyPI
MIT · Python 3.10–3.14 · Production deploy in one command
MULTI-AGENT: Namespace | ACCESS: RBAC | TRACE: Provenance | CONTEXT: Token budgets | RETRIEVAL LAYERS: 5 | PYTHON: 3.10–3.14 | STACK: SQLite · ChromaDB · NetworkX | CLOUD: PostgreSQL · ArangoDB · Cosmos · AlloyDB | DEPLOY: AWS · Azure · GCP | RETRIEVAL: RRF · PageRank · MMR | LICENSE: MIT
Six stars out of eight hundred light up — Memwright recalls exactly what this moment needs
Memwright doesn’t search. It remembers.

Eight hundred memories in the store. Six light up. Not the six it searched — the six this moment needs. Your planner’s decision from Monday. The researcher’s finding from Wednesday. The reviewer’s objection from yesterday. Ranked, deduped, fit to budget. Zero LLM calls in the critical path.

§ 00.1 — The solution, in one picture Memory accumulates · load primes

Memory accumulates. Load primes.

Four writes across Mon–Thu land as persisted memories. Friday morning a fresh Portfolio Planner wakes up to a new task, calls mem.load_context(), and Memwright ranks + dedupes + budget-fits all four back into the context window. The agent resumes with full continuity — earnings signal, risk cap, prior stance, compliance precedent.

Write · Write · Write · Write ... Read — memories accumulate across sessions, load_context primes the next task
Fig. 0.1 — Not RAG over documents. The agents’ own history, replayed into a fresh context.
§ 01 — The Problem Why agent prototypes don't survive production

Every agent starts the day with amnesia.

Single agents rediscover the same facts every run. Multi-agent pipelines are worse — the planner’s decisions never reach the executor, the researcher’s findings never reach the reviewer. So teams do the only thing they can: stuff giant prompts between agents. Burn tokens. Hope nothing important fell off the edge. That’s not an architecture. That’s a workaround.

We had a planner, a coder, a reviewer, a deployer — four agents in a pipeline. None of them knew what the others learned. We were passing giant prompts between them and burning tokens on stale information. Overheard · Engineering Lead, Fortune 100 Bank
Without Memwright
01Each agent starts blind — no knowledge of what others learned
02Giant prompts passed between agents burn context tokens
03No access control — any agent can overwrite any state
04Contradicting facts from different agents go undetected
05Session ends, everything learned is gone forever
With Memwright
01Shared memory — planner writes, coder reads, reviewer sees both
02Token-budget recall — each agent pulls only what fits
03Six RBAC roles, namespace isolation, write quotas per agent
04Contradictions auto-resolved — newer facts supersede older ones
05Persistent across sessions, pipelines, and agent restarts

Token cost per agent as memories grow

Fig. 01.1 · Lower is better

              Prompt-passing   Memwright
  Month 1     2K               2K
  Month 3     8K               2K
  Month 6     15K              2K

More agents, more sessions, more memories — retrieval gets better while context cost stays flat.

§ 02 — Built for teams, not chatbots Orchestrator · Planner · Executor · Reviewer

Not a chatbot plugin. Infrastructure for agent teams.

Every recall and write is scoped to an AgentContext — identity, role, namespace, parent trail, token budget, write quota, visibility. Contexts are immutable. Spawning a sub-agent returns a new context with inherited provenance. Planner writes. Executor reads. Reviewer sees both. Every call authorised before it touches storage.
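The spawn rule is small enough to model directly. A toy sketch of the idea, not Memwright's real class, which also carries token budget, write quota, and visibility:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentContext:
    """Immutable identity: spawning returns a NEW context, never mutates."""
    agent_id: str
    role: str
    namespace: str
    parent_trail: tuple = ()  # provenance: chain of ancestor agent IDs

    def as_agent(self, agent_id: str, role: str) -> "AgentContext":
        # Child inherits the namespace and extends the provenance trail.
        return AgentContext(
            agent_id=agent_id,
            role=role,
            namespace=self.namespace,
            parent_trail=self.parent_trail + (self.agent_id,),
        )

ctx = AgentContext("orchestrator", "ORCHESTRATOR", "project:acme")
planner = ctx.as_agent("planner", "PLANNER")
executor = planner.as_agent("executor", "EXECUTOR")

print(executor.parent_trail)   # ('orchestrator', 'planner')
print(executor.namespace)      # 'project:acme'
```

Because the dataclass is frozen, no call can silently change who an agent is mid-pipeline; identity only ever flows downward through `as_agent`.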

Memwright high-level architecture
Fig. 1 — Agents talk to Memwright through Python, REST, or MCP. Storage is pluggable.
01 / 06

Namespace isolation

Every agent, project, or tenant gets its own namespace. Planner writes, coder reads, reviewer sees both. Isolated by default, shared when you configure it.

02 / 06

Six RBAC roles

Orchestrator, Planner, Executor, Researcher, Reviewer, Monitor. From read-only observers to full admins. Control who can read, write, or supersede in each namespace.

03 / 06

Provenance tracking

Know which agent wrote which memory, when, and under which parent session. The reviewer can trace a decision back to the planner three sessions ago — not some unknown source.

04 / 06

Cross-agent contradiction resolution

Agent A learns "user works at Google." Agent B learns "user works at Meta." Memwright auto-supersedes. Full history preserved. Zero inference calls in the critical path.
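The rule is mechanical, which is why no inference call is needed: newest write per entity and category wins, older versions stay in the timeline. A minimal sketch with assumed field names (`entity`, `category`, `ts`):

```python
def resolve(memories):
    """Deterministic contradiction resolution: for each (entity, category)
    key, the newest fact stays active; older ones are marked superseded
    but kept for the timeline."""
    active_by_key = {}
    for m in sorted(memories, key=lambda m: m["ts"]):
        key = (m["entity"], m["category"])
        if key in active_by_key:
            active_by_key[key]["superseded"] = True  # keep, no longer active
        m["superseded"] = False
        active_by_key[key] = m
    return memories

facts = [
    {"entity": "user", "category": "employer", "content": "works at Google", "ts": 1},
    {"entity": "user", "category": "employer", "content": "works at Meta",   "ts": 2},
]
resolve(facts)
print([m["content"] for m in facts if not m["superseded"]])  # ['works at Meta']
print(len(facts))  # 2 — full history preserved
```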

05 / 06

Token budgets per agent

recall(query, budget=2000) — a lightweight summarizer uses 500 tokens; a deep reasoner uses 5,000. Each agent receives exactly what fits in its context window.

06 / 06

Write quotas & review flags

Prevent a runaway agent from flooding the store with noise. Rate-limit writes per namespace, flag writes for human review, add compliance tags for audit.

§ 03 — The Retrieval Pipeline Five layers · zero inference calls

Five layers. No LLM. Fully deterministic.

When an agent calls recall(query, budget), five cooperating layers find, fuse, score, and fit the most relevant memories into the requested token ceiling. Ten million memories in the store. Your context window never sees more than the budget. And because there’s no LLM in the path, the same query always returns the same ranking. You can unit‑test it.

Five-layer retrieval pipeline — tag match, graph expansion, vector search, fusion, diversity+fit
Fig. 3 — Same pipeline. Ten memories or ten million. The budget holds.
01

Tag Match

Stop-word filtered tag extraction against SQLite's tag index. Exact and partial hits. Fast. Deterministic.

SQLite · Tag index · FTS
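A sketch of the extraction step; the stop list and tokenizer here are stand-ins, not Memwright's actual ones:

```python
STOP_WORDS = {"the", "a", "an", "of", "for", "and", "what", "is", "how"}

def extract_tags(query: str) -> list[str]:
    """Lowercase, tokenize, drop stop words. The surviving tokens are
    matched against the tag index (exact and partial hits)."""
    tokens = [t.strip(".,?!").lower() for t in query.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]

print(extract_tags("What is the order service architecture?"))
# ['order', 'service', 'architecture']
```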
02

Graph Expansion

Multi-hop BFS on the entity graph. Query "Python" discovers "FastAPI," "Django," and "pip" through relationship edges.

NetworkX / AGE · BFS · depth 2
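The expansion step is plain BFS. A self-contained sketch over a dict adjacency map; the real layer runs on NetworkX or Apache AGE:

```python
from collections import deque

def expand(graph: dict, seeds: list[str], depth: int = 2) -> set[str]:
    """Multi-hop BFS over entity adjacency, capped at `depth` hops."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue  # hop cap reached, don't expand further
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, d + 1))
    return seen

g = {
    "Python": ["FastAPI", "Django", "pip"],
    "Django": ["ORM"],
}
print(sorted(expand(g, ["Python"])))
# ['Django', 'FastAPI', 'ORM', 'Python', 'pip']
```

Note the depth cap is what keeps expansion cheap and bounded: two hops from the query entities, never a full graph walk.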
03

Vector Search

Semantic similarity for everything tag and graph miss. Cloud-native embeddings when available; local sentence-transformers otherwise.

ChromaDB / pgvector · Cosine similarity
04

RRF Fusion + PageRank

Reciprocal Rank Fusion blends results from every layer; PageRank boosts memories about well-connected entities; confidence decay favors recent writes.

Fusion · RRF · k=60
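RRF itself fits in a few lines. A sketch at k=60; the PageRank boost and confidence decay Memwright layers on top are omitted here:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(id) = sum over layers of
    1 / (k + rank_in_layer). Deterministic — no model judges the merge."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, mem_id in enumerate(ranking, start=1):
            scores[mem_id] = scores.get(mem_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

tag_hits    = ["m1", "m7", "m3"]
graph_hits  = ["m7", "m9"]
vector_hits = ["m3", "m7", "m5"]
print(rrf_fuse([tag_hits, graph_hits, vector_hits]))
# ['m7', 'm3', 'm1', 'm9', 'm5'] — m7 appears in all three layers
```

The same inputs always produce the same ordering, which is what makes the pipeline unit-testable.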
05

MMR Diversity + Budget Fit

Maximal Marginal Relevance eliminates near-duplicates. Greedy selection packs the top-scoring memories into the caller's token budget. The rest never enter context.

MMR · λ = 0.7
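Greedy MMR plus budget packing, sketched with a toy similarity function standing in for embedding cosine:

```python
def sim(a, b):
    # Placeholder similarity: identical content counts as a duplicate.
    return 1.0 if a["content"] == b["content"] else 0.0

def mmr_pack(candidates, budget: int, lam: float = 0.7):
    """Greedy MMR: repeatedly pick the item maximizing
    lam * relevance - (1 - lam) * max-similarity-to-already-picked,
    then keep it only if it still fits the token budget."""
    picked, remaining, used = [], list(candidates), 0
    while remaining:
        def mmr(c):
            redundancy = max((sim(c, p) for p in picked), default=0.0)
            return lam * c["score"] - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        remaining.remove(best)
        if used + best["tokens"] <= budget:
            picked.append(best)
            used += best["tokens"]
    return picked

mems = [
    {"content": "risk cap 5%",   "score": 0.9, "tokens": 800},
    {"content": "risk cap 5%",   "score": 0.8, "tokens": 700},  # near-duplicate
    {"content": "earnings beat", "score": 0.6, "tokens": 900},
]
result = mmr_pack(mems, budget=2000)
print([m["content"] for m in result])
# ['risk cap 5%', 'earnings beat'] — duplicate penalized, budget held
```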

Storage roles

Every memory is persisted across three complementary stores. Every supported backend combo is just a different technology choice for one or more of these roles.

Storage roles — document store, vector store, graph store
ROLE · 01
Document store

Source of truth. Content, tags, entity, category, timestamps, provenance, confidence. Where add() commits and recall() hydrates final text.

ROLE · 02
Vector store

Dense embedding per memory, keyed by memory ID. Finds memories by meaning when no tag or word overlaps the query.

ROLE · 03
Graph store

Entity nodes + typed edges (uses, authored-by, supersedes). Query “Python” can surface “Django” via the graph.

Ingestion flow — what happens on add()

                      mem.add(content, tags, entity, ...)
                                    │
              ┌────────────────────┼─────────────────────┐
              ▼                     ▼                     ▼
      ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
      │ Document      │     │ Vector        │     │ Graph         │
      │ store         │     │ store         │     │ store         │
      ├───────────────┤     ├───────────────┤     ├───────────────┤
      │ insert row    │     │ embed(text)   │     │ extract       │
      │  content,     │     │  → 384-d vec  │     │  entities +   │
      │  tags, meta,  │     │ insert keyed  │     │  relations    │
      │  provenance   │     │  by memory ID │     │ upsert nodes, │
      │               │     │               │     │ add edges     │
      └───────────────┘     └───────────────┘     └───────────────┘
              │                     │                     │
              └─────────────────────┼─────────────────────┘
                                    ▼
                     Contradiction check per entity
                    (older conflicting facts → superseded,
                         kept in timeline)
                                    │
                                    ▼
                                  done

The three writes commit as one logical transaction. On SQL backends it’s a real DB transaction; on distributed backends it’s sequenced with best-effort rollback.
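The best-effort path can be sketched as sequenced writes with compensating deletes; the store and exception names here are illustrative, not Memwright's:

```python
class WriteFailed(Exception):
    pass

def sequenced_write(stores, record):
    """Apply stores in order; on failure, roll back the ones that
    already committed, in reverse order (compensating deletes)."""
    done = []
    try:
        for store in stores:
            store.insert(record)
            done.append(store)
    except WriteFailed:
        for store in reversed(done):
            store.delete(record["id"])
        raise

class MemStore:
    def __init__(self, fail=False):
        self.rows, self.fail = {}, fail
    def insert(self, record):
        if self.fail:
            raise WriteFailed()
        self.rows[record["id"]] = record
    def delete(self, rid):
        self.rows.pop(rid, None)

doc, vec, graph = MemStore(), MemStore(), MemStore(fail=True)
try:
    sequenced_write([doc, vec, graph], {"id": "m1", "content": "x"})
except WriteFailed:
    pass
print(doc.rows, vec.rows)  # {} {} — both rolled back after graph failed
```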

Recall flow — what happens on recall()

                   mem.recall(query, budget=2000)
                                │
          ┌─────────────────────┼─────────────────────┐
          │                     │                     │
          ▼                     ▼                     ▼
   ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
   │ Tag matcher   │     │ Graph expander│     │ Vector search │
   │  → doc store  │     │  → graph store│     │  → vec store  │
   │  FTS, tags,   │     │  BFS depth 2  │     │  cosine ANN   │
   │  stop-filter  │     │  from entities│     │  top-K IDs    │
   └───────┬───────┘     └───────┬───────┘     └───────┬───────┘
           │                     │                     │
           └─────────────────────┼─────────────────────┘
                                 ▼
                     Candidate memory IDs (~100s)
                                 │
                                 ▼
                    Hydrate IDs ← document store
                                 │
                                 ▼
              Fusion + Rank  (RRF k=60 · PageRank · confidence decay)
                                 │
                                 ▼
              Diversity + Fit  (MMR λ=0.7 · greedy token-budget pack)
                                 │
                                 ▼
                   Ranked memories ≤ budget tokens
                          (zero LLM calls)

Only memory IDs travel between layers until the hydrate step. A store with ten million rows still returns a tight result set inside the caller’s token ceiling.

And three things most of the industry skipped.

Isolation. Temporal correctness. Provenance. Not features we bolted on. Primitives we started from.

PILLAR · 01

Isolation

Every memory lives inside a namespace. Every agent is bound to one of six roles. Every call is authorised before it touches storage. Enforced at the row level, not in application code.

ORCHESTRATOR · spawns sub-agents
PLANNER · writes decisions
EXECUTOR · runs the work
RESEARCHER · gathers facts
REVIEWER · audits decisions
MONITOR · observability only
PILLAR · 02

Temporal

Memwright doesn’t overwrite. It supersedes. Every fact has a validity window. The timeline is replayable to any point in the past.

v1 JPM CFO is Jeremy Barnum
   valid_from: 2022-05
   ↓ superseded
v2 JPM CFO is Jane Doe
   valid_from: 2026-04-11
   ↓ superseded
v3 Appointment delayed
   valid_from: 2026-04-12

Ask recall(as_of=…) to replay the past. Nothing is deleted. Auditors can reconstruct what the desk knew and when.
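Replay is a pure function of the timeline. A sketch over the example above; the real recall(as_of=…) also runs the full retrieval pipeline:

```python
from datetime import date

def as_of(timeline, when: date):
    """Return the newest fact whose valid_from <= when. Nothing is
    ever deleted, so any past state is reconstructible."""
    valid = [f for f in timeline if f["valid_from"] <= when]
    return max(valid, key=lambda f: f["valid_from"]) if valid else None

timeline = [
    {"content": "JPM CFO is Jeremy Barnum", "valid_from": date(2022, 5, 1)},
    {"content": "JPM CFO is Jane Doe",      "valid_from": date(2026, 4, 11)},
    {"content": "Appointment delayed",      "valid_from": date(2026, 4, 12)},
]
print(as_of(timeline, date(2024, 1, 1))["content"])   # JPM CFO is Jeremy Barnum
print(as_of(timeline, date(2026, 4, 11))["content"])  # JPM CFO is Jane Doe
```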

PILLAR · 03

Provenance

Every sentence the agent writes back is traceable to its source. A cryptographic chain from raw feed ingest to grounded answer.

raw signal ingest
  source_id · sha256 · ts

memory row
  id · content · provenance

5-layer recall

agent answer [mem_42]

The citation isn’t a string the LLM chose. It’s the memory_id carried through the pipeline. Click it, land on the raw wire, verify the hash. Grounded, not plausible.

§ 04 — Deployment Matrix Your cloud · your infrastructure · Terraform included

Same API. Every backend. Your infrastructure.

One Python library. One Starlette ASGI container. Six deployment targets — laptop, dev VM, AWS, Azure, GCP, on‑prem. The three storage roles swap per column. Your agent code never learns which backend it’s talking to. Terraform templates live under agent_memory/infra/. Clone. Set your variables. terraform apply.

Run it on your laptop. No cloud, no cost.

Self-host locally in one command.

$ pip install memwright
$ memwright api --host 0.0.0.0 --port 8080

Starlette ASGI on http://localhost:8080. SQLite + ChromaDB + NetworkX provision automatically under ~/.memwright. Point every agent in your stack at the same URL — they share memory instantly. No Docker. No API keys. Air-gap it behind your firewall and walk away.

Cloud · 01

AWS

App Runner with Starlette ASGI. Auto-scaling, HTTPS, custom domains.

2 CPU · 4 GB · us-west-2
Cloud · 02

Azure

Container Apps with Cosmos DB DiskANN. Scale-to-zero. Same API, same results.

2 CPU · 4 GB · eastus
Cloud · 03

GCP

Cloud Run with AlloyDB. Scale-to-zero. Google's managed infrastructure.

2 CPU · 4 GB · us-central1
Backend · 04

PostgreSQL

pgvector + Apache AGE. Neon serverless or any Postgres 16.

Doc · Vector · Graph
Backend · 05

ArangoDB

Multi-model: graph + document + vector in one engine. Oasis or self-hosted.

Native graph traversal
Backend · 06

Local / On-Prem

SQLite + ChromaDB + NetworkX. Air-gapped deployments. Full data sovereignty.

No network egress

Same container. Pluggable stores.

One Memwright image. Six targets. The three storage roles swap per column. That’s the whole trick.

Deployment matrix — one container, six targets, stores swap per column
Fig. 4 — DocumentStore, VectorStore, GraphStore — three interfaces. Every column is one triple of implementations.
LANE · 01
Laptop
pip install
Doc: SQLite
Vec: ChromaDB
Graph: NetworkX
LANE · 02
Self-host
Docker · any cloud
Doc: Postgres 16
Vec: pgvector
Graph: Apache AGE
LANE · 03
AWS
App Runner
All 3 roles:
ArangoDB Oasis
one engine
LANE · 04
GCP
Cloud Run
All 3 roles:
AlloyDB +
pgvector + AGE
LANE · 05
Azure
Container Apps
All 3 roles:
Cosmos DB
DiskANN

Every deployment is the same Python library wrapped in the same Starlette ASGI container. DocumentStore, VectorStore, and GraphStore are three interfaces; each column above is one implementation triple.

Promotion path — laptop to cloud, no rewrite.

Prototype on a laptop. Promote to Docker Compose on a dev VM. Promote to a managed container runtime. Same API throughout. Only the storage URLs change.

Promotion path — laptop to dev VM to managed cloud, same API rail throughout
Fig. 5 — The same rail runs under every stop. The code never learns which cloud it’s on.

Three shapes. Pick by blast radius.

Same engine. Same storage. Same retrieval. Different coupling. Pick by latency budget and how many agents share the tier.

MODE A

Embedded library

AgentMemory('./store') in‑process. Sub‑millisecond. Right for a single agent prototyping on a laptop.

Latency · lowest
MODE B

Sidecar container

memwright api on localhost:8080. Any language. Go or Rust agents call HTTP without Python in the image.

Isolation · process
MODE C

Shared service

One Memwright service in front of an agent mesh. App Runner · Cloud Run · Container Apps. Managed storage behind. This is the production shape.

Scale · multi‑agent mesh

Code path is identical across all three. Only configuration changes.

— And one more thing —

For Claude Code, it’s three words.

Not a config file. Not an MCP setup guide. Three words in your terminal. Memwright interviews you, installs itself, wires the hooks, and runs a health check before you’ve put the kettle on.

install agent memory

Paste into Claude Code · global or project scope · auto‑merges MCP config

§ 05 — Or do it yourself, in three lines Python library · REST API · MCP protocol

Three lines of Python. That’s the integration.

Same call surface whether Memwright runs in‑process on your laptop or behind a Cloud Run service. Pick the interface. Ship.

from agent_memory import AgentMemory
from agent_memory.context import AgentContext, AgentRole

# Orchestrator context — root of a multi-agent pipeline
ctx = AgentContext.from_env(
    agent_id="orchestrator",
    namespace="project:acme",
    role=AgentRole.ORCHESTRATOR,
    token_budget=20000,
)

# Spawn sub-agents with inherited provenance
planner = ctx.as_agent("planner", role=AgentRole.PLANNER)
planner.add_memory(
    "Use event sourcing for the order service",
    category="technical",
    entity="order-service",
)

# Executor reads what planner wrote — same namespace, ranked recall
executor = ctx.as_agent("executor", role=AgentRole.EXECUTOR)
results = executor.recall("order service architecture", budget=2000)
# Run as a REST API (Starlette ASGI + Uvicorn)
$ memwright api --host 0.0.0.0 --port 8080

# Same container ships to AWS App Runner, GCP Cloud Run,
# or Azure Container Apps. Terraform templates in agent_memory/infra/.

# Eight endpoints, Terraform included:
#   POST /add /recall /search
#   POST /timeline /forget    GET /stats
#   GET  /memory/:id /health
#
# Envelope: {"ok": true, "data": {...}}
# Any MCP-compatible client (Claude Code, Cursor,
# Windsurf, or a custom agent speaking stdio MCP)
$ poetry add memwright

# Add to your client's MCP config:
{
  "mcpServers": {
    "memory": {
      "command": "memwright",
      "args": ["mcp"]
    }
  }
}

# Agents get eight tools:
#   memory_add     memory_recall    memory_search
#   memory_get     memory_forget    memory_timeline
#   memory_stats   memory_health
§ 06 — Design Principles Opinionated · on purpose

Five stakes in the ground.

What we won’t compromise on — no matter how loud the pressure gets.

Your data stays in your infrastructure. Self‑hosted, always.

Retrieval is deterministic. No LLM judges. No hidden inference in the path.

One recall(). Every backend. Swap without rewriting.

Agent teams are first‑class. Not a bolt‑on.

Boring where it counts. Proven, debuggable, no magic.

§ ∞ — Available today MIT · PyPI · MCP Registry

Point every agent at one URL.
They share memory instantly.

pip install memwright
GitHub → PyPI MCP Registry