How agentmemory works, and how to actually use it with your agent

After 3 minutes, you’ll know whether to install agentmemory. After 10 minutes, you’ll have it running and be able to use it immediately.

TLDR? Review this HTML interactive guide (beta), inspired by @trq212

agentmemory replaces CLAUDE.md, .cursorrules, and every other static-file memory hack with an actual local service.

3,000+ stars in ~3 days, version 0.9.9 shipped 2026-05-11, Apache-2.0, one npx command to install.

Rohit Ghumare, Principal Product Evangelist at iii.dev, authored the implementation. The design comes from his gist “LLM Wiki v2” (170+ forks, 13 comments), which extends Karpathy’s LLM Wiki pattern with the lessons from building agentmemory.

What changes for you. Session 1, you set up JWT auth with jose middleware in src/middleware/auth.ts. Session 2, you ask for rate limiting, and the agent already knows your auth stack, your test file, and why you chose jose over jsonwebtoken. No re-explaining.

Every AI coding agent ships with built-in memory: Claude Code has MEMORY.md, Cursor has notepads, Cline has a memory bank. These work like sticky notes. They cap at ~200 lines, go stale, and load everything into context every time the session opens.

agentmemory is the searchable database behind the sticky notes. The repo was created 2026-02-25, has 280+ commits across 13 contributors, and shipped 8 releases in three days (May 9 to May 11) leading up to v0.9.9.

Jump to “How to get started” down below.

agentmemory is built on iii-engine, a service composition framework (15K+ stars, TypeScript + Rust). Functions, KV state, streams, and OTEL traces are all iii primitives. The engine replaces Express.js, Postgres + pgvector, SSE/Socket.io, pm2, and Prometheus. No external database, no Docker required (though both work).

The system has three layers: Capture, Pipeline, Retrieval. Plus a consolidation cycle that compresses raw observations into longer-term memory tiers.

Capture. Twelve Claude Code lifecycle hooks fire automatically: SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, PostToolUseFailure, PreCompact, SubagentStart, SubagentStop, Notification, TaskCompleted, Stop, SessionEnd. Each hook is a standalone Node script that reads JSON from stdin, POSTs to the local REST API, and exits. Non-Claude agents capture the same way through /agentmemory/observe or the MCP memory_save tool. Zero manual add() calls. The agent works, the hook fires, the observation lands.
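
For a non-Claude agent, the same read-stdin, POST, exit flow can be sketched in a few lines. This is a minimal sketch, not the project's hook code (the real hooks are Node scripts). The port and the /agentmemory/observe path come from this article; the payload fields and the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: the REST API is on 127.0.0.1:3111 and accepts JSON on this path.
OBSERVE_URL = "http://127.0.0.1:3111/agentmemory/observe"


def build_observe_request(event: dict) -> urllib.request.Request:
    """Wrap one hook event in a POST request without sending it."""
    body = json.dumps(event).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(OBSERVE_URL, data=body, headers=headers, method="POST")


def forward_hook_event(raw_stdin: str, send=urllib.request.urlopen) -> bool:
    """Parse the JSON a hook received on stdin and POST it.

    Returns False instead of raising, so a stopped server or a malformed
    event never blocks the agent.
    """
    try:
        event = json.loads(raw_stdin)
        with send(build_observe_request(event), timeout=2):
            return True
    except (OSError, ValueError):  # network failure or invalid JSON
        return False
```

A wrapper script would call forward_hook_event(sys.stdin.read()) and exit. Swallowing errors is a deliberate choice in this sketch: losing one observation is cheaper than stalling the agent.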

Pipeline. Inside the server, every observation passes through four stages. SHA-256 dedup catches anything that repeats within five minutes. The privacy filter in src/functions/privacy.ts strips <private> blocks and redacts API keys, bearer tokens, GitHub tokens, AWS keys, Google keys, JWTs, npm tokens, GitLab tokens, and DigitalOcean tokens before storage. Raw observations land in iii-engine’s file-backed KV. Then a synthetic compression path indexes the observation in BM25 without calling any LLM. If you set AGENTMEMORY_AUTO_COMPRESS=true, a configured Anthropic, MiniMax, Gemini, or OpenRouter provider compresses observations into structured facts on every hook. Off by default since v0.8.8 (issue #138) because the per-token cost on active sessions is significant.
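
The first two stages are easy to picture in code. The sketch below assumes a simple in-memory map for dedup and two illustrative redaction patterns. The real filter in src/functions/privacy.ts covers many more token types, and its actual patterns are not shown here; every name below is hypothetical.

```python
import hashlib
import re

DEDUP_WINDOW_SECONDS = 5 * 60  # "repeats within five minutes"

# Hypothetical patterns for illustration only.
PRIVATE_BLOCK = re.compile(r"<private>.*?</private>", re.DOTALL)
BEARER_TOKEN = re.compile(r"Bearer\s+[A-Za-z0-9._~+/-]+=*")


def redact(text: str) -> str:
    """Drop <private> blocks, then mask bearer tokens."""
    text = PRIVATE_BLOCK.sub("", text)
    return BEARER_TOKEN.sub("Bearer [REDACTED]", text)


class Deduper:
    """Flags an observation whose SHA-256 digest was seen inside the window."""

    def __init__(self) -> None:
        self._last_seen: dict[str, float] = {}

    def is_duplicate(self, text: str, now: float) -> bool:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        previous = self._last_seen.get(digest)
        self._last_seen[digest] = now  # a repeat refreshes the window
        return previous is not None and now - previous < DEDUP_WINDOW_SECONDS
```

Passing the timestamp in explicitly keeps the sketch testable. Whether a repeat should refresh the window is a design choice made here, not something the article specifies.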

Retrieval. Three streams run in parallel inside src/state/hybrid-search.ts. BM25 stems with Porter, expands synonyms, runs always. Vector computes cosine similarity over all-MiniLM-L6-v2 384-dimension embeddings when an embedding provider is configured (free local option via @xenova/transformers, or hosted via Gemini, OpenAI, Voyage, Cohere, OpenRouter). Graph traverses entity relationships when entities are detected in the query. Results fuse via Reciprocal Rank Fusion with RRF_K = 60, then diversify across sessions (max 3 results per session) so one session’s noise can’t dominate the top-K.
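
Reciprocal Rank Fusion is simple enough to show whole. Each stream contributes 1 / (k + rank) for every result it returns, the contributions are summed, and the per-session cap is applied while walking the fused list. This is a sketch of the standard technique with the constants quoted above, not the code in hybrid-search.ts; the function names are hypothetical.

```python
from collections import defaultdict

RRF_K = 60           # constant quoted in the article
MAX_PER_SESSION = 3  # diversification cap quoted in the article


def rrf_fuse(rankings: list[list[str]], k: int = RRF_K) -> list[tuple[str, float]]:
    """Fuse ranked id lists: each list adds 1 / (k + rank) to an id's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def diversify(fused: list[tuple[str, float]], session_of: dict[str, str],
              top_k: int, max_per_session: int = MAX_PER_SESSION) -> list[str]:
    """Walk the fused list, skipping ids whose session already hit the cap."""
    taken: dict[str, int] = defaultdict(int)
    picked: list[str] = []
    for doc_id, _score in fused:
        session = session_of[doc_id]
        if taken[session] >= max_per_session:
            continue
        taken[session] += 1
        picked.append(doc_id)
        if len(picked) == top_k:
            break
    return picked
```

Because RRF only looks at ranks, the BM25, vector, and graph scores never need to be normalized against each other. A stream that did not run (no embedding provider, no entities in the query) is simply left out of the rankings list.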

4-tier consolidation. Memories progress through four tiers, analogous to sleep consolidation. Working holds raw observations. Episodic holds compressed session summaries. Semantic holds extracted facts and patterns. Procedural holds workflows and decision patterns. Memories decay on an Ebbinghaus-style curve. Frequently accessed memories strengthen. Stale memories auto-evict. Contradictions are detected and resolved on write.
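
An Ebbinghaus-style curve is usually written as retention = exp(-t / S), where t is time since last access and S is a stability term that grows each time the memory is used. The sketch below uses that textbook form. The starting stability, the boost factor, and the eviction threshold are made-up values, and the repo's actual formula may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    stability_days: float = 7.0   # hypothetical starting stability
    last_access_day: float = 0.0

    def retention(self, today: float) -> float:
        """Forgetting curve: exp(-elapsed / stability), between 0 and 1."""
        elapsed = max(0.0, today - self.last_access_day)
        return math.exp(-elapsed / self.stability_days)

    def touch(self, today: float, boost: float = 1.5) -> None:
        """An access strengthens the memory and resets its clock."""
        self.stability_days *= boost
        self.last_access_day = today


def evict_stale(memories: list[Memory], today: float, threshold: float = 0.1) -> list[Memory]:
    """Keep only memories whose retention is still at or above the threshold."""
    return [m for m in memories if m.retention(today) >= threshold]
```

The useful property is that a memory you keep recalling decays more slowly after every access, while one you never touch falls under the threshold and drops out.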

Context assembly. When a new session starts, mem::context assembles pinned slots, project profile, recent session summaries, and high-importance observations into an <agentmemory-context> block with a default 2,000-token budget. SessionStart context injection is OFF by default (AGENTMEMORY_INJECT_CONTEXT=false since v0.8.10, issue #143). Hooks still POST observations for background capture either way, but the README’s “agent already knows your stack” demo specifically requires the env var to be on. Worth knowing before you wonder why the agent doesn’t seem to remember anything.
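
Budgeted assembly boils down to packing items in priority order until the token budget runs out. The sketch below assumes a rough four-characters-per-token estimate, a greedy packer, and a closing </agentmemory-context> tag. All three are assumptions for illustration, not how mem::context is known to work.

```python
TOKEN_BUDGET = 2000  # default budget quoted in the article


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about four characters per token."""
    return max(1, len(text) // 4)


def assemble_context(sections: list[tuple[str, list[str]]], budget: int = TOKEN_BUDGET) -> str:
    """Pack items in priority order, skipping any that would exceed the budget."""
    lines: list[str] = []
    used = 0
    for title, items in sections:
        for item in items:
            line = f"[{title}] {item}"
            cost = estimate_tokens(line)
            if used + cost > budget:
                continue  # too big for the remaining budget, try the next item
            lines.append(line)
            used += cost
    body = "\n".join(lines)
    return f"<agentmemory-context>\n{body}\n</agentmemory-context>"
```

Ordering the sections as pinned slots, project profile, recent summaries, then observations means the budget is spent on the highest-priority material first.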

MCP and REST surface. Fifty-one memory_* tools, six MCP resources, three prompts, four skills (/recall, /remember, /session-history, /forget), and 127 REST endpoint declarations across src/triggers/api.ts and src/mcp/server.ts. Seven tools are visible by default. Set AGENTMEMORY_TOOLS=all to expose all 51. The full list includes hybrid search (memory_smart_search), file history (memory_file_history), project profile (memory_profile), graph traversal (memory_graph_query), team sharing (memory_team_share), audit trail (memory_audit), and multi-agent coordination primitives (memory_lease, memory_signal_send, memory_mesh_sync).

This section covers install. For the command reference once it is running, jump to How to Use It at the end of this article.

Total time to working install: under 10 minutes on macOS or Linux, 15 on Windows.

1. Prerequisites. Node.js >= 20.0.0. iii-engine v0.11.2 OR Docker Desktop. macOS, Linux, or Windows. On Windows, the npm package alone is not enough: download the iii-engine binary from iii-hq/iii releases (or use Docker Desktop), since there’s no PowerShell installer or winget package today.

2. Start the server.

This auto-downloads or starts iii-engine v0.11.2 (the pinned version, since v0.11.6 introduced a sandbox-per-worker model agentmemory hasn’t refactored for yet). REST binds to 127.0.0.1:3111, streams to :3112, viewer to :3113.

3. Seed and verify with the demo.

This seeds three sessions (JWT auth setup, an N+1 query fix, rate limiting) with six observations total, then runs three searches. The search “database performance optimization” returns the N+1 fix observation. Keyword-only search cannot do that. If you see results, the pipeline works end-to-end.

4. Open the viewer.

Live observation stream, session explorer, memory browser, knowledge-graph visualization, and a health dashboard. Bound to 127.0.0.1 only.

5. Wire it to your agent. One JSON block covers most hosts (Cursor, Claude Desktop, Cline, Roo Code, Windsurf, Gemini CLI, OpenClaw). Merge into the host’s existing mcpServers object. Do not replace the file.

The host-specific shapes:

Cursor: ~/.cursor/mcp.json
Claude Desktop: claude_desktop_config.json in Application Support, restart after editing
Cline / Roo Code / Kilo Code: Settings UI, MCP Servers, Edit
Windsurf: ~/.codeium/windsurf/mcp_config.json
Gemini CLI: gemini mcp add agentmemory npx -y @agentmemory/mcp --scope user
Codex CLI (TOML shape): codex mcp add agentmemory -- npx -y @agentmemory/mcp
OpenCode (different shape, top-level mcp key with command as array)

6. Claude Code: install the plugin instead. Skip step 5 and run:

The plugin registers all 12 hooks, 4 skills, and auto-wires the MCP server through .mcp.json. Verify with curl http://localhost:3111/agentmemory/health.

7. Optional: free local embeddings.

Switches the embedding provider to all-MiniLM-L6-v2 running in-process. No API key, no per-call cost, adds ~9 percentage points of recall over BM25-only (86.2% to 95.2% R@5 on LongMemEval-S).

8. Optional: turn on the headline demo. Out of the box, agentmemory captures and supports recall but does not inject context into the agent’s first turn. To enable the “agent already knows your stack” behavior, set AGENTMEMORY_INJECT_CONTEXT=true in ~/.agentmemory/.env.

Restart the server. SessionStart will now write up to 2,000 tokens of relevant project context into the first turn. This counts against your model’s token budget. The startup warning will remind you.

9. Import existing transcripts.

Default scan path is ~/.claude/projects. Default cap is 200 files / 1,000 sessions. Imported sessions show up in the viewer’s Replay tab alongside native ones.
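
To preview what an import would pick up, you can list the transcript files yourself. This sketch assumes transcripts are .jsonl files found recursively under the scan path and that the newest files are taken first. Both are guesses about the importer's behavior, and the 1,000-session cap is not modeled.

```python
from pathlib import Path

DEFAULT_SCAN_PATH = Path.home() / ".claude" / "projects"
MAX_FILES = 200  # default file cap quoted in the article


def find_transcripts(root: Path = DEFAULT_SCAN_PATH, max_files: int = MAX_FILES) -> list[Path]:
    """List up to max_files .jsonl files under root, newest first."""
    if not root.is_dir():
        return []
    candidates = sorted(root.rglob("*.jsonl"), key=lambda p: p.stat().st_mtime, reverse=True)
    return candidates[:max_files]
```

If the count hits 200, older transcripts would fall outside the default cap in this model.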

That covers install.

LongMemEval-S (ICLR 2025, 500 questions, ~48 sessions per question, ~115K tokens each): the reported numbers are retrieval recall scores, not end-to-end QA accuracy. The repo says so plainly: it does not claim these as “LongMemEval scores,” only as retrieval-only evaluations on the LongMemEval-S haystack. Scripts and the cleaned dataset are committed under benchmark/ for reproduction.

Token efficiency, repo-reported: ~1,900 tokens per session, ~170K per year, ~$10/year on per-token billing or $0/year with local embeddings. Compare against ~22K tokens at 240 observations for the “paste everything into CLAUDE.md” approach. That is a 92% reduction at the working point most heavy users hit by month two.
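
The arithmetic behind these figures is quick to check. Using the rounded inputs quoted above, the reduction lands near 91%, close to the 92% quoted, which presumably comes from unrounded figures. The helper names are made up for this example.

```python
# Figures quoted above (all approximate, repo-reported).
TOKENS_PER_SESSION = 1_900
TOKENS_PER_YEAR = 170_000
TOKENS_PASTE_EVERYTHING = 22_000  # CLAUDE.md approach at 240 observations


def reduction_percent(small: int, large: int) -> float:
    """Percentage saved by using `small` tokens instead of `large`."""
    return round(100 * (1 - small / large), 1)


def implied_sessions_per_year(per_year: int, per_session: int) -> int:
    """How many sessions the yearly total implies at the per-session rate."""
    return round(per_year / per_session)


print(reduction_percent(TOKENS_PER_SESSION, TOKENS_PASTE_EVERYTHING))  # 91.4
print(implied_sessions_per_year(TOKENS_PER_YEAR, TOKENS_PER_SESSION))  # 89
```

The second number is worth noticing: the yearly total implies roughly 89 injected sessions, so heavier users should scale the yearly cost accordingly.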

Mem0 and Letta publish benchmarks on LoCoMo, a different evaluation set. The repo’s own COMPARISON.md flags this as “apples vs oranges” and invites cross-benchmark collaboration. Treat the headline 95.2% as agentmemory’s number on agentmemory’s pipeline, not a leaderboard win against tools measured on a different test.

Plaintext HTTP token transport (issue #275, open). The plugin sends the AGENTMEMORY_SECRET bearer over plaintext HTTP. The default localhost binding contains the exposure today. Anyone exposing the REST surface beyond a single host needs a reverse proxy, TLS, and an auth review first.

The interesting features are off by default. Auto-compress, context injection, slots, reflect, and graph extraction are all gated behind env vars in src/config.ts. The README’s “agent already knows your stack” demo specifically requires AGENTMEMORY_INJECT_CONTEXT=true. Out of the box, agentmemory captures observations and supports recall, but it does not inject context into Claude Code’s first turn. This is the single biggest expectation mismatch a new user will hit.

Documentation drift. AGENTS.md ships v0.8.9 stats (44 tools, 104 endpoints, 699 tests). The source has 51 tools, 127 endpoint declarations, and 888 static test cases. The README badge claims 104 endpoints and 827 tests, while the README prose says 107 endpoints. benchmark/COMPARISON.md references npm run bench:* scripts that are not in package.json. Three doc surfaces, three different numbers, no agreement.

The engine writes state into your project root (issue #303, open). When Claude Code auto-starts agentmemory from a project directory, the iii-engine creates data/state_store.db/... inside the user’s git working tree. Expect cleanup noise and .gitignore drift until the engine moves state to a stable location.

Verdict: Production Ready for solo developers and small teams running agentmemory on localhost as a personal coding-agent memory layer.

The architecture is real. The benchmark is reproducible from committed scripts. The install is one npx command. The 12-hook capture flow runs unattended. The viewer at port 3113 makes the memory system inspectable, which is rare in this category. There is no equivalent shipping today.

Maintenance health is acceptable. Eight releases in three days (May 9 through May 11). 280+ commits since February. Active issue triage, public CHANGELOG, public ROADMAP. The 91% single-maintainer concentration is the asterisk, and the Q3 2026 roadmap names “additional maintainer onboarding” as a priority and a foundation Growth-Stage prerequisite.

What would change the verdict positively? Q4 2026 ships an external security audit, SSO, RBAC, and audit-log export. Q1 2027 freezes the REST and MCP surface for v1.0. The plaintext-HTTP fix (#275) and the engine-CWD cleanup (#303) close. At that point the answer for production deployments changes from “wait” to “deploy.” Watch for v1.0 in Q1 2027.

Benefits: Claude Code, Cursor, Codex CLI, Gemini CLI, and OpenCode users running solo or in small teams. Engineers in multi-day agent sessions on the same codebase. Teams who want a real-time viewer for what the agent is learning. Anyone who hit the 200-line CLAUDE.md ceiling and started copy-pasting.

Doesn’t benefit yet: teams deploying agentmemory’s REST surface beyond localhost without a reverse proxy plus TLS (issue #275 is open). Production deployments that require an external security audit (planned Q4 2026). Non-English coding sessions that rely on accurate BM25 retrieval (issue #295 strips non-ASCII tokens today). Windows users without Docker or the iii-engine binary on PATH.

Coding-agent users can now install a shared, hook-driven, searchable memory layer in one npx command and stop pasting their stack into every new session.

A command reference for once agentmemory is installed and running.

The four user-invocable skills (Claude Code). Installed by the plugin, invoked by typing the slash command in the agent:

/recall [query] wraps memory_smart_search. Hybrid BM25 + vector + graph search across past observations. Use when you want context from a past session (“recall how we set up JWT auth”).
/remember [content] wraps memory_save. Explicitly persists an insight, decision, or pattern with auto-extracted concepts and file references. Use when you want to lock in a decision so the next session inherits it.
/session-history wraps memory_sessions. Lists the last 20 sessions on this project with key highlights.
/forget [query or session ID] wraps memory_smart_search then memory_governance_delete. Surfaces matching observations, asks for explicit confirmation, then deletes with an audit trail.

Most of the time, just talk to the agent. The 12 hooks capture every tool call automatically. If AGENTMEMORY_INJECT_CONTEXT=true is set, SessionStart pre-loads relevant memories into the agent’s first turn. The four skills above are for explicit control, not the default workflow.

Direct MCP tools (other agents). Agents without the plugin call MCP tools directly. The seven core tools available by default:

Set AGENTMEMORY_TOOLS=all to expose the full 51-tool surface (knowledge graph queries, multi-agent leases, signals, sentinels, sketches, consolidation, snapshots, mesh sync, audit, governance, team sharing).

Direct REST calls. Every MCP tool has a REST equivalent on port 3111. Useful for scripts, IDEs, and CI:

When AGENTMEMORY_SECRET is set, protected endpoints require Authorization: Bearer <secret>.

The viewer at port 3113. Open http://localhost:3113 to watch observations land live, browse sessions, walk the knowledge graph, and scrub through the Replay tab on past sessions (including imported JSONL transcripts).

When to reach for which command:

Working on a feature, want continuity across sessions: do nothing, hooks handle it
Decision or pattern worth keeping forever: /remember
New session, no context loaded: /recall with a topic
Reviewing yesterday’s work: /session-history
Privacy or cleanup: /forget