Dhee decides what your coding agent should see, remember, and forget each turn, so it stays cheap, reliable, and auditable.
It runs locally under Claude Code, Codex, Cursor, Gemini CLI, Aider, Cline, Hermes, and any MCP client.
#1 on LongMemEval retrieval — R@1 94.8% · R@5 99.4% · R@10 99.8% on the full 500-question set. Reproduce it →
What is Dhee · Current State · Team Knowledge · File Interface · Quick Start · Repo-Shared Context · Benchmarks · How It Works · vs Alternatives · Integrations
## What is Dhee

Dhee decides what your coding agent should see, remember, and forget each turn, so it stays cheap, reliable, and auditable.
Every serious coding agent now hits the same bottleneck: not model intelligence, but context. Transcripts grow, tool output piles up, compaction drops decisions, and useful project knowledge gets trapped in one session.
Dhee runs locally between your agent and your workspace. It keeps the agent focused on the goal, decisions, files, tests, and evidence that matter now, while preserving the raw history for audit and reuse.
The buyer problem is simple:
- Agents waste money re-reading files, logs, and old conversation.
- Agents lose decisions after compaction, handoff, or tool-output overload.
- Teams cannot reuse what one agent learned in another agent without copying a pile of text.
Dhee handles the context layer:

- **Keeps current state.** Goal, facts, decisions, active files, tests, and next step stay visible without replaying the whole session.
- **Shrinks noisy tool output.** Large reads, searches, logs, and test runs become compact digests with pointers back to the raw evidence.
- **Reuses team knowledge safely.** Decisions, docs, handoffs, and promoted learnings move across agents with provenance instead of becoming prompt sludge.
Who Dhee is for:

- AI-native engineering teams whose agents are expensive, forgetful, repetitive, or hard to audit.
- Claude Code / Cursor / Codex / Gemini CLI / Aider / Cline users who have hit context limits, compaction loops, or runaway tool-output bills.
- Teams standardizing on `AGENTS.md`, `CLAUDE.md`, Skills, MCP tools, and subagents who need governed delivery instead of bigger prompts.
- Hermes users who already have a self-evolving agent and want those learnings to make Claude Code and Codex smarter too.
- Founders building agentic development workflows who need a local, inspectable context layer before they can trust agents with more of the work.
## Current State

Long coding sessions get expensive and less reliable when old tool output, repeated reads, failed attempts, and superseded plans keep influencing the next token. Dhee's answer is not to trim the transcript: it keeps a canonical working state and regenerates a small state card for each turn.
```shell
dhee context status
dhee context state --card
dhee context provision "fix expired-token KeyError"
dhee context checkpoint --reason "before compaction"
dhee context rollover --reason "context debt crossed threshold"
```

The state card contains only current signal:
```xml
<dhee_state v="1" epoch="3" revision="42" debt="healthy">
  <goal>Fix expired-token KeyError in login</goal>
  <facts><f src="pytest">middleware.py line 47 raises KeyError iat</f></facts>
  <decisions><d id="D-...">Use python-jose validation path</d></decisions>
  <next>Patch middleware and run the narrow auth test.</next>
  <files><file>middleware.py</file></files>
  <evidence><ptr ptr="R-...">failing pytest digest</ptr></evidence>
</dhee_state>
```

Task pivots start a new epoch: stale facts, repeated reads, old plans, and superseded decisions are tombstoned instead of carried into the next state card. The raw evidence remains local behind pointers, and state writes are guarded so CLI, MCP, Codex sync, and Claude hooks do not trample each other.
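The epoch rollover described above can be sketched in a few lines. This is an illustrative model only; the class and field names (`Fact`, `WorkingState`, `tombstoned`) are assumptions for the sketch, not Dhee's internal API:

```python
from dataclasses import dataclass, field

@dataclass
class Fact:
    text: str
    epoch: int
    tombstoned: bool = False  # tombstoned facts stay stored but leave the card

@dataclass
class WorkingState:
    epoch: int = 1
    facts: list = field(default_factory=list)

    def add_fact(self, text: str) -> None:
        self.facts.append(Fact(text, self.epoch))

    def pivot(self) -> None:
        """Task pivot: start a new epoch and tombstone earlier-epoch facts."""
        self.epoch += 1
        for f in self.facts:
            if f.epoch < self.epoch:
                f.tombstoned = True

    def state_card(self) -> list:
        """Only live facts make it into the next state card."""
        return [f.text for f in self.facts if not f.tombstoned]

state = WorkingState()
state.add_fact("middleware.py line 47 raises KeyError iat")
state.pivot()  # task pivot -> epoch 2; the epoch-1 fact is tombstoned, not deleted
state.add_fact("parser drops trailing comma")
print(state.state_card())  # ['parser drops trailing comma']
```

Note that tombstoning keeps the old fact on disk for audit; it only stops influencing the next turn.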
Quality is the gate. Dhee suppresses duplicate and stale context only when the pointer store, expansion SLO, and outcome signals keep the next step safe. If expansion rises, Dhee deepens that digest class instead of hiding more evidence.
## Team Knowledge

Hermes can evolve its own skills and memories. Claude Code has native hooks. Codex has MCP config, AGENTS.md, and a persisted session stream. Dhee turns those separate agent histories into reusable context that other agents can trust.
```
Hermes MemoryProvider
  ├─ MEMORY.md / USER.md writes
  ├─ agent-created skills
  ├─ session summaries and outcomes
  └─ self-evolution traces
            │
            ▼
Dhee Review Layer
  ├─ candidate -> review / evidence / score
  ├─ promoted  -> injected as Learned Playbooks
  └─ rejected  -> auditable, never injected
            │
            ▼
Claude Code · Codex · Hermes · any MCP client
```
What this means in practice:

- Your existing Hermes progress is not stranded inside Hermes. `dhee install` detects Hermes when present, installs Dhee as a Hermes `MemoryProvider` at `~/.hermes/plugins/memory/dhee`, and imports local Hermes memory files, session summaries, and agent-created skills into Dhee.
- Claude Code and Codex do not need to launch Hermes to benefit. They receive promoted Hermes/Dhee learnings through normal Dhee context and MCP tools.
- New Claude Code and Codex outcomes can become Dhee learning candidates too. After promotion, Hermes can read them back through the same provider.
- Candidate learnings are never auto-injected. Trusted Hermes `MEMORY.md`/`USER.md` imports may be promoted during install; Hermes `SOUL.md`, session traces, and agent-created skills stay candidates until explicitly approved or promoted by policy.
This is the product contract: with Dhee, a learning proven in one agent can become a promoted playbook for every connected agent.
- **Hermes native:** Dhee integrates as a Hermes `MemoryProvider`, the first-class Hermes memory-plugin surface. Hermes allows one active external memory provider, so V1 replaces Honcho/Mem0/etc. while `memory.provider: dhee` is active.
- **Claude Code native:** Dhee uses Claude Code hooks, MCP, and router enforcement. This is the strongest integration surface.
- **Codex native:** Codex does not expose Claude-style pre-tool hooks here. Dhee uses the closest native Codex surfaces: `~/.codex/config.toml`, global `~/.codex/AGENTS.md`, MCP server instructions, and Codex session-stream auto-sync.
- **Promotion gate:** Imported Hermes skills and session traces are candidates by default. Rejected or archived learnings remain auditable but are excluded from retrieval.
- **Continuity hygiene:** Handoffs filter fixture memories, artifact chunks, and placeholder test rows by default. Shared tool results carry provenance, salience, TTL, and evidence pointers so another agent can inherit the useful state without inheriting every live mirror.
## File Interface

Agents already understand files and shell verbs. Dhee exposes memory, handoff, artifacts, shared tasks, and learning review as one virtual context space:
```shell
dhee shell "ls /learnings"
dhee shell "cat /handoff/latest.md"
dhee shell "grep parser /learnings/promoted"
dhee shell "cat /router/ptr/R-abc123"
```

The first version is a virtual shell, not FUSE. It intentionally supports a small approved command set: `ls`, `cat`, `grep`, `why`, `promote`, `reject`, `broadcast`, `provision`, and `snapshot`. The same surface is available through MCP as `dhee_shell(command)` and through Python:

```python
from dhee import ContextWorkspace

result = ContextWorkspace(repo=".").execute("provision 'fix parser bug'")
print(result.stdout)
```

External systems such as Slack, Gmail, and Notion are future context sources under `/sources`, not generic remote action backends. They can sync and search evidence into Dhee artifacts, learnings, and handoffs without making the core install depend on SaaS SDKs.
| Path | Contents |
|---|---|
| `/learnings` | candidates, promoted, rejected, archived |
| `/state` | current compiled state, state card, decisions, epoch history |
| `/context` | debt, status, checkpoints, rollover evidence |
| `/handoff` | latest repo/session continuity |
| `/router/ptr` | raw pointer lookup when explicitly requested |
| `/artifacts` | host-parsed files and chunks |
| `/repo` | `.dhee/context` decisions and conventions |
| `/agents` | Hermes, Claude Code, Codex views |
| `/shared` | inbox, broadcasts, shared task results |
| `/sources` | optional future Slack/Gmail/Notion context mounts |
## Quick Start

One command. No venv. No config. No pasting into `settings.json`.

```shell
curl -fsSL https://raw.githubusercontent.com/Sankhya-AI/Dhee/main/install.sh | sh
```

The installer creates `~/.dhee/`, installs the `dhee` package, and auto-wires Claude Code, Codex, and Hermes when detected. Open your agent in any project — cognition is on.
Other install paths:

```shell
# Via pip
pip install dhee
dhee install    # configure supported agent harnesses

# From source
git clone https://github.com/Sankhya-AI/Dhee.git
cd Dhee && ./scripts/bootstrap_dev_env.sh
source .venv-dhee/bin/activate
dhee install
```

After install, Dhee auto-ingests project docs (`CLAUDE.md`, `AGENTS.md`, `SKILL.md`, etc.) on the first session. Run `dhee ingest` any time to re-chunk.
```shell
dhee install                            # configure local agent harnesses
dhee hermes status                      # see whether Hermes is detected and Dhee-backed
dhee hermes sync --dry-run              # preview Hermes memories/skills before import
dhee learn search --include-candidates  # inspect candidates and promotions
dhee link /path/to/repo                 # share context with teammates through this repo
dhee context refresh                    # refresh repo context after pull/checkout
dhee handoff                            # compact continuity for current repo/session
dhee key set openai                     # store a provider key locally (encrypted)
dhee router report                      # token-savings stats + replay projection
dhee router tune                        # re-tune retrieval policy from usage
```

## Repo-Shared Context

Most "team memory" tools need a server. Dhee uses the one your team already trusts: git.
```shell
dhee link /path/to/repo
```

Dhee creates a tracked folder inside your repo:

```
<repo>/.dhee/
  config.json
  context/manifest.json
  context/entries.jsonl
```
Commit it. Teammates who pull the repo and have Dhee installed get the same shared context — decisions, conventions, what-not-to-do — surfaced into their agent automatically.
Shared context is append-only and git-friendly. If two developers edit overlapping context concurrently, Dhee keeps both versions and reports a conflict instead of silently dropping one developer's work. The installed pre-push hook blocks unresolved conflicts from leaving the laptop:
```shell
dhee context check --repo /path/to/repo
```

No hosted service. No org account. Your repo is the team brain.
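The conflict rule is simple enough to sketch. Assuming `context/entries.jsonl` holds one JSON object per line, and assuming hypothetical `id` and `body` field names (not Dhee's actual schema), conflict detection reduces to grouping by id and reporting any id with more than one distinct body:

```python
import json
from collections import defaultdict

def find_conflicts(lines):
    """Group append-only JSONL entries by id; report ids with divergent bodies."""
    versions = defaultdict(list)
    for line in lines:
        entry = json.loads(line)
        versions[entry["id"]].append(entry["body"])
    # A conflict is two distinct bodies under one id; both versions are
    # kept and reported, never silently dropped.
    return {k: v for k, v in versions.items() if len(set(v)) > 1}

log = [
    '{"id": "D-1", "body": "use FastAPI"}',
    '{"id": "D-1", "body": "use Flask"}',   # concurrent edit by a teammate
    '{"id": "D-2", "body": "pin Python 3.12"}',
]
print(find_conflicts(log))  # {'D-1': ['use FastAPI', 'use Flask']}
```

Because the log is append-only, a git merge of two laptops' edits is just a union of lines; the check above then decides whether a human needs to resolve anything before push.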
## Benchmarks

#1 on LongMemEval recall. R@1 94.8%, R@5 99.4%, R@10 99.8% — full 500 questions, no held-out split, no cherry-picking.
| System | R@1 | R@3 | R@5 | R@10 |
|---|---|---|---|---|
| Dhee | 94.8% | 99.0% | 99.4% | 99.8% |
| MemPalace (raw) | — | — | 96.6% | — |
| MemPalace (hybrid v4, held-out 450q) | — | — | 98.4% | — |
| agentmemory | — | — | 95.2% | 98.6% |
Stack: NVIDIA llama-nemotron-embed-vl-1b-v2 embedder + llama-3.2-nv-rerankqa-1b-v2 reranker, top-k 10.
Proof is in-tree, not screenshots. Exact command, metrics, and per-question output live under benchmarks/longmemeval/. Recompute R@k yourself — any mismatch is a bug you can open.
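Recomputing R@k from per-question output is a short script. A minimal sketch, assuming each row records a gold session id and a ranked list of retrieved ids (these field names are hypothetical; check the JSONL under `benchmarks/longmemeval/` for the real schema):

```python
def recall_at_k(rows, k):
    """Fraction of questions whose gold id appears in the top-k retrieved ids."""
    hits = sum(1 for r in rows if r["gold_id"] in r["retrieved_ids"][:k])
    return hits / len(rows)

# Two toy rows standing in for the 500-question JSONL:
rows = [
    {"gold_id": "s1", "retrieved_ids": ["s1", "s9"]},  # hit at rank 1
    {"gold_id": "s2", "retrieved_ids": ["s7", "s2"]},  # hit at rank 2
]
print(recall_at_k(rows, 1))  # 0.5
print(recall_at_k(rows, 5))  # 1.0
```

In practice you would load the real rows with `json.loads` per line and print R@1, R@5, and R@10, then compare against `metrics.json`.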
## How It Works

```
        ┌─────────────────────────────┐
        │      Your fat context       │
        │  CLAUDE.md · AGENTS.md ·    │
        │  SKILL.md · prompts · docs  │
        │  · sessions · tool output   │
        └──────────────┬──────────────┘
                       │ ingest once
                       ▼
┌────────────────────────────────────────────────────┐
│             Dhee · local SQLite brain              │
│                                                    │
│ doc chunks · short-term · long-term · insights ·   │
│ beliefs · policies · intentions · episodes · edits │
└─────────────────────────┬──────────────────────────┘
                          │
           ┌──────────────┴───────────────┐
           ▼                              ▼
     Session start                Each user prompt
    (full assembly)             (matching slice only)
           │                              │
           └──────────────┬───────────────┘
                          ▼
           ┌───────────────────────────────┐
           │      Token-budgeted XML       │
           │      <dhee v="1">             │
           │        <doc src="CLAUDE.md"…/>│
           │        <i>What worked last…</i>│
           │      </dhee>                  │
           └───────────────────────────────┘
                          │
             Model sees only what it
             needs, when it needs it.
```
On the tool-use side, the router digests raw output at source — never letting raw Read, Bash, or subagent results into context unless the model asks.
Every interface — hooks, MCP, Python, CLI — exposes the same four operations.
```python
from dhee import Dhee

d = Dhee()
d.remember("User prefers FastAPI over Flask")
d.recall("what framework does this project use?")
d.context("fixing the auth bug")
d.checkpoint("Fixed auth bug", what_worked="git blame first", outcome_score=1.0)
```

| Operation | LLM calls | Cost |
|---|---|---|
| `remember` / `recall` / `context` | 0 | ~$0.0002 |
| `checkpoint` | 1 per ~10 memories | ~$0.001 |
| Typical 20-turn Opus session | ~1 | ~$0.004 |
Dhee overhead: $0.004/session. Token savings on the same 20-turn session: **$0.50+**. >100× ROI.
Four MCP tools replace Read / Bash / Agent on heavy calls:

- `dhee_read(file_path, offset?, limit?, query?, task_intent?)` — symbols, focus slices, head/tail, kind, token estimate + pointer. When no query is passed, Dhee infers one from compiled state.
- `dhee_bash(command, preview_only?)` — preflight risk, output class, stderr/stdout landmarks, and command-specific reducers for git diffs, pytest/build failures, grep, listings, and generic logs.
- `dhee_agent(text)` — file refs, headings, bullets, error signals from any subagent return.
- `dhee_expand_result(ptr, range?, symbol?, reason?, expected?)` — only called when the digest genuinely isn't enough; expansion reasons feed router tuning.
A 10 MB `git log --oneline -50000` becomes a ~200-token digest. This is where the serious savings live.
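A minimal sketch of the idea behind that digest, assuming a simple head/tail reducer with an omission marker (the function name, thresholds, and marker text here are illustrative; Dhee's actual reducers are command-specific):

```python
def digest(output: str, head: int = 5, tail: int = 5, limit: int = 20) -> str:
    """Keep small outputs whole; reduce big ones to head + tail + a pointer marker."""
    lines = output.splitlines()
    if len(lines) <= limit:
        return output  # small output passes through untouched
    omitted = len(lines) - head - tail
    return "\n".join(
        lines[:head]
        + [f"... [{omitted} lines omitted; expand via pointer] ..."]
        + lines[-tail:]
    )

# A fake 50,000-line git log collapses to 11 lines:
fake_log = "\n".join(f"commit {i:07d}" for i in range(50_000))
d = digest(fake_log)
print(len(fake_log.splitlines()), "->", len(d.splitlines()))  # 50000 -> 11
```

The marker plays the role of Dhee's evidence pointer: the raw output stays local, and the model can ask for a slice of it only when the digest is not enough.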
Most memory layers are static: you write rules, they retrieve. Dhee watches what happens and tunes itself.

- **Intent classification.** Every `Read`/`Bash`/`Agent` call is bucketed (source, test, config, doc, data, build). Reads also inherit the live compiled-state task intent, so a debug session gets failure landmarks without the agent remembering to pass a query.
- **Stable duplicate suppression.** Admission hashes the underlying evidence, not the fresh pointer string, so unchanged repeated reads stop adding debt.
- **Expansion ledger.** Every `dhee_expand_result(ptr)` is logged with `(tool, intent, depth, slice mode, reason, expected signal)`.
- **Policy tuning.** `dhee router tune` reads the ledger and atomically rewrites `~/.dhee/router_policy.json` — deeper for what gets expanded, shallower for what doesn't.
Frontend-heavy teams get deeper JS/TS digests. Data teams get richer CSV/JSONL summaries. You don't pick — Dhee picks, based on what you actually expand.
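A hedged sketch of that tuning loop: count expansions per digest class from the ledger, deepen the classes that keep getting expanded, and write the policy atomically via write-then-rename. The field names, the threshold, and the policy shape are assumptions, not the real `router_policy.json` schema:

```python
import json
import os
import tempfile
from collections import Counter

def tune(ledger, policy, threshold=3):
    """Deepen digest classes that were expanded at least `threshold` times."""
    expansions = Counter(e["digest_class"] for e in ledger)
    for cls, count in expansions.items():
        if count >= threshold:
            policy.setdefault(cls, {})["depth"] = "deep"
    return policy

def atomic_write(path, policy):
    # Write to a temp file then rename, so a crash never leaves
    # a half-written policy file behind.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(policy, f, indent=2)
    os.replace(tmp, path)

# pytest failures keep getting expanded, so they earn deeper digests:
ledger = [{"digest_class": "pytest_failure"}] * 4 + [{"digest_class": "git_diff"}]
print(tune(ledger, {}))  # {'pytest_failure': {'depth': 'deep'}}
```

The same loop can run in reverse (shallower digests for classes that are never expanded); the sketch keeps only the deepening half.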
## vs Alternatives

| | Dhee | CLAUDE.md | Mem0 | Letta | MemPalace | agentmemory |
|---|---|---|---|---|---|---|
| Tokens / turn | ~300 | 2,000+ | varies | ~1K+ | varies | ~1,900 |
| LongMemEval R@5 | 99.4% | — | — | — | 96.6% | 95.2% |
| Adapts from expansions | Yes | No | No | No | No | No |
| Hermes → Claude/Codex learning exchange | Yes | No | No | No | No | No |
| Auto-digest tool output | Yes | No | No | No | No | No |
| Git-shared team context | Yes | Manual | No | No | No | No |
| Works across MCP agents | Yes | No | Partial | No | Yes | Yes |
| External DB required | No (SQLite) | No | Qdrant/pgvector | Postgres+vector | No | No |
| License | MIT | — | Apache-2 | Apache-2 | MIT | MIT |
Dhee is not trying to be the agent, the IDE, or the memory SaaS. It is the context manager those systems need underneath them: smaller prompts, reproducible recall, adaptive retrieval, git-shared team context, and auditable knowledge reuse in one local-first package.
## Integrations

### Hermes

```shell
dhee install                # detects Hermes and enables Dhee when present
dhee hermes status
dhee hermes sync --dry-run
```

Dhee installs as the Hermes memory provider, mirrors Hermes memory writes, imports local Hermes memory files, and checkpoints Hermes sessions into Dhee learning candidates. Curated `MEMORY.md` / `USER.md` imports can be promoted on install; `SOUL.md`, session traces, and agent-created skills stay gated. Promoted playbooks flow back into Hermes through the provider and out to Claude Code/Codex through Dhee context.
### Claude Code

```shell
pip install dhee && dhee install
```

Six lifecycle hooks fire at the right moments. Claude Code gets Dhee handoff, shared tasks, inbox broadcasts, learned playbooks, and router enforcement for heavy Read/Bash/Grep calls.
### Codex

```shell
pip install dhee && dhee install --harness codex
dhee harness status --harness codex
```

Dhee writes `~/.codex/config.toml`, manages a global `~/.codex/AGENTS.md` block, advertises context-first MCP instructions, and tails Codex session logs on Dhee calls. Codex does not currently expose Claude-style pre-tool hooks, so this is the strongest truthful native integration available.
### Any MCP client

```json
{
  "mcpServers": {
    "dhee": { "command": "dhee-mcp" }
  }
}
```

```shell
dhee remember "User prefers Python"
dhee recall "programming language"
dhee ingest CLAUDE.md AGENTS.md
dhee checkpoint "Fixed auth" --what-worked "checked logs"
```

```shell
pip install dhee[openai,mcp]   # cheapest embeddings
pip install dhee[nvidia,mcp]   # current SOTA stack
pip install dhee[gemini,mcp]
pip install dhee[ollama,mcp]   # local, no API costs
```

| | Public Dhee (this repo, MIT) | Dhee Enterprise (private) |
|---|---|---|
| Local memory + router | ✅ | ✅ |
| Self-tuning retrieval | ✅ | ✅ |
| Hermes → Claude Code/Codex learning exchange | ✅ | ✅ |
| Git-shared repo context | ✅ | ✅ |
| Claude Code / Codex / MCP | ✅ | ✅ |
| Org / team management | — | ✅ |
| Repo Brain code-intelligence | — | ✅ |
| Owner dashboard, billing, licensing | — | ✅ |
| Sentry-derived security telemetry | — | ✅ |
Public Dhee is the local collaboration layer — lightweight, trustworthy, and complete on its own. The commercial layer is closed-source and lives in Sankhya-AI/dhee-enterprise.
## FAQ

**What problem does Dhee solve?**
Large agent projects accumulate a fat `CLAUDE.md`, `AGENTS.md`, skills library, and tool output that get re-injected every turn. Dhee chunks, indexes, and decays that knowledge, and digests fat tool output at the source — so only the relevant ~300 tokens reach the model.

**How is Dhee different from Mem0, Letta, MemPalace, agentmemory?**
Dhee is built around four pieces most tools treat separately: reproducible LongMemEval results, a self-tuning retrieval/router policy, source-side digests for heavy Read/Bash/subagent output, and git-shared team context instead of a server.

**Does Dhee work with Claude Code, Cursor, Codex, Gemini CLI, Aider?**
Yes. Native Claude Code hooks, closest-native Codex config/AGENTS/session-stream sync, a Hermes MemoryProvider, an MCP server for every other host, plus a Python SDK and CLI. One install, every agent.

**Does Hermes make Claude Code and Codex smarter?**
Yes, through Dhee's learning exchange after promotion. Dhee can install as Hermes' memory provider, import Hermes memory/session/skill artifacts, and expose promoted learnings to Claude Code, Codex, and any MCP client as Learned Playbooks. Claude/Codex do not have to run Hermes to benefit.

**Does Claude Code or Codex evolve Hermes back?**
Yes, after promotion. Claude Code hooks, Codex session-stream sync, MCP memory tools, and learning submissions create Dhee learning candidates. Promoted personal/repo/workspace playbooks are retrieved by Hermes through the Dhee provider.

**How does the team-context sharing actually work?**
`dhee link /path/to/repo` writes a `.dhee/` directory inside your repo. Commit it. Teammates pull, install Dhee, and their agent surfaces the same shared decisions and conventions. Append-only with conflict detection — no overwrites, no server, no account.

**Is Dhee production-ready? What storage?**
SQLite by default. No Postgres, no Qdrant, no pgvector, no infra. The regression suite and reproducible benchmarks live in-tree. MIT, works offline with Ollama or online with OpenAI / NVIDIA NIM / Gemini.

**Where are the benchmarks and can I reproduce them?**
`benchmarks/longmemeval/` — full command, per-question JSONL, `metrics.json`. Clone, run, recompute R@k. Any mismatch is an issue you can open.
## Development

```shell
git clone https://github.com/Sankhya-AI/Dhee.git
cd Dhee && ./scripts/bootstrap_dev_env.sh
source .venv-dhee/bin/activate
pytest
```

For the same full-suite path CI expects, including the local Rust acceleration extension and async test plugin:

```shell
./scripts/verify_full_suite.sh
```
Your fat skills stay fat. Your token bill stays thin. Promoted learnings travel with every agent.
GitHub · PyPI · Issues · Sankhya AI
MIT License — built by Sankhya AI Labs.
Topics: ai-agents · agent-memory · llm-memory · developer-brain · claude-code · claude-code-hooks · claudemd · agentsmd · mcp · mcp-server · model-context-protocol · context-router · context-engineering · context-compression · token-optimization · llm-tools · vector-memory · sqlite · longmemeval · retrieval-augmented-generation · rag · mem0-alternative · letta-alternative · mempalace-alternative · cursor · codex · gemini-cli · aider · cline · goose

