Research & Learning · feature summary

How `reflect` captures + recalls across Claude Code and Codex CLI

TL;DR: Two harnesses (Claude Code, Codex CLI) wire the same three hook scripts into different config files (~/.claude/settings.json vs ~/.codex/hooks.json) and share one on-disk knowledge base (~/.reflect/ queue + ~/.learnings/ documents + GraphRAG index). SessionStart fires recall (inject top-3 prior learnings) and the bg-drainer (process any queued transcripts). PreCompact fires precompact_reflect (enqueue the current transcript before the harness throws it away). Because the queue is harness-agnostic, a Codex session can enqueue a reflection that a later Claude session drains, and vice versa.

Architecture at a glance

Two harnesses, three hook scripts, one shared knowledge base. Numbered circles on the diagram match the steps in Recall loop and Capture loop below.

[Architecture diagram not reproduced. Legend: harness / external process · hook script (process) · append-only queue · knowledge base storage · trigger / read flow · recall (read into context) · capture (write to disk)]

The recall loop · SessionStart → context

Fires on every session start in both harnesses. Always exits 0 — never blocks startup even when the knowledge base is empty or the reflect CLI is missing.

1 · Hook fires SessionStart

Both Claude and Codex serialize the same JSON envelope onto the hook's stdin — {session_id, transcript_path, cwd, hook_event_name, source, ...}. The recall script is the same in both cases; the only thing that differs is which config file pointed the harness at it.

# the hook script reads the JSON envelope from stdin
import json
import sys

payload = json.load(sys.stdin)   # avoid shadowing the built-in input()
cwd = payload.get('cwd')         # project being worked on
source = payload.get('source')   # startup | resume | clear

2 · Query context from cwd · branch · git log session_start_recall.py:174

The script doesn't ask the model "what do you need?" — it builds a context query itself from cheap signals: current working directory, current git branch, recent commit messages on the branch. That string becomes the GraphRAG query.
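
A minimal sketch of how such a query string could be assembled from those three signals. The function names, the separator format, and the five-commit window are assumptions for illustration; they are not taken from session_start_recall.py.

```python
import subprocess
from pathlib import Path


def git_lines(cwd: str, *args: str) -> list:
    """Run a git command in cwd; return non-empty stdout lines, or [] on any failure."""
    try:
        result = subprocess.run(
            ["git", "-C", cwd, *args],
            capture_output=True, text=True, timeout=2, check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return []  # not a repo, git missing, or too slow: fall back to cwd only
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def build_context_query(cwd: str, branch: str, commits: list) -> str:
    """Join the cheap signals into one free-text query (hypothetical format)."""
    parts = [f"project: {Path(cwd).name}"]
    if branch:
        parts.append(f"branch: {branch}")
    if commits:
        parts.append("recent commits: " + "; ".join(commits))
    return " | ".join(parts)


def query_for(cwd: str, max_commits: int = 5) -> str:
    """Collect branch and recent commit subjects, then build the query."""
    branch = git_lines(cwd, "rev-parse", "--abbrev-ref", "HEAD")
    commits = git_lines(cwd, "log", f"-{max_commits}", "--pretty=%s")
    return build_context_query(cwd, branch[0] if branch else "", commits)
```

Keeping the git calls behind a helper that swallows failures means a non-repo directory still yields a usable (if thin) query instead of an error.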

3 · Hybrid vector + graph search; rerank to top-3 GraphRAG (nano-graphrag) + hnswlib

Two signals are blended: dense vector similarity over learning content, and graph proximity via entity sidecars (the .entities.yaml files written alongside each learning). Reranking weights recency, confidence tag, and tag overlap with the query. Output is capped at the top three.
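
The blend-then-rerank idea can be sketched as follows. This is only an illustration under stated assumptions: the `Candidate` fields, the 50/50 blend, the confidence tag values, the 30-day half-life, and every weight are hypothetical and are not the values the recall script uses.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Candidate:
    learning_id: str
    vector_score: float          # dense similarity, assumed to lie in [0, 1]
    graph_score: float           # entity-graph proximity, assumed to lie in [0, 1]
    created_at: datetime
    confidence: str = "medium"   # hypothetical tag values: high | medium | low
    tags: set = field(default_factory=set)


# Illustrative weights only.
CONFIDENCE_WEIGHT = {"high": 1.0, "medium": 0.6, "low": 0.3}


def rerank(candidates, query_tags, now, top_k=3, half_life_days=30.0):
    """Blend the two retrieval signals, then weight by recency, confidence, tag overlap."""
    def score(c):
        retrieval = 0.5 * c.vector_score + 0.5 * c.graph_score
        age_days = max((now - c.created_at).total_seconds() / 86400, 0.0)
        recency = 0.5 ** (age_days / half_life_days)        # exponential decay
        overlap = len(c.tags & query_tags) / len(query_tags) if query_tags else 0.0
        boost = (0.5 + 0.2 * recency
                 + 0.2 * CONFIDENCE_WEIGHT.get(c.confidence, 0.3)
                 + 0.1 * overlap)
        return retrieval * boost

    # Highest score first, capped at top_k (three by default).
    return sorted(candidates, key=score, reverse=True)[:top_k]
```

Multiplying the retrieval score by a bounded boost keeps relevance as the dominant term: a fresh, high-confidence learning that barely matches the query cannot outrank a strong match.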

4 · Inject as additionalContext session_start_recall.py:158

The script emits a single JSON object on stdout containing hookSpecificOutput.additionalContext. The harness reads that envelope and silently prepends the learnings to the session's developer-message context. The user never sees the JSON; the model sees the learnings as if they were instructions.

{
  "hookSpecificOutput": {
    "hookEventName": "SessionStart",
    "additionalContext": "## Prior learnings relevant here\n- [lrn-...] Skill X must use ..."
  }
}
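
A small sketch of producing that envelope. The key names come from the example above; the helper name `build_recall_envelope` and the bullet formatting are assumptions for illustration.

```python
import json


def build_recall_envelope(learnings: list) -> dict:
    """Wrap recalled learnings in the SessionStart hook output shape shown above."""
    if learnings:
        bullets = "\n".join(f"- {text}" for text in learnings[:3])  # cap at three
        body = "## Prior learnings relevant here\n" + bullets
    else:
        body = ""  # an empty knowledge base still yields a valid envelope
    return {
        "hookSpecificOutput": {
            "hookEventName": "SessionStart",
            "additionalContext": body,
        }
    }


if __name__ == "__main__":
    # The hook prints exactly one JSON object on stdout.
    print(json.dumps(build_recall_envelope(["[lrn-...] Skill X must use ..."])))
```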

The capture loop · PreCompact → queue → drain

Reflection itself is not done synchronously during PreCompact — the compaction can't wait for a 30-second LLM run. Instead, PreCompact just enqueues the transcript path. The actual /reflect runs asynchronously on the next session start (in any harness) and never blocks the user.

1 · PreCompact fires before compaction PreCompact

When the harness is about to compress history (because context is filling up), it serializes {session_id, transcript_path, trigger, ...} to the precompact hook. trigger is either auto (harness decided) or manual (user ran /compact).

2 · Enqueue transcript to ~/.reflect/pending_reflections.jsonl precompact_reflect.py:137

The script appends a single line to ~/.reflect/pending_reflections.jsonl and returns immediately, with no LLM call. The queue file is shared across all harnesses; nothing about it is Claude-specific.

{"transcript_path": "...", "session_id": "...", "trigger": "auto", "queued_at": "..."}
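
The enqueue step amounts to a single append. A minimal sketch, assuming the four fields shown above are the whole entry; the function name and the ISO-8601 timestamp format are illustrative rather than taken from precompact_reflect.py.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def enqueue_reflection(queue_path: Path, payload: dict) -> dict:
    """Append one JSON line describing the transcript; no LLM work happens here."""
    entry = {
        "transcript_path": payload.get("transcript_path", ""),
        "session_id": payload.get("session_id", ""),
        "trigger": payload.get("trigger", "auto"),
        "queued_at": datetime.now(timezone.utc).isoformat(),  # assumed format
    }
    queue_path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode leaves existing entries untouched: one entry per line.
    with queue_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry
```

Because the entry carries only a path and metadata, the hook returns in milliseconds regardless of transcript size.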

3 · Next SessionStart (any harness) fires the drainer SessionStart

The next session that starts on this machine — Claude or Codex — fires reflect-drain-bg.sh as a detached background process ((nohup ... &) >/dev/null 2>&1) with a 5-second start budget. It's PID-locked so two concurrent drainers can't trample each other, and daily-capped via cost events so a runaway loop can't spend unlimited tokens.
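
To illustrate the single-drainer guarantee, here is a POSIX-only Python sketch. It swaps in an advisory `fcntl.flock` lock that also records the holder's PID. The real drainer is a shell script and may implement its PID lock differently, so treat the mechanism and the name `try_lock` as assumptions.

```python
import fcntl
import os


def try_lock(lock_path: str):
    """Return an open handle if this process now holds the lock, else None.

    The lock is released when the handle is closed or the process exits,
    so a crashed drainer cannot leave a stale lock behind.
    """
    fh = open(lock_path, "a+")
    try:
        # Non-blocking exclusive lock: fails at once if another drainer holds it.
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return None
    # Record our PID so a human inspecting the file can see who holds it.
    fh.seek(0)
    fh.truncate()
    fh.write(str(os.getpid()))
    fh.flush()
    return fh
```

A caller that gets `None` back would log and exit 0, which matches the "never block the session" behavior described above.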

4 · Drain shells out to a headless claude -p run reflect-drain-bg.sh:210

For each queue entry, the drainer spawns claude -p "/reflect <transcript>" with --output-format json, --max-turns 25, and --permission-mode bypassPermissions. The /reflect skill scans the transcript, classifies corrections vs noteworthy patterns, and writes the resulting learning documents to ~/.learnings/documents/.

This subprocess is always claude, even when the queue entry was written by a Codex session. Codex is the trigger, Claude is the worker. (Configurable via REFLECT_DRAIN_CLAUDE_BIN in environments where claude isn't on PATH.)
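
The invocation described above can be sketched as an argv builder. The flags and the REFLECT_DRAIN_CLAUDE_BIN override are the ones named in this section; the function names and the PATH check are illustrative, and the real logic lives in a shell script.

```python
import os
import shutil
from typing import Mapping, Optional


def build_drain_command(transcript_path: str,
                        env: Optional[Mapping[str, str]] = None) -> list:
    """Assemble the headless worker invocation for one queue entry."""
    env = os.environ if env is None else env
    claude_bin = env.get("REFLECT_DRAIN_CLAUDE_BIN") or "claude"
    return [
        claude_bin,
        "-p", f"/reflect {transcript_path}",
        "--output-format", "json",
        "--max-turns", "25",
        "--permission-mode", "bypassPermissions",
    ]


def worker_available(cmd: list) -> bool:
    """True if the worker binary resolves on PATH (or is an executable path)."""
    return shutil.which(cmd[0]) is not None
```

Passing the command as a list rather than a shell string avoids quoting problems when a transcript path contains spaces.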

5 · Each successful drain triggers reflect reindex reflect-drain-bg.sh:end-of-main

If at least one entry processed cleanly, the drainer runs reflect reindex (with a 5-minute timeout) so the GraphRAG index picks up the new .md + .entities.yaml files. Without this, learnings are still on disk — they just won't appear in future /recall results until a manual reindex.
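
A sketch of that conditional reindex, assuming the five-minute budget is enforced as a subprocess timeout. The function name and the choice to swallow failures are illustrative.

```python
import subprocess


def reindex_if_needed(processed_ok: int, timeout_s: int = 300,
                      cmd=("reflect", "reindex")) -> bool:
    """Run the reindex command only when at least one entry drained cleanly."""
    if processed_ok < 1:
        return False  # nothing new on disk, so skip the expensive step
    try:
        subprocess.run(list(cmd), timeout=timeout_s, check=True,
                       capture_output=True)
    except (OSError, subprocess.SubprocessError):
        # Missing CLI, non-zero exit, or timeout: learnings remain on disk
        # and can be picked up by a later manual reindex.
        return False
    return True
```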

Successful entries are removed from the queue; transient failures stay (with a retry counter); permanently broken entries (missing transcript, >3 retries) are moved to ~/.reflect/poison-reflections.jsonl.
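
The three outcomes can be written as one small decision function. This is a sketch: the `Outcome` enum and `triage` are hypothetical names, and the retry limit is a parameter because the exact boundary is a detail of the drainer script.

```python
import os
from enum import Enum


class Outcome(Enum):
    DONE = "done"      # remove the entry from the queue
    RETRY = "retry"    # keep the entry and bump its retry counter
    POISON = "poison"  # move the entry to poison-reflections.jsonl


def triage(entry: dict, succeeded: bool, failed_attempts: int,
           max_retries: int = 3) -> Outcome:
    """Decide what happens to one queue entry after a drain attempt.

    failed_attempts counts failures so far, including the current one.
    """
    if succeeded:
        return Outcome.DONE
    if not os.path.exists(entry.get("transcript_path", "")):
        return Outcome.POISON  # a missing transcript can never succeed
    if failed_attempts > max_retries:
        return Outcome.POISON
    return Outcome.RETRY
```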

How each harness gets wired

The hook scripts are shared. The config plumbing is per-harness.

// .claude-plugin/plugin.json — wired by /plugin install reflect@agents-in-a-box
{
  "hooks": {
    "SessionStart": [{
      "hooks": [
        { "type":"command",
          "command":"uv run ${CLAUDE_PLUGIN_ROOT}/skills/recall/hooks/session_start_recall.py" },
        { "type":"command",
          "command":"(nohup ${CLAUDE_PLUGIN_ROOT}/hooks/reflect-drain-bg.sh &) ...",
          "timeout": 5 }
      ]
    }],
    "PreCompact": [{
      "hooks": [{ "type":"command",
        "command":"uv run ${CLAUDE_PLUGIN_ROOT}/hooks/precompact_reflect.py --auto --verbose" }]
    }]
  }
}

// ~/.claude/settings.json — what the plugin runtime produces
{
  "hooks": {
    "SessionStart": [{
      "matcher": "",
      "hooks": [
        { "type":"command",
          "command":"uv run /Users/<you>/.claude/skills/recall/hooks/session_start_recall.py" },
        { "type":"command",
          "command":"(nohup /Users/<you>/.claude/plugins/.../hooks/reflect-drain-bg.sh ...",
          "timeout": 5 }
      ]
    }],
    "PreCompact": [{ "matcher":"", "hooks":[{
      "command":"uv run .../hooks/precompact_reflect.py --auto --verbose" }] }]
  }
}

# codex has no plugin runtime — the adapter does the wireup itself
python plugins/reflect/adapters/codex/codex_adapter.py install
# or skip the bg drain on codex-only machines without claude on PATH:
python plugins/reflect/adapters/codex/codex_adapter.py install --no-bg-drain

# adapter physically copies plugin content into ~/.codex/skills/
# and merges hook entries into ~/.codex/hooks.json

// ~/.codex/hooks.json — what codex_adapter.py produces
{
  "hooks": {
    "SessionStart": [{
      "matcher": "",
      "hooks": [
        { "type":"command",
          "command":"uv run /Users/<you>/.codex/skills/recall/hooks/session_start_recall.py" },
        { "type":"command",
          "command":"(nohup /Users/<you>/.codex/skills/reflect/hooks/reflect-drain-bg.sh &)...",
          "timeout": 5 }
      ]
    }],
    "PreCompact": [{ "matcher":"", "hooks":[{
      "command":"uv run /Users/<you>/.codex/skills/reflect/hooks/precompact_reflect.py --auto --verbose" }] }]
  }
}
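
The merge step that produces a file like the one above could look roughly like this. It is a sketch of an idempotent merge, not the adapter's actual code: `merge_hook_entries` and `install` are hypothetical names, and deduplicating by command string is an assumption.

```python
import json
from pathlib import Path


def merge_hook_entries(config: dict, event: str, commands: list) -> dict:
    """Add hook commands for an event, skipping command strings already present."""
    groups = config.setdefault("hooks", {}).setdefault(event, [])
    existing = {h.get("command") for g in groups for h in g.get("hooks", [])}
    fresh = [c for c in commands if c.get("command") not in existing]
    if fresh:
        groups.append({"matcher": "", "hooks": fresh})
    return config


def install(hooks_path: Path, wanted: dict) -> None:
    """Merge the wanted entries into the hooks file, creating it if needed."""
    config = json.loads(hooks_path.read_text()) if hooks_path.exists() else {}
    for event, commands in wanted.items():
        merge_hook_entries(config, event, commands)
    hooks_path.parent.mkdir(parents=True, exist_ok=True)
    hooks_path.write_text(json.dumps(config, indent=2) + "\n")
```

Merging instead of overwriting preserves hooks the user configured for other tools, and the duplicate check makes a second install a no-op.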

The hook scripts themselves don't know which harness fired them — they just read JSON from stdin and write JSON to stdout. That's the design constraint that made cross-tool reflection cheap to add.
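
That contract (JSON in, JSON out, never fail) fits in a few lines. The wrapper below is a sketch with a hypothetical name, `run_hook`; the shipped scripts may structure their error handling differently.

```python
import json
import sys


def run_hook(handler, stdin=sys.stdin, stdout=sys.stdout) -> int:
    """Read a JSON envelope, call handler, write its JSON result; always return 0."""
    try:
        payload = json.load(stdin)
        result = handler(payload)
        if result is not None:
            # Serialize first so a failure cannot leave partial JSON on stdout.
            stdout.write(json.dumps(result))
    except Exception as exc:  # fail open: the harness must never be blocked
        print(f"hook error: {exc}", file=sys.stderr)
    return 0
```

A script would end with `sys.exit(run_hook(my_handler))`. Nothing in the wrapper inspects which harness produced the envelope, which is what keeps the scripts portable.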

The cross-tool case · codex queues, claude drains

Imagine you spend the morning in Codex on a tricky migration, hit context compaction, and quit. In the afternoon you open Claude on the same repo. Here's the timeline:

Morning · Codex compaction at 11:42 codex session

Codex fires PreCompact → precompact_reflect.py appends one line to ~/.reflect/pending_reflections.jsonl with the Codex transcript path. No reflection runs. Codex compacts and continues.

Codex session ends · queue still has the entry 11:55

If another SessionStart had fired in the same Codex session (e.g. on resume), it would have drained the queue. But the user quit, so the entry sits in the queue.

Afternoon · Claude session starts at 14:08 claude session

Claude fires SessionStart. Two hooks run: session_start_recall.py, which injects whatever is already in the GraphRAG index (the morning's Codex learnings are not there yet because the transcript hasn't been processed), and reflect-drain-bg.sh, launched as a detached background process.

Drain picks up the codex transcript · spawns claude -p 14:08 + ~1s

The drain script reads the queue, finds the morning's codex transcript path, and spawns claude -p "/reflect <morning-codex-transcript>" — note this is Claude processing a transcript that was produced by Codex. The transcript format is the same JSONL the harnesses use natively, so /reflect doesn't care which harness wrote it.

Learnings land · reindex updates GraphRAG 14:09

The headless run writes .md + .entities.yaml sidecars under ~/.learnings/documents/, then reflect reindex updates the GraphRAG index. The queue entry is removed.

Next session (Claude OR Codex) recalls them whenever

From this point forward, the next SessionStart in any harness (including the same Claude session that triggered the drain, on its next start) will see the morning's Codex learnings in the top-3 if they match the cwd context.

Gotchas worth knowing

The drainer ALWAYS shells out to claude. Codex is the trigger, not the worker. On a Codex-only machine without claude on PATH, the drain logs a warning and exits 0 (no hang), but learnings never get processed. Pass --no-bg-drain to the Codex adapter on those machines to skip the hook entirely.
SessionStart never blocks startup. Both hook scripts always exit 0, even on errors — a broken GraphRAG, missing reflect CLI, or unparseable queue all just result in an empty additionalContext.
PreCompact doesn't reflect synchronously. The script only enqueues. Reflection happens on the next session start so it doesn't make the user wait through compaction.
Claude and Codex use the same event-name casing (SessionStart/PreCompact, PascalCase). Copilot CLI uses lowercase sessionStart/preCompact — if you port the adapter, watch the casing.
The queue isn't transactional. If a drain crashes mid-entry, the retry counter (~/.reflect/retry-count.jsonl) survives. Three failed retries on the same transcript and the entry is moved to poison-reflections.jsonl — out of the way but kept for forensics.

FAQ

Why doesn't recall just embed the user's last prompt? By the time the user submits their prompt, the SessionStart hook has already run. Recall has to use cheap pre-prompt signals (cwd, branch, recent commits) and inject before the conversation begins.
What stops the drain from running every session? A PID lockfile (~/.reflect/drain.lock): if another drain is already running, the new one logs and exits. A daily cap via REFLECT_DRAIN_DAILY_MAX (default 20) adds a second limit. And the queue may simply be empty.
Can I run recall manually? Yes. Invoke /recall as a skill. The same script runs synchronously with a query you provide. Useful when starting a new feature where the cwd-based query misses relevant prior work.
What happens if I install reflect on Claude AND Codex? That's the supported case. The hook scripts are the same; both harnesses just point at their own copies. The queue and learnings store are shared, so the cross-tool case above just works.
Where does ~/.reflect/ live, and is it portable? Under $HOME/.reflect/ by default; overridable via REFLECT_STATE_DIR. Contents are JSONL/Markdown/YAML: fully grep-able, version-control friendly if you want, and portable across machines if you sync the directory.
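
The two environment knobs named in this FAQ could be read as follows. The variable names and defaults are the ones stated above; the helper names and the fallback on an unparseable value are assumptions.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def state_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the state directory, honoring REFLECT_STATE_DIR when set."""
    env = os.environ if env is None else env
    override = env.get("REFLECT_STATE_DIR")
    return Path(override) if override else Path.home() / ".reflect"


def under_daily_cap(runs_today: int,
                    env: Optional[Mapping[str, str]] = None) -> bool:
    """True while today's drain count is below REFLECT_DRAIN_DAILY_MAX (default 20)."""
    env = os.environ if env is None else env
    try:
        cap = int(env.get("REFLECT_DRAIN_DAILY_MAX", "20"))
    except ValueError:
        cap = 20  # assumed behavior: fall back to the default on a bad value
    return runs_today < cap
```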