Keep custom only
Leave reflect/recall as the source of truth. No Hindsight adoption except maybe manual research.
Decision support for whether Lambda should bin the custom reflect/recall plugin and standardize on Vectorize Hindsight.
Do not bin the custom reflect/recall system wholesale yet. Adopt Hindsight as a shadow/hybrid memory substrate first. Keep the custom self-improvement loop, correction ledger, fleet learnings, and human-editable observability layer.
Best target state: Hindsight for temporal/entity/graph recall; custom Lambda reflect pipeline for policy, correction capture, skill promotion, and ops-grade auditability. If the pilot proves out on token cost and correction visibility, retire the custom retrieval backend, not the self-improvement control plane.
This is not “custom memory vs packaged memory.” It is control-plane vs substrate.
Custom reflect/recall is a Lambda-specific learning control plane. Hindsight is a general memory engine with stronger retrieval, temporal modeling, entity graph, observations, and API/UI surface.
Weights tuned to your stated priorities. Scores are 1–5. Weighted total is /5.0.
Current system is file-first and hook-driven. It treats memory as learning artifacts, not a general conversation database.
- recall.py wraps ~/.learnings/cli/learnings for GraphRAG/vector search.
- ~/.reflect/pending_reflections.jsonl.
- /reflect.

Hindsight is a memory engine. It turns content into structured facts, entities, graph links, temporal metadata, observations, and reflect responses with citations.
- retain() extracts structured facts, entity resolution, graph links, causal links, event time and learned time.
- recall() runs semantic, keyword, graph, and temporal search, fuses results, reranks, and stops at a token budget.
- reflect() runs an agentic loop over mental models, observations, raw facts, and expand tools.
- auto_retain, auto_recall, retain_every_n_turns, retain_async, recall_max_tokens, recall_budget, tags, bank templates.

The decision is not binary. Four realistic paths.
| Feature | Current reflect/recall | Hindsight | Winner | Notes |
|---|---|---|---|---|
| Primary purpose | Fleet self-improvement and learning retrieval | General agent memory engine | Depends | Different jobs. Current system encodes Lambda process. Hindsight encodes memory mechanics. |
| Storage substrate | Markdown/YAML/JSONL, ~/.learnings, ~/.reflect, graph cache, QMD docs | Memory banks on PostgreSQL/Oracle or embedded/local/cloud | Hindsight | Hindsight has a coherent database model. Custom is inspectable but fragmented. |
| Automatic write path | PreCompact queues transcripts; agent later runs /reflect; discoveries/corrections append explicitly | Hermes provider can auto-retain every N turns; async retain; document/session metadata | Hindsight | Hindsight is stronger for ordinary memory ingestion. Current is safer for intentional learning capture. |
| Fact extraction | LLM-driven /reflect creates learnings/patterns by instruction; no universal fact schema | LLM extracts structured facts, speaker perspective, entities, time, causal links | Hindsight | Hindsight wins factual memory. Custom wins policy-specific reflection. |
| Entity resolution | Mostly whatever GraphRAG/learnings index infers; not exposed as a first-class correction surface | Explicit entity recognition/resolution and graph links | Hindsight | Major Hindsight advantage. |
| Temporal model | Recency via archive timestamp half-life; session/queue timestamps; branch/project query context | Tracks event time and learned time; temporal recall parses date windows and spreads across period | Hindsight | This maps directly to your “temporal matters” requirement. |
| Search modes | GraphRAG via learnings CLI + QMD BM25; RRF fusion; confidence/recency/tag rerank | Semantic + keyword + graph + temporal; RRF fusion; cross-encoder rerank; boosts for recency/proof/time | Hindsight | Current is credible. Hindsight is deeper and productized. |
| Token budgeting | Hard character caps: SessionStart top 3, 1500 chars; explicit recall default 2000 chars | max_tokens budget; budget low/mid/high; recall default 4096 tokens in Hermes provider | Custom by default | Hindsight has better knobs. Current defaults are cheaper. Misconfigured Hindsight can tax every turn. |
| Read-path LLM use | recall retrieval itself no LLM; /reflect uses main agent when asked/queued | recall() no LLM; reflect() uses LLM loop; retain/consolidation use LLM | Tie | For retrieval only, both can be cheap. Hindsight write path costs more. |
| Self-improvement workflow | Corrections → discoveries → patterns → skills; fleet rules and promotion pipeline exist | Observations evolve with evidence; directives and mental models exist, but not Lambda correction/skill workflow | Custom | This is the strongest reason not to bin custom. |
| Observation/consolidation | Patterns are explicit, but promotion is agent/process-driven | Automatic observations with proof count, freshness, evidence quotes, contradiction handling | Hindsight | Hindsight’s observations are closer to a memory substrate; custom patterns are closer to operational policy. |
| Correction/deletion semantics | Edit files directly; corrections logged in markdown; archive queues manually | Delete memory/document/bank; derived observations invalidated and re-consolidated; clear observations endpoint | Hindsight | Custom is more transparent; Hindsight is more internally consistent after deletion. |
| Observability | Plain files, JSONL logs, recall cache/log, forensics breadcrumbs; easy grep/diff | Control plane/UI, API, traces/tool calls in reflect, usage metrics, Prometheus metrics | Tie | Different observability. Custom is Unix-visible. Hindsight is product-visible. |
| Citations/evidence | Learning snippets and markdown sources; not always proof-counted | Reflect returns based_on, citations, trace, usage; observations carry source memories and quotes | Hindsight | Important if you want memory to defend itself. |
| Human editability | Excellent. Edit markdown/JSONL. Git diffable. | Good via API/UI, but database-backed and less “open a file and patch.” | Custom | For emergency correction, files are hard to beat. |
| Multi-agent isolation | Depends on paths/profile convention and fleet discipline | Bank IDs, tags, bank templates: profile/workspace/platform/user/session | Hindsight | Hindsight provider code already supports dynamic bank ID templates and tags. |
| Operational maturity | Local custom scripts; fragile but owned | Public repo, client SDKs, local/cloud modes, metrics; still new and dependency-heavy | Hindsight | Hindsight lowers maintenance, but creates vendor/project dependency. |
| Local/offline | Works if local learnings/QMD/GraphRAG stack exists | Local embedded/external supported; can use local LLM/embeddings/reranker | Tie | Both need dependency hygiene. |
| Failure mode | Silent no-op by design; empty context if CLI/hook fails | Provider logs failures; retain queue can drop on shutdown timeout; daemon/client failure modes | Depends | Current fails quiet. Hindsight fails richer but has more moving parts. |
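
The current-system rows above describe RRF fusion over the GraphRAG and QMD BM25 rankings, a recency half-life, and a hard top-3 / 1500-character cap. A minimal sketch of how those pieces fit together, not the actual recall.py code. The function names, `k=60` (a common RRF default), and the 30-day half-life are assumptions; only the top-3 and 1500-char values come from the table.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return dict(scores)

def recency_weight(age_days, half_life_days=30.0):
    """Exponential decay: weight halves every half_life_days (assumed value)."""
    return 0.5 ** (age_days / half_life_days)

def select_within_cap(snippets, scores, ages, top_n=3, char_cap=1500):
    """Rank by fused score times recency, then keep top_n under the char cap."""
    ranked = sorted(
        scores,
        key=lambda d: scores[d] * recency_weight(ages[d]),
        reverse=True,
    )
    picked, used = [], 0
    for doc_id in ranked[:top_n]:
        text = snippets[doc_id]
        if used + len(text) > char_cap:
            break  # stop at the first snippet that would overflow the cap
        picked.append(doc_id)
        used += len(text)
    return picked

# Hypothetical rankings from the two retrievers.
graph_hits = ["a", "b", "c"]
bm25_hits = ["b", "a", "d"]
fused = rrf_fuse([graph_hits, bm25_hits])
```

The cap is enforced on characters, not tokens, which is why the current defaults stay cheap without a tokenizer in the read path.
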
Use Hindsight in tools-only or low-injection mode during pilot:
- memory_mode=tools or auto_recall=false for normal turns.
- recall_budget=low, recall_max_tokens=1024–2048, recall_max_input_chars=400–800.
- retain_async=true, retain_every_n_turns > 1 for noisy channels.
- hindsight_reflect for explicit synthesis, not every prompt.

Do not enable the naive “auto retain every turn + auto recall 4096 tokens + reflect prefetch” setup across Discord.
That creates two costs: write-side LLM extraction/consolidation, and read-side context inflation.
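
The pilot settings above can be written down as a plain mapping plus a guard that flags the naive setup. This is a sketch only: the key names are the ones listed in this section, but the dict layout, the `pilot_violations` helper, and the treatment of missing keys as unsafe are assumptions, not the Hermes provider's actual config format.

```python
# Pilot settings from this section; values picked from the stated ranges.
PILOT_CONFIG = {
    "memory_mode": "tools",
    "auto_recall": False,
    "recall_budget": "low",
    "recall_max_tokens": 1024,
    "recall_max_input_chars": 400,
    "retain_async": True,
    "retain_every_n_turns": 4,  # hypothetical; the text only requires > 1
}

def pilot_violations(cfg):
    """Return a list of ways cfg departs from the low-injection pilot."""
    problems = []
    tools_only = cfg.get("memory_mode") == "tools"
    # Assumption: a missing auto_recall is treated as enabled.
    if not tools_only and cfg.get("auto_recall", True):
        problems.append("auto_recall is on outside tools mode")
    if cfg.get("recall_budget") != "low":
        problems.append("recall_budget is not low")
    if cfg.get("recall_max_tokens", 4096) > 2048:
        problems.append("recall_max_tokens exceeds 2048")
    if not 0 < cfg.get("recall_max_input_chars", 0) <= 800:
        problems.append("recall_max_input_chars missing or above 800")
    if not cfg.get("retain_async", False):
        problems.append("retain_async is off")
    if cfg.get("retain_every_n_turns", 1) <= 1:
        problems.append("retain_every_n_turns is not above 1")
    return problems

# The naive setup this section warns against.
naive = {
    "memory_mode": "hybrid",
    "auto_recall": True,
    "recall_max_tokens": 4096,
    "retain_every_n_turns": 1,
}
```

Running `pilot_violations` on each channel's config before rollout is one cheap way to keep the naive setup from slipping into a noisy channel.
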
| Path | Retrieval LLM? | Write LLM? | Context injected | Cost risk |
|---|---|---|---|---|
| Current SessionStart recall | No | Only later /reflect | ~1500 chars cap | Low |
| Current explicit /reflect | Main agent reasoning | Learning creation | Intentional | Medium |
| Hindsight recall tool | No | Retain/consolidation elsewhere | Caller controls max_tokens | Low-medium |
| Hindsight auto context | No | Depends on auto_retain | Default 4096 tokens unless tuned | High if untuned |
| Hindsight reflect | Yes, agentic loop | May depend on retained data | Response + evidence | Medium-high |
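
The table mixes character caps and token budgets, so a rough conversion helps compare the rows. The sketch below assumes about 4 characters per token, a common rule of thumb for English text and not a measured figure, and a hypothetical 40-turn session. Real numbers depend on the tokenizer and on how much recall actually returns.

```python
CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not measured

def chars_to_tokens(chars):
    """Approximate token count for a character budget."""
    return chars / CHARS_PER_TOKEN

# Current SessionStart recall: one injection per session, ~1500 chars.
session_start_tokens = chars_to_tokens(1500)

# Hindsight auto context: up to the default 4096 tokens, per turn.
auto_context_tokens = 4096
per_injection_ratio = auto_context_tokens / session_start_tokens

# Upper bound for an untuned session (hypothetical turn count).
turns = 40
untuned_upper_bound = turns * auto_context_tokens

print(f"SessionStart: ~{session_start_tokens:.0f} tokens once per session")
print(f"Auto context: up to {untuned_upper_bound} tokens over {turns} turns")
```

Under these assumptions a single untuned auto-context injection is roughly 11 times the SessionStart cap, and it can repeat every turn. That gap is the "High if untuned" rating.
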
- tools mode first.
- agent:motoko, clan:lambda, type:correction, type:incident, type:pattern.

Bin custom retrieval only after Hindsight proves:
| Test | Pass condition | Why |
|---|---|---|
| Temporal query | “What changed after the heartbeat cutover?” returns ordered facts with event/learned time separation. | Validates Stevie’s temporal priority. |
| Correction query | Inject wrong fact, correct it, verify old observation is stale/refined or removed. | Validates memory correction, not just recall. |
| Token budget | Normal Discord turns inject ≤1500 tokens p95; explicit research can request more. | Avoids permanent memory tax. |
| Self-improvement loop | A correction still lands in corrections/patterns/skill pipeline, with Hindsight as evidence substrate only. | Preserves Lambda behavior. |
| Observability | Operator can inspect source memory, derived observation, proof count, trace, and usage within 60 seconds. | 2am test. |
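
The token budget test above needs a concrete p95 check. A minimal sketch using the nearest-rank percentile; the helper names and sample data are hypothetical, and how per-turn injected token counts get logged during the pilot is outside this sketch.

```python
import math

def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def token_budget_passes(injected_tokens_per_turn, limit=1500, pct=95):
    """Pass condition from the table: p95 of injected tokens <= limit."""
    return percentile_nearest_rank(injected_tokens_per_turn, pct) <= limit

# Hypothetical samples: mostly small injections, a few explicit research turns.
mostly_small = [200] * 95 + [3000] * 5
too_many_large = [200] * 94 + [3000] * 6
```

Explicit research requests can exceed the limit without failing the test as long as they stay under 5% of turns, which matches the "explicit research can request more" allowance.
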
- ~/.claude/skills/recall/scripts/recall.py: GraphRAG wrapper, QMD BM25 booster, RRF fusion, confidence/recency/tag rerank, cache and logs.
- ~/.claude/skills/recall/hooks/session_start_recall.py: project/branch/recent-commit query, top-3 injection, 1500 char cap.
- ~/.claude/skills/reflect/hooks/precompact_reflect.py and sessionstart_drain_reflections.py: transcript queue, next-session LLM processing.
- d/git/hermes-agent/plugins/memory/hindsight/__init__.py: cloud/local modes, auto retain/recall, tags, bank templates, tools/context/hybrid mode, prefetch, retain queue, session switch handling.
- retain.md, retrieval.md, observations.mdx, reflect.mdx, performance.md, configuration.md from vectorize-io/hindsight.