Options Paper · agent memory substrate

Hindsight vs OpenViking for consolidated Lambda fleet memory

Date: 2026-06-08
Scope: Freeman · Motoko · Tank
Status: sourced recommendation
Decision input

Shortest answer

For a shared memory substrate that multiple agents on multiple machines can point at, OpenViking is operationally cleaner and likely cheaper on tokens. Hindsight is stronger if the primary job is high-fidelity temporal memory and learned mental models.

Recommendation: pilot OpenViking first for consolidated Lambda context DB. Keep Hindsight as benchmark/control. Do not keep both as active recall paths long-term. Keep Lambda correction/learning hooks as governance, not second memory.

Decision rule

If consolidated fleet context, browsing, auditability, Codex-plan model reuse, and token control dominate → OpenViking.
If temporal reasoning, entity/relationship memory, and autonomous learning quality dominate → Hindsight.
If AGPLv3 network-service obligations are unacceptable → avoid OpenViking or isolate policy first.

What we are deciding

Stevie wants one consolidated memory surface for Freeman, Motoko, and Tank across machines. Long-term personality memory is secondary. Temporal context, self-improvement, observability/correction, and token usage are first-class constraints.

Criterion	What it measures	Weight
Self-improvement fit	Corrections become behavior/patterns/skills.	0.25
Temporal fidelity	Can answer what changed, when, and why.	0.22
Observability/correction	Inspect, edit, delete, trace retrieval.	0.22
Token efficiency	Read-side injection + write-side LLM cost.	0.20
Maintenance drag	Runtime, migrations, model keys, failure modes.	0.08
Portability	Freeman/Motoko/Tank, local/remote/shared.	0.03

Architecture: Hindsight

Data model

Memory banks contain world facts, experiences, mental models, entities, relationships, time series, sparse/dense search indexes, and metadata.

Retrieval model

Recall runs semantic vector, BM25 keyword, graph links, and temporal filtering in parallel; merges with reciprocal-rank fusion and cross-encoder reranking, then trims to token limit.
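
The fusion step can be illustrated with a minimal reciprocal-rank fusion sketch. This is the generic RRF formula, not Hindsight's code: the function name, the constant k=60 (a common default, not confirmed for Hindsight), and the memory ids are assumptions. Cross-encoder reranking and token trimming are left out.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of ids into one ranking.

    Each id scores sum(1 / (k + rank)) over the lists it appears in.
    k=60 is a conventional default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from three parallel retrievers.
semantic = ["m3", "m1", "m7"]
keyword = ["m1", "m9", "m3"]
temporal = ["m1", "m7"]

fused = reciprocal_rank_fusion([semantic, keyword, temporal])
print(fused)
```

An id that ranks well in several retrievers (here `m1`) beats one that tops a single list, which is why fusion helps when semantic, keyword, and temporal signals disagree.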

Architecture: OpenViking

Data model

Unified filesystem-style context DB. Memories, resources, and skills live behind viking:// URIs with L0 abstracts, L1 overviews, L2 full content, relations, resources, sessions, and per-tenant user/agent identity.

Retrieval model

find() gives lower-latency direct semantic retrieval; search() does LLM intent analysis over session summary + recent messages, produces typed queries, searches recursively through hierarchy, then reranks when configured.

Feature list comparison

Matrix combines local Hermes provider code with upstream docs. “Hermes fit” means what the current Hermes provider exposes without writing new integration glue.

Capability	Hindsight	OpenViking	Fleet implication
Hermes provider	First-class plugin: `hindsight_retain`, `hindsight_recall`, `hindsight_reflect`; context/tools/hybrid modes.	First-class plugin: `viking_search`, `viking_read`, `viking_browse`, `viking_remember`, `viking_add_resource`.	Both viable.
Deployment modes	Cloud, local embedded, local external self-host. Docker exposes API/UI; external Postgres supported.	Self-host server on port 1933; Docker/systemd; Studio served at `/studio`; cloud/service route exists via Volcengine ecosystem.	Both can be shared by Freeman/Motoko/Tank.
Storage	Postgres + vector/graph/time indexes; local embedded bundled DB; external DB via env.	AGFS content storage + vector index; localfs/S3/memory backends; multi-write backups; viking:// URI surface.	OpenViking cleaner for “context DB” and file/resource browsing.
Memory unit	Bank-scoped retained facts/experiences/mental models with metadata/tags.	Filesystem nodes, sessions, resources, skills, memories under viking:// roots.	Hindsight more memory-native; OpenViking broader context-native.
Temporal memory	Explicit temporal extraction, time series, temporal retrieval, event chronology.	Sessions/events are stored and compressed; docs emphasize context hierarchy more than temporal reasoning.	Hindsight wins if “when/what changed” is top priority.
Self-improvement	Reflect builds mental models and insights; retain mission can steer extraction.	Session commit extracts profile/preferences/entities/events/cases/patterns; resources/skills are first-class context.	Hindsight stronger learned-model semantics; OpenViking stronger skill/resource substrate.
Correction workflow	Can retain tagged corrections; UI/API can inspect/edit memories depending deployment. Hermes tools can recall/reflect.	viking:// browse/read/edit-ish FS operations, Studio/TUI, retrieval trajectory, user/agent separation.	OpenViking likely easier for “show me what memory says and fix it.”
Observability	Hindsight UI + API; daemon logs; bank-level visibility.	Web Studio, `ov tui`, request logs, observer endpoints, telemetry, Prometheus `/metrics`, stats endpoints.	OpenViking wins ops/debug surface.
Retrieval strategies	Semantic + BM25 + graph + temporal + RRF + cross-encoder rerank.	Hierarchical semantic retrieval, intent analysis, typed queries, directory-recursive search, optional rerank.	Hindsight better pure memory retrieval; OpenViking better structured context navigation.
Token controls	Recall budget low/mid/high; max tokens; max input chars; context/tools/hybrid; retain_every_n_turns.	L0/L1/L2 tiers: abstract ~100 tokens, overview ~1–2k tokens, full content on demand; Hermes prefetch currently top_k=5 abstracts.	OpenViking wins default token shape.
Write-side model cost	Retain uses LLM extraction; reflect uses LLM synthesis. Needs Hindsight LLM API config separate from Hermes.	Needs VLM + embeddings; session commit/extraction uses model. Supports Codex OAuth provider in upstream docs.	OpenViking may fit current Codex-plan economics better.
Codex plan compatibility	No direct Codex OAuth support found in Hermes Hindsight provider; needs OpenAI-compatible endpoint/key.	Docs list `openai-codex` provider via `openviking-server init` and its own auth state.	Big OpenViking advantage for cost/control if it works in your setup.
Multi-agent / multi-machine	Banks can be shared or isolated by template/profile/user/session; one HTTP API endpoint.	Explicit account/user/agent headers; shared resources per account, isolated user memories/sessions.	OpenViking fits Freeman/Motoko/Tank topology better.
Resource ingestion	Memory API primarily retain/recall/reflect; can store content but not resource-KB-first in Hermes provider.	URLs/docs/code/PDF/media/resource ingestion; resources and skills are first-class.	OpenViking wins for fleet docs/runbooks/repo context.
Tools schema tax	Can run `memory_mode=context` to avoid tools entirely.	Hermes OpenViking provider always exposes five tools in current code.	Hindsight can be lower schema-tax if context-only.
Auto recall injection	Auto prefetch per user turn unless tools mode; configurable max tokens.	Background prefetch per user turn; injects concise abstracts if found.	Both avoid every-toolcall recall by default.
Deletion/edit model	Memory UI/API expected; exact delete/edit path should be verified during pilot.	Filesystem operations include rm/mv/read/tree/stat; visible URI path makes correction tangible.	OpenViking easier mental model.
License	MIT.	AGPLv3 upstream repo license.	OpenViking needs license review before modified/network production use. Not optional.
Maturity signals	Production claims, LongMemEval benchmark claims, Docker/Helm/docs.	Large active repo, docs, metrics, Studio/TUI, but alpha PyPI classifier and many open issues.	Both need pilot, not blind migration.

Weighted score

Hindsight

Best memory fidelity and temporal learning. More model/API cost surface.

Self-improve: 4.5
Temporal: 4.8
Observable: 3.7
Token efficient: 3.2
Low drag: 2.8
Portable: 4.2
weighted: 3.99

OpenViking

Best consolidated context DB and ops surface. Weaker dedicated temporal-memory semantics; license risk.

Self-improve: 4.1
Temporal: 3.4
Observable: 4.8
Token efficient: 4.6
Low drag: 3.1
Portable: 4.9
weighted: 4.14

Scores reflect Stevie’s stated weights, not generic memory-benchmark weights. Raise the temporal-fidelity weight from 0.22 to 0.35, with the other weights held or rescaled proportionally, and Hindsight takes the lead. Raise the token-efficiency, observability, or portability weights and OpenViking extends its lead.
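
The weighted totals and the temporal-weight sensitivity can be rechecked with a short script. A sketch only: the dictionary keys and the function name are illustrative, the numbers are the weights and per-criterion scores listed in this paper, and weights are renormalized so a changed weight set still yields a comparable mean.

```python
WEIGHTS = {
    "self_improve": 0.25,
    "temporal": 0.22,
    "observable": 0.22,
    "token_efficient": 0.20,
    "low_drag": 0.08,
    "portable": 0.03,
}

SCORES = {
    "Hindsight": {
        "self_improve": 4.5, "temporal": 4.8, "observable": 3.7,
        "token_efficient": 3.2, "low_drag": 2.8, "portable": 4.2,
    },
    "OpenViking": {
        "self_improve": 4.1, "temporal": 3.4, "observable": 4.8,
        "token_efficient": 4.6, "low_drag": 3.1, "portable": 4.9,
    },
}


def weighted_total(scores, weights):
    """Weighted mean of criterion scores; weights are normalized to sum to 1."""
    total_weight = sum(weights.values())
    return sum(scores[c] * w for c, w in weights.items()) / total_weight


baseline = {name: weighted_total(s, WEIGHTS) for name, s in SCORES.items()}

# Sensitivity check: temporal fidelity raised from 0.22 to 0.35.
temporal_heavy = dict(WEIGHTS, temporal=0.35)
shifted = {name: weighted_total(s, temporal_heavy) for name, s in SCORES.items()}

for name in SCORES:
    print(f"{name}: baseline {baseline[name]:.3f}, temporal-heavy {shifted[name]:.3f}")
```

With the inputs as listed, this gives about 3.99 for Hindsight and 4.14 for OpenViking. With temporal at 0.35 and weights renormalized, Hindsight edges ahead (about 4.08 vs 4.06), so the flip is real but narrow.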

Token economics

Hindsight cost shape

Read side: auto recall once per user turn in context/hybrid; not every tool call. Default max 4096 tokens is too high for Lambda.
Write side: auto retain every turn by default; LLM extracts facts/entities/relationships/time. Reflect is extra LLM synthesis.
Cheap config: memory_mode=context, recall_prefetch_method=recall, recall_budget=low, recall_max_tokens=400–800, retain_every_n_turns>1 if acceptable.

OpenViking cost shape

Read side: Hermes provider prefetch returns top abstracts, usually compact. Agent can read overview/full only when needed.
Write side: session commit/extraction and resource summarization need VLM/embedding. AST mode can avoid LLM for long code skeletons.
Cheap config: use Codex/OAuth or cheap VLM if validated, prefer L0 abstracts in auto context, require explicit read for L1/L2.
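
The read-side rule (L0 abstracts in auto context, explicit read for L1/L2) can be sketched as a budgeted selector. This is an illustration, not the Hermes provider's code: `Hit`, `estimate_tokens`, `build_auto_context`, the 4-characters-per-token estimate, and the example URIs are assumptions. Only top_k=5 and the 800-token ceiling come from this paper.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    uri: str        # hypothetical viking:// path
    abstract: str   # L0 text only; L1/L2 are never auto-injected here
    score: float


def estimate_tokens(text):
    """Crude stand-in for a tokenizer: roughly 4 characters per token (assumed)."""
    return max(1, len(text) // 4)


def build_auto_context(hits, budget_tokens=800, top_k=5):
    """Pick the best-scoring abstracts that fit the budget, best first."""
    chosen, used = [], 0
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)[:top_k]
    for hit in ranked:
        cost = estimate_tokens(hit.abstract)
        if used + cost > budget_tokens:
            continue  # skip an oversized abstract; a smaller one may still fit
        chosen.append(f"{hit.uri}: {hit.abstract}")
        used += cost
    return chosen, used


# Hypothetical hits; the paths are made up for the example.
hits = [
    Hit("viking://resources/runbook-a", "x" * 400, 0.9),   # ~100 tokens
    Hit("viking://resources/runbook-b", "y" * 4000, 0.8),  # ~1000 tokens
    Hit("viking://user/pattern-c", "z" * 200, 0.7),        # ~50 tokens
]
context, used = build_auto_context(hits, budget_tokens=200)
print(len(context), used)
```

The agent still sees the URI of every injected abstract, so it can request the overview or full content with an explicit read when the abstract is not enough.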

Observability and correction

OpenViking has better correction ergonomics

It exposes context as navigable viking:// filesystem. Studio, TUI, observer endpoints, request logs, telemetry, and Prometheus metrics give Motoko-grade surfaces. That matters because bad memory is production state, not vibes.

Web Studio · ov tui · /metrics · retrieval trajectory · viking:// browse/read

Hindsight has better memory semantics

It is more explicitly designed around memory correction by bank, entity, temporal data, experiences, and mental models. But correction UX/API must be verified in pilot for your exact self-host mode.

banks · metadata/tags · entities · time series · mental models

Fleet rollout shape

Need	Hindsight rollout	OpenViking rollout
Shared substrate	One Hindsight API + shared bank or per-agent banks with shared tags.	One OpenViking server + `account=lambda`; `user=stevie` or per-human; `agent=freeman/motoko/tank`.
Tank from other machine	Point Hermes Hindsight config at same API URL, same bank template/API key.	Point `OPENVIKING_ENDPOINT` at same server, set account/user/agent headers.
Replace Lambda bank_lookup	Disable `bank_lookup.py`; use Hindsight context-only recall.	Disable `bank_lookup.py`; rely on OpenViking prefetch + tools.
Keep self-improvement hooks	Keep correction_detector/learning_sync as governance and git-backed hard rules.	Same. Hooks should write visible patterns/skills, not duplicate recall.
Migration	Retain existing MEMORY, corrections, patterns, session summaries into banks with tags.	Import existing memory docs/runbooks/resources under viking://user and viking://resources; commit sessions for extraction.

Target topology: OpenViking on Cloudflare

Backend centralizes shared Lambda memory behind Cloudflare. OpenViking client stays local to each Hermes instance, so Freeman, Motoko, and Tank keep local tool/runtime behavior while sharing one governed context substrate.

Client placement: Run OpenViking/Hermes provider locally per instance. Set account/user/agent identity headers; no shared local memory DB.

Backend placement: Use Cloudflare Containers for stateful OpenViking server if native server semantics are needed. Put durable AGFS/resource state in R2, metadata in D1, embeddings in Vectorize.

Ops gate: Protect API and Studio with Cloudflare Access/service tokens. Expose metrics/logs to Motoko; failure should degrade to no-memory, not broken agent loop.
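
The "degrade to no-memory" requirement can be sketched as a fail-open wrapper around whatever recall call the local provider makes. The names `recall_or_empty` and `recall_fn` and the 2-second default are assumptions for illustration; nothing here is an OpenViking or Hermes API.

```python
import concurrent.futures
import time


def recall_or_empty(recall_fn, query, timeout_s=2.0):
    """Run recall_fn(query); on timeout or any error, return no memory context."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(recall_fn, query)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        # Covers timeouts, network errors, and bad responses alike.
        return []
    finally:
        # Do not block the agent loop on a hung call; the worker thread
        # is not killed and finishes (or fails) in the background.
        pool.shutdown(wait=False)


# Hypothetical backends standing in for a real memory client.
def healthy(query):
    return ["abstract about " + query]


def down(query):
    raise ConnectionError("memory server unreachable")


def slow(query):
    time.sleep(0.2)
    return ["too late"]


ok_result = recall_or_empty(healthy, "cron fallback")
down_result = recall_or_empty(down, "cron fallback")
slow_result = recall_or_empty(slow, "cron fallback", timeout_s=0.05)
print(ok_result, down_result, slow_result)
```

Returning an empty list keeps the agent turn alive. The failure should still be counted in logs or metrics so that silent memory loss is visible to Motoko.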

Recommendation

Pilot OpenViking first for Lambda fleet consolidation. Reason: your current pain is not “chatbot remembers user favorite color.” It is shared operational context across Freeman, Motoko, and Tank, with low token burn, visible correction, and machine-to-machine portability. OpenViking’s context database shape fits that better.

Do not declare Hindsight dead. Use it as benchmark/control for temporal questions. If OpenViking fails “what changed when and what did we learn?” tests, switch to Hindsight despite the extra LLM cost. The crowbar works until it doesn’t.

Hard blocker before adoption: AGPLv3 review for OpenViking. If license posture is unacceptable, Hindsight becomes default recommendation.

Pilot acceptance tests

Functional

Freeman, Motoko, Tank all connect to same server from separate machines.
Each agent has isolated identity but shared Lambda resources.
Query “what did Motoko learn about cron/model fallback?” returns right pattern with source path.
Query “what changed after Stevie corrected dual-memory recommendation?” returns correction.
Bad memory can be found, edited/deleted, and absence verified.

Economic / ops

Average injected memory context < 800 tokens per routed user turn.
No auto recall per tool call.
Write-side model calls visible in logs/metrics.
Dashboard or TUI shows retrieval trajectory.
Failure mode degrades to “no memory,” not broken agent loop.
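
The first two economic checks can be automated against a per-event log. A sketch under assumptions: the event schema (`kind`, `injected_tokens`) and the function name are hypothetical, and the sample numbers are invented to exercise the check, not measured.

```python
from statistics import mean


def check_turn_log(events, max_avg_tokens=800):
    """Evaluate injected-memory volume per user turn and recall on tool calls.

    events: dicts with 'kind' ('user_turn' or 'tool_call') and
    'injected_tokens' (memory tokens injected for that event).
    """
    user_turns = [e["injected_tokens"] for e in events if e["kind"] == "user_turn"]
    tool_recalls = [
        e for e in events if e["kind"] == "tool_call" and e["injected_tokens"] > 0
    ]
    avg = mean(user_turns) if user_turns else 0.0
    return {
        "avg_injected_tokens": avg,
        "avg_under_budget": avg < max_avg_tokens,
        "no_recall_on_tool_calls": not tool_recalls,
    }


# Invented sample log for illustration only.
sample_log = [
    {"kind": "user_turn", "injected_tokens": 620},
    {"kind": "tool_call", "injected_tokens": 0},
    {"kind": "user_turn", "injected_tokens": 910},
    {"kind": "tool_call", "injected_tokens": 0},
    {"kind": "user_turn", "injected_tokens": 450},
]
report = check_turn_log(sample_log)
print(report)
```

A single turn may exceed 800 tokens (the 910 above) while the average still passes. If per-turn spikes matter, add a maximum alongside the mean.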

Sources inspected

Hermes local code: plugins/memory/hindsight/README.md, plugins/memory/hindsight/__init__.py, plugins/memory/openviking/README.md, plugins/memory/openviking/__init__.py. Upstream docs: vectorize-io/hindsight README; volcengine/OpenViking README/PyPI; OpenViking architecture, context layers, storage, extraction, retrieval, multi-tenant, metrics, deployment, observability, Hermes integration docs. Secrets redacted/omitted.