Options Paper · agent memory substrate

Hindsight vs OpenViking for consolidated Lambda fleet memory

Date: 2026-06-08
Scope: Freeman · Motoko · Tank
Status: sourced recommendation
Decision input

Shortest answer

For a shared memory substrate that multiple agents on multiple machines can point at, OpenViking is operationally cleaner and likely cheaper on tokens. Hindsight is stronger if the primary job is high-fidelity temporal memory and learned mental models.

Recommendation: pilot OpenViking first for consolidated Lambda context DB. Keep Hindsight as benchmark/control. Do not keep both as active recall paths long-term. Keep Lambda correction/learning hooks as governance, not second memory.

Decision rule

  • If consolidated fleet context, browsing, auditability, Codex-plan model reuse, and token control dominate → OpenViking.
  • If temporal reasoning, entity/relationship memory, and autonomous learning quality dominate → Hindsight.
  • If AGPLv3 network-service obligations are unacceptable → avoid OpenViking or isolate policy first.
01

What we are deciding

Stevie wants one consolidated memory surface for Freeman, Motoko, and Tank across machines. Long-term personality memory is secondary. Temporal context, self-improvement, observability/correction, and token usage are first-class constraints.

  • Self-improvement fit (weight 0.25): corrections become behavior/patterns/skills.
  • Temporal fidelity (weight 0.22): can answer what changed, when, and why.
  • Observability/correction (weight 0.22): inspect, edit, delete, trace retrieval.
  • Token efficiency (weight 0.20): read-side injection + write-side LLM cost.
  • Maintenance drag (weight 0.08): runtime, migrations, model keys, failure modes.
  • Portability (weight 0.03): Freeman/Motoko/Tank, local/remote/shared.
02

Architecture: Hindsight

Diagram components:

  • Hermes profiles: Freeman / Motoko / Tank
  • Hermes provider: auto_retain / auto_recall; context · tools · hybrid
  • Hindsight API: retain · recall · reflect; LLM extraction
  • Postgres: vectors, graph
  • Token knobs: low budget · max tokens
  • UI / API: banks · traces · edit

Data model

Memory banks contain world facts, experiences, mental models, entities, relationships, time series, sparse/dense search indexes, and metadata.

Retrieval model

Recall runs semantic vector, BM25 keyword, graph links, and temporal filtering in parallel; merges with reciprocal-rank fusion and cross-encoder reranking, then trims to token limit.
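
The fusion step can be sketched as below. This is a minimal illustration of reciprocal-rank fusion in general, not Hindsight's code: the function name, the memory ids, and k=60 (a value commonly used for RRF) are assumptions, and the cross-encoder rerank and token trim that follow are not shown.

```python
from collections import defaultdict

def rrf_merge(rankings, k=60):
    """Fuse ranked id lists: each id scores sum(1 / (k + rank)), rank starting at 1.

    k=60 is a conventional RRF constant, not a documented Hindsight default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda m: (-scores[m], m))

# Hypothetical per-strategy results (ids are made up for illustration).
semantic = ["m3", "m1", "m7"]
bm25 = ["m1", "m9", "m3"]
graph = ["m1", "m3"]
temporal = ["m7", "m1"]

fused = rrf_merge([semantic, bm25, graph, temporal])
print(fused)
```

An id that ranks well in several strategies (here m1) beats one that tops a single list, which is why fusion suits a recall path that mixes keyword, vector, graph, and temporal signals.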

03

Architecture: OpenViking

Diagram components:

  • Hermes profiles: account/user/agent
  • Hermes provider: prefetch top_k=5; viking_* tools
  • OpenViking Server: FS · Search · Session · Resource · Observer
  • AGFS: L0/L1/L2; vectors; relations
  • Studio / TUI / metrics: trajectory · logs · /metrics
  • Tiered context: abstract ~100 · overview ~2k

Data model

Unified filesystem-style context DB. Memories, resources, and skills live behind viking:// URIs with L0 abstracts, L1 overviews, L2 full content, relations, resources, sessions, and per-tenant user/agent identity.
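
A rough sketch of the tiered shape, to show why L0-first injection keeps read-side cost low. Everything here is illustrative: ContextNode, rough_tokens, auto_context, and the URI paths are hypothetical names, not OpenViking's API, and the word-count tokenizer is a crude stand-in for a real one.

```python
from dataclasses import dataclass

@dataclass
class ContextNode:
    uri: str       # viking:// address (example paths below are made up)
    abstract: str  # L0
    overview: str  # L1
    content: str   # L2

    def tier(self, level: int) -> str:
        """Return the text for tier 0, 1, or 2."""
        return (self.abstract, self.overview, self.content)[level]

def rough_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word.
    return len(text.split())

def auto_context(nodes, budget_tokens):
    """Inject only L0 abstracts, in ranked order, until the budget is spent."""
    lines, used = [], 0
    for node in nodes:
        cost = rough_tokens(node.abstract)
        if used + cost > budget_tokens:
            break
        lines.append(f"{node.uri}: {node.abstract}")
        used += cost
    return lines, used

nodes = [
    ContextNode("viking://resources/runbooks/cron", "Cron fallback runbook summary",
                "Longer overview of the cron fallback runbook.", "Full runbook text."),
    ContextNode("viking://user/memories/model-fallback", "Model fallback pattern learned by Motoko",
                "Longer overview of the pattern.", "Full pattern text."),
]
lines, used = auto_context(nodes, budget_tokens=8)
print(lines, used)
```

The agent would then call tier(1) or tier(2) on a specific URI only when the abstract looks relevant, which is the explicit-read behavior the token section relies on.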

Retrieval model

find() gives lower-latency direct semantic retrieval; search() does LLM intent analysis over session summary + recent messages, produces typed queries, searches recursively through hierarchy, then reranks when configured.

04

Feature list comparison

The matrix combines local Hermes provider code with upstream docs. “Hermes fit” means what the current Hermes provider exposes without writing new integration glue.

| Capability | Hindsight | OpenViking | Fleet implication |
|---|---|---|---|
| Hermes provider | First-class plugin: hindsight_retain, hindsight_recall, hindsight_reflect; context/tools/hybrid modes. | First-class plugin: viking_search, viking_read, viking_browse, viking_remember, viking_add_resource. | Both viable. |
| Deployment modes | Cloud, local embedded, local external self-host. Docker exposes API/UI; external Postgres supported. | Self-host server on port 1933; Docker/systemd; Studio served at /studio; cloud/service route exists via Volcengine ecosystem. | Both can be shared by Freeman/Motoko/Tank. |
| Storage | Postgres + vector/graph/time indexes; local embedded bundled DB; external DB via env. | AGFS content storage + vector index; localfs/S3/memory backends; multi-write backups; viking:// URI surface. | OpenViking cleaner for “context DB” and file/resource browsing. |
| Memory unit | Bank-scoped retained facts/experiences/mental models with metadata/tags. | Filesystem nodes, sessions, resources, skills, memories under viking:// roots. | Hindsight more memory-native; OpenViking broader context-native. |
| Temporal memory | Explicit temporal extraction, time series, temporal retrieval, event chronology. | Sessions/events are stored and compressed; docs emphasize context hierarchy more than temporal reasoning. | Hindsight wins if “when/what changed” is top priority. |
| Self-improvement | Reflect builds mental models and insights; retain mission can steer extraction. | Session commit extracts profile/preferences/entities/events/cases/patterns; resources/skills are first-class context. | Hindsight stronger learned-model semantics; OpenViking stronger skill/resource substrate. |
| Correction workflow | Can retain tagged corrections; UI/API can inspect/edit memories depending deployment. Hermes tools can recall/reflect. | viking:// browse/read/edit-ish FS operations, Studio/TUI, retrieval trajectory, user/agent separation. | OpenViking likely easier for “show me what memory says and fix it.” |
| Observability | Hindsight UI + API; daemon logs; bank-level visibility. | Web Studio, ov tui, request logs, observer endpoints, telemetry, Prometheus /metrics, stats endpoints. | OpenViking wins ops/debug surface. |
| Retrieval strategies | Semantic + BM25 + graph + temporal + RRF + cross-encoder rerank. | Hierarchical semantic retrieval, intent analysis, typed queries, directory-recursive search, optional rerank. | Hindsight better pure memory retrieval; OpenViking better structured context navigation. |
| Token controls | Recall budget low/mid/high; max tokens; max input chars; context/tools/hybrid; retain_every_n_turns. | L0/L1/L2 tiers: abstract ~100, overview ~1–2k, full on demand; Hermes prefetch currently top_k=5 abstracts. | OpenViking wins default token shape. |
| Write-side model cost | Retain uses LLM extraction; reflect uses LLM synthesis. Needs Hindsight LLM API config separate from Hermes. | Needs VLM + embeddings; session commit/extraction uses model. Supports Codex OAuth provider in upstream docs. | OpenViking may fit current Codex-plan economics better. |
| Codex plan compatibility | No direct Codex OAuth support found in Hermes Hindsight provider; needs OpenAI-compatible endpoint/key. | Docs list openai-codex provider via openviking-server init and its own auth state. | Big OpenViking advantage for cost/control if it works in your setup. |
| Multi-agent / multi-machine | Banks can be shared or isolated by template/profile/user/session; one HTTP API endpoint. | Explicit account/user/agent headers; shared resources per account, isolated user memories/sessions. | OpenViking fits Freeman/Motoko/Tank topology better. |
| Resource ingestion | Memory API primarily retain/recall/reflect; can store content but not resource-KB-first in Hermes provider. | URLs/docs/code/PDF/media/resource ingestion; resources and skills are first-class. | OpenViking wins for fleet docs/runbooks/repo context. |
| Tools schema tax | Can run memory_mode=context to avoid tools entirely. | Hermes OpenViking provider always exposes five tools in current code. | Hindsight can be lower schema-tax if context-only. |
| Auto recall injection | Auto prefetch per user turn unless tools mode; configurable max tokens. | Background prefetch per user turn; injects concise abstracts if found. | Both avoid every-toolcall recall by default. |
| Deletion/edit model | Memory UI/API expected; exact delete/edit path should be verified during pilot. | Filesystem operations include rm/mv/read/tree/stat; visible URI path makes correction tangible. | OpenViking easier mental model. |
| License | MIT. | AGPLv3 upstream repo license. | OpenViking needs license review before modified/network production use. Not optional. |
| Maturity signals | Production claims, LongMemEval benchmark claims, Docker/Helm/docs. | Large active repo, docs, metrics, Studio/TUI, but alpha PyPI classifier and many open issues. | Both need pilot, not blind migration. |
05

Weighted score

Hindsight

Best memory fidelity and temporal learning. More model/API cost surface.

Self-improve: 4.5
Temporal: 4.8
Observable: 3.7
Token efficient: 3.2
Low drag: 2.8
Portable: 4.2
Weighted: 3.99

OpenViking

Best consolidated context DB and ops surface. Weaker dedicated temporal-memory semantics; license risk.

Self-improve: 4.1
Temporal: 3.4
Observable: 4.8
Token efficient: 4.6
Low drag: 3.1
Portable: 4.9
Weighted: 4.14

Scores reflect Stevie’s stated weights, not generic memory-benchmark weights. Change weight on temporal fidelity from 0.22 to 0.35 and Hindsight becomes leader. Increase token/observability/portability and OpenViking extends lead.
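
The arithmetic can be checked with a short sketch, assuming each total is a plain weighted sum of the six listed scores using the section 01 weights. Dividing by the total weight is an added assumption so that changed weights still compare fairly. Under these assumptions the baseline totals are 3.985 for Hindsight and 4.144 for OpenViking, and raising the temporal weight to 0.35 puts Hindsight ahead.

```python
WEIGHTS = {
    "self_improve": 0.25, "temporal": 0.22, "observable": 0.22,
    "token_efficient": 0.20, "low_drag": 0.08, "portable": 0.03,
}
SCORES = {
    "Hindsight": {"self_improve": 4.5, "temporal": 4.8, "observable": 3.7,
                  "token_efficient": 3.2, "low_drag": 2.8, "portable": 4.2},
    "OpenViking": {"self_improve": 4.1, "temporal": 3.4, "observable": 4.8,
                   "token_efficient": 4.6, "low_drag": 3.1, "portable": 4.9},
}

def weighted(scores, weights):
    """Weighted mean of the criterion scores (normalised by total weight)."""
    total = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total

def leader(weights):
    """Option with the highest weighted score under the given weights."""
    return max(SCORES, key=lambda option: weighted(SCORES[option], weights))

baseline = {option: weighted(s, WEIGHTS) for option, s in SCORES.items()}
temporal_heavy = dict(WEIGHTS, temporal=0.35)  # the sensitivity case from the text

print({k: round(v, 3) for k, v in baseline.items()}, leader(WEIGHTS))
print(leader(temporal_heavy))
```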

06

Token economics

Hindsight cost shape

  • Read side: auto recall once per user turn in context/hybrid; not every tool call. Default max 4096 tokens is too high for Lambda.
  • Write side: auto retain every turn by default; LLM extracts facts/entities/relationships/time. Reflect is extra LLM synthesis.
  • Cheap config: memory_mode=context, recall_prefetch_method=recall, recall_budget=low, recall_max_tokens=400–800, retain_every_n_turns>1 if acceptable.

OpenViking cost shape

  • Read side: Hermes provider prefetch returns top abstracts, usually compact. Agent can read overview/full only when needed.
  • Write side: session commit/extraction and resource summarization need VLM/embedding. AST mode can avoid LLM for long code skeletons.
  • Cheap config: use Codex/OAuth or cheap VLM if validated, prefer L0 abstracts in auto context, require explicit read for L1/L2.
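
A back-of-envelope comparison of read-side injection against the pilot target of under 800 tokens per routed user turn. The numbers are the ones quoted in this paper (4096 default, 400–800 cheap config, top_k=5 abstracts at roughly 100 tokens each); the function and variable names are illustrative only, and the OpenViking figure is an estimate because abstract length is approximate.

```python
TARGET_AVG_TOKENS = 800  # pilot acceptance threshold per routed user turn

def openviking_prefetch_estimate(top_k=5, abstract_tokens=100):
    # top_k=5 and ~100-token L0 abstracts are the figures quoted in this paper.
    return top_k * abstract_tokens

read_side = {
    "Hindsight default recall_max_tokens": 4096,
    "Hindsight cheap config, low end": 400,
    "Hindsight cheap config, high end": 800,
    "OpenViking prefetch, 5 abstracts": openviking_prefetch_estimate(),
}

def headroom(tokens, target=TARGET_AVG_TOKENS):
    """Tokens left under the target; negative means over budget."""
    return target - tokens

for label, tokens in read_side.items():
    print(f"{label}: {tokens} tokens, headroom {headroom(tokens)}")
```

The Hindsight figures are ceilings rather than averages, so the cheap config can still meet the target in practice; the default ceiling is about five times the target.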
07

Observability and correction

OpenViking has better correction ergonomics

It exposes context as navigable viking:// filesystem. Studio, TUI, observer endpoints, request logs, telemetry, and Prometheus metrics give Motoko-grade surfaces. That matters because bad memory is production state, not vibes.

Web Studio · ov tui · /metrics · retrieval trajectory · viking:// browse/read

Hindsight has better memory semantics

It is more explicitly designed around memory correction by bank, entity, temporal data, experiences, and mental models. But correction UX/API must be verified in pilot for your exact self-host mode.

banks · metadata/tags · entities · time series · mental models

08

Fleet rollout shape

| Need | Hindsight rollout | OpenViking rollout |
|---|---|---|
| Shared substrate | One Hindsight API + shared bank or per-agent banks with shared tags. | One OpenViking server + account=lambda; user=stevie or per-human; agent=freeman/motoko/tank. |
| Tank from other machine | Point Hermes Hindsight config at same API URL, same bank template/API key. | Point OPENVIKING_ENDPOINT at same server, set account/user/agent headers. |
| Replace Lambda bank_lookup | Disable bank_lookup.py; use Hindsight context-only recall. | Disable bank_lookup.py; rely on OpenViking prefetch + tools. |
| Keep self-improvement hooks | Keep correction_detector/learning_sync as governance and git-backed hard rules. | Same. Hooks should write visible patterns/skills, not duplicate recall. |
| Migration | Retain existing MEMORY, corrections, patterns, session summaries into banks with tags. | Import existing memory docs/runbooks/resources under viking://user and viking://resources; commit sessions for extraction. |
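
A sketch of the OpenViking identity layout across the fleet, assuming one account, one human user, and one agent id per Hermes profile. FleetIdentity and client_settings are hypothetical helpers, and the localhost fallback is an assumption; only OPENVIKING_ENDPOINT, port 1933, and the account/user/agent split come from the sources inspected. The actual header names are deliberately left out and should be taken from the OpenViking multi-tenant docs.

```python
import os
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class FleetIdentity:
    account: str  # shared: resources are visible account-wide
    user: str     # human the memories belong to
    agent: str    # per-profile identity

AGENTS = ("freeman", "motoko", "tank")
FLEET = {name: FleetIdentity(account="lambda", user="stevie", agent=name) for name in AGENTS}

def client_settings(agent, env=os.environ):
    """Settings one Hermes instance would need to reach the shared server."""
    # Fallback URL is an assumption for local testing; 1933 is the self-host port.
    endpoint = env.get("OPENVIKING_ENDPOINT", "http://localhost:1933")
    return {"endpoint": endpoint, "identity": asdict(FLEET[agent])}

# Tank on a remote box: same endpoint as the others, different agent id.
tank = client_settings("tank", env={"OPENVIKING_ENDPOINT": "https://memory.example"})
print(tank)
```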
09

Target topology: OpenViking on Cloudflare

Backend centralizes shared Lambda memory behind Cloudflare. OpenViking client stays local to each Hermes instance, so Freeman, Motoko, and Tank keep local tool/runtime behavior while sharing one governed context substrate.

Figure: OpenViking on Cloudflare · local clients per agent instance. Control plane stays central; Hermes/OpenViking client stays local beside each agent. Memory reads/writes cross one hardened Cloudflare edge.

Lanes:

  • Local agent machines: Freeman · Motoko · Tank instances.
  • Cloudflare edge: TLS, Access, auth, rate limit, routing.
  • Cloudflare-hosted OpenViking backend: Container/Worker + R2/D1/Vectorize storage plane.

Nodes:

  • Freeman: Hermes profile=freeman; Mac / server A; OpenViking local client.
  • Motoko: Hermes profile=motoko; Mac / server B; OpenViking local client.
  • Tank: Hermes profile=tank; remote box; OpenViking local client.
  • DNS + TLS: *.memory.domain; 443 only.
  • Cloudflare Access: agent identity gate; service tokens / mTLS.
  • Worker API Gateway: X-agent headers; account/user/agent.
  • OpenViking Server: Cloudflare Container preferred; API · Studio · session commit · retrieval.
  • R2: AGFS blobs/resources; viking://resources.
  • D1: metadata + ACLs; accounts/users/agents.
  • Vectorize: embeddings index; L0/L1/L2 retrieval.
  • Queues / DO: async ingest locks; commit + summarize.
  • Logs / Metrics / Traces.

Edge labels: memory read/write · authenticated API · route to backend · AGFS writes · vector read · async ingest.

Legend: request/context · memory store/retrieve · control/auth · async ingest · observability.

Decision rule: Run client local to every Hermes instance. Host stateful OpenViking server in Cloudflare Containers; put durable storage in R2/D1/Vectorize. Protect Studio/API with Access. No per-agent memory DBs.

  • Client placement: Run OpenViking/Hermes provider locally per instance. Set account/user/agent identity headers; no shared local memory DB.
  • Backend placement: Use Cloudflare Containers for stateful OpenViking server if native server semantics are needed. Put durable AGFS/resource state in R2, metadata in D1, embeddings in Vectorize.
  • Ops gate: Protect API and Studio with Cloudflare Access/service tokens. Expose metrics/logs to Motoko; failure should degrade to no-memory, not broken agent loop.
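
The degrade-to-no-memory rule can be sketched as a thin wrapper around whatever call fetches context. recall_or_empty and the fetch callable are hypothetical; the real provider hook may look different, and the fetch itself is assumed to enforce its own network timeout.

```python
import logging

log = logging.getLogger("memory")

def recall_or_empty(fetch, query, default=""):
    """Call the memory backend; on any failure return `default` so the turn proceeds."""
    try:
        result = fetch(query)
    except Exception as exc:
        # Broad on purpose: network, auth, and timeout errors all degrade the same way.
        log.warning("memory recall failed, continuing without memory: %s", exc)
        return default
    return result or default

def edge_is_down(query):
    # Stand-in for a client call that fails at the Cloudflare edge.
    raise TimeoutError("edge unreachable")

print(repr(recall_or_empty(edge_is_down, "cron fallback")))
print(repr(recall_or_empty(lambda q: "abstract: cron fallback runbook", "cron fallback")))
```

Logging the failure rather than swallowing it keeps the outage visible to Motoko while the agent loop keeps running.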

Recommendation

Pilot OpenViking first for Lambda fleet consolidation. Reason: your current pain is not “chatbot remembers user favorite color.” It is shared operational context across Freeman, Motoko, and Tank, with low token burn, visible correction, and machine-to-machine portability. OpenViking’s context database shape fits that better.

Do not declare Hindsight dead. Use it as benchmark/control for temporal questions. If OpenViking fails “what changed when and what did we learn?” tests, switch to Hindsight despite the extra LLM cost. The crowbar works until it doesn’t.

Hard blocker before adoption: AGPLv3 review for OpenViking. If license posture is unacceptable, Hindsight becomes default recommendation.

10

Pilot acceptance tests

Functional

  • Freeman, Motoko, Tank all connect to same server from separate machines.
  • Each agent has isolated identity but shared Lambda resources.
  • Query “what did Motoko learn about cron/model fallback?” returns right pattern with source path.
  • Query “what changed after Stevie corrected dual-memory recommendation?” returns correction.
  • Bad memory can be found, edited/deleted, and absence verified.

Economic / ops

  • Average injected memory context < 800 tokens per routed user turn.
  • No auto recall per tool call.
  • Write-side model calls visible in logs/metrics.
  • Dashboard or TUI shows retrieval trajectory.
  • Failure mode degrades to “no memory,” not broken agent loop.
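
Two of the economic checks can be automated once per-turn records exist. The record shape below (injected_tokens, auto_recalls, tool_calls) is hypothetical and would need to be mapped from whatever the provider logs or metrics actually expose; the sample values are made up.

```python
from statistics import mean

def check_economics(turns, max_avg_tokens=800):
    """Evaluate the token-budget and recall-frequency acceptance tests."""
    avg = mean(t["injected_tokens"] for t in turns)
    # More than one automatic recall in a turn suggests recall is firing per tool call.
    repeated = [t for t in turns if t["auto_recalls"] > 1]
    return {
        "avg_injected_tokens": avg,
        "token_budget_ok": avg < max_avg_tokens,
        "no_recall_per_tool_call": not repeated,
    }

# Made-up sample: three routed user turns from a pilot session.
sample = [
    {"injected_tokens": 500, "auto_recalls": 1, "tool_calls": 4},
    {"injected_tokens": 620, "auto_recalls": 1, "tool_calls": 0},
    {"injected_tokens": 0, "auto_recalls": 1, "tool_calls": 2},
]
report = check_economics(sample)
print(report)
```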
11

Sources inspected

Hermes local code: plugins/memory/hindsight/README.md, plugins/memory/hindsight/__init__.py, plugins/memory/openviking/README.md, plugins/memory/openviking/__init__.py. Upstream docs: vectorize-io/hindsight README; volcengine/OpenViking README/PyPI; OpenViking architecture, context layers, storage, extraction, retrieval, multi-tenant, metrics, deployment, observability, Hermes integration docs. Secrets redacted/omitted.