Brainstorm · operating modes

How containerised coding agents actually run

Eight operating modes for running claude / codex / copilot inside a container (HolyClaude or an alternative). For each: container topology, what works, what doesn't, when to use it, and agent compatibility. Then the cross-cutting capability matrix, Stevie's direct questions answered, and an honest list of what can't be done at all.

Date: 2026-05-15 · For: ainb bossmode refactor · Backend: HolyClaude (pinned :v1.2.2) · Pre-design · for discussion

TL;DR · five things to internalise

Container ≠ session. One HolyClaude container can host many short-lived claude -p processes (or hold one long-lived claude REPL). The container is the sandbox; the process is the agent.
CloudCLI UI is optional. Each container ships one CloudCLI on internal :3001. If you spawn N containers you have N CloudCLI instances. Map host ports as you see fit — or map none and use docker exec only.
Session resume works if session files persist. claude -c / --resume read ~/.claude/projects/<hash>/<uuid>.jsonl. Bind-mount the auth tree → resume works across container restarts.
Cross-agent context sharing is impossible without a translation layer. claude and codex have different session formats and don't read each other's files.
"Bossmode" is one mode among many. Today it's mode 1 (fire-and-forget). The refactor is a chance to enable modes 2–8 without breaking mode 1.

Eight operating modes

Each card: a topology sketch (host · containers · workspace mount · who-talks-to-who) · what works · what doesn't · when to reach for it · which agents support it. Read them as a spectrum from ephemeral & cheap (mode 1) to long-lived & complex (modes 5–7).

01 · Fire-and-forget

Ephemeral container per task. claude -p "$PROMPT" runs, JSON streams out, process & container exit. Current ainb bossmode.

Works

Single-prompt task with structured JSON output
Strong per-task isolation — crash one, others unaffected
No state leakage between tasks
Easy to test (deterministic lifecycle)

Doesn't work

No follow-up — container gone before you can attach
~3GB image pull on cold start hurts first-task latency
Can't resume the previous session unless the session jsonl is persisted outside the container
Wastes the CloudCLI UI (gone before you'd open it)

Use when: scripted one-shot tasks · CI · "explain this file" / "fix this bug" / "write tests for X"

Agents: ✓ claude · ✓ codex · ⚠ copilot
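A minimal sketch of how ainb might assemble the mode 1 command line. The image reference, the in-container paths, and the function name are placeholders (the source only says HolyClaude is pinned to :v1.2.2). It also assumes the image allows a command override that runs claude directly, and that --output-format stream-json --verbose is how the JSON stream is requested.

```python
from pathlib import Path

# Hypothetical values: the image repository and in-container paths are
# placeholders, not taken from HolyClaude's documentation.
IMAGE = "holyclaude:v1.2.2"
CONTAINER_CLAUDE_DIR = "/home/claude/.claude"
CONTAINER_WORKSPACE = "/workspace"


def fire_and_forget_argv(prompt: str, worktree: Path, auth_tree: Path) -> list[str]:
    """Build a `docker run` argv for one ephemeral claude -p task."""
    return [
        "docker", "run", "--rm",                     # container is removed when claude exits
        "-v", f"{worktree}:{CONTAINER_WORKSPACE}",     # workspace mount
        "-v", f"{auth_tree}:{CONTAINER_CLAUDE_DIR}",   # auth + session jsonl stay on the host
        "-w", CONTAINER_WORKSPACE,
        IMAGE,
        "claude", "-p", prompt,
        "--output-format", "stream-json", "--verbose",
    ]
```

Because the auth tree is a bind mount, the session jsonl outlives the container even though the container itself is removed on exit.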

02 · Sticky single-task

Fire-and-forget, but the container survives after claude -p exits. The user can run docker exec -it <id> claude -c to attach a TTY to the same session, or just inspect the logs by hand.

Works

Read logs / status long after task finished
Attach interactive REPL post-hoc (claude -c)
Open CloudCLI UI for a forensic browse
Resume the exact session that ran (jsonl is still there)

Doesn't work

Containers accumulate — need a reaper or explicit kill
Disk + memory cost grows with each kept-alive task
Auth tree shared if multiple sticky containers live concurrently — one bad token can cascade

Use when: you might want to follow up · forensic debugging · "what did it do exactly?"

Agents: ✓ claude · ✓ codex · ⚠ copilot
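The reaper mentioned above could start as a pure selection function. This is a sketch under assumptions: the StickyContainer record, the pinned flag, and the retention parameters are hypothetical, and ainb would have to track the finish time itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StickyContainer:
    container_id: str
    finished_at: float     # epoch seconds when the claude -p process exited
    pinned: bool = False   # hypothetical flag: the user asked to keep this one


def containers_to_reap(containers, now: float, max_idle_s: float, keep_latest: int = 0):
    """Return ids of unpinned containers idle for longer than max_idle_s.

    The keep_latest most recently finished unpinned containers are always kept.
    """
    unpinned = sorted(
        (c for c in containers if not c.pinned),
        key=lambda c: c.finished_at,
        reverse=True,
    )
    candidates = unpinned[keep_latest:]
    return [c.container_id for c in candidates if now - c.finished_at > max_idle_s]
```

The returned ids would then be handed to docker rm -f; keeping the policy separate from the Docker call makes the retention rule easy to test.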

03 · Interactive REPL

Long-lived claude (no -p) in TTY. User drives via docker exec -it or via CloudCLI browser UI on :3001. Replaces ainb's current tmux Interactive mode.

Works

Native multi-turn conversation
User can interrupt / steer mid-response
Tool approval prompts (no --dangerously-skip-permissions needed)
CloudCLI UI in browser if the user wants it

Doesn't work

Hard to programmatically inject prompts (TTY semantics)
No structured JSON output — ainb can't parse events live
Container locks one host port (per-container :3001 mapping)

Use when: exploratory work · pair programming · agent assists you in real-time

Agents: ✓ claude · ✓ codex · ✓ copilot

04 · Multi-turn programmatic

Long-lived container; ainb drives it via repeated docker exec <id> claude -p calls, each new prompt passing --resume <session-id> to keep context. Structured output, no TTY.

Works

Multi-turn conversation driven by ainb (orchestrate the chat)
Structured JSON output preserved — live event parsing
Cheap turns (no container restart) once container is up
Session jsonl persists, restartable across machine reboots

Doesn't work

No mid-turn interrupt — each claude -p is atomic
Context window still bounded — long sessions need compaction strategy
Hardcoded prompt prefixes (current bossmode) bloat every turn

Use when: autonomous task with planned multi-step flow · ainb-driven dialog · agent-as-API

Agents: ✓ claude · ⚠ codex · ✗ copilot
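A sketch of the mode 4 turn loop. The function names are hypothetical, and the actual process runner is injected so the sketch stays free of Docker. It assumes each turn reports a session id that the next turn passes back via --resume; using the id from the latest turn avoids relying on the id staying constant across resumes.

```python
from typing import Callable, List, Optional


def turn_argv(container_id: str, prompt: str, session_id: Optional[str] = None) -> List[str]:
    """argv for one atomic claude -p turn inside an already-running container."""
    argv = ["docker", "exec", container_id, "claude", "-p", prompt,
            "--output-format", "stream-json", "--verbose"]
    if session_id is not None:
        argv += ["--resume", session_id]   # keep context from the previous turn
    return argv


def drive(container_id: str, prompts: List[str],
          run_turn: Callable[[List[str]], str]) -> Optional[str]:
    """Run prompts in order. run_turn(argv) must return the session id that turn reported."""
    session_id = None
    for prompt in prompts:
        session_id = run_turn(turn_argv(container_id, prompt, session_id))
    return session_id
```

The first turn carries no --resume flag; every later turn does, which is what separates this mode from N independent fire-and-forget runs.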

05 · Babysitter / autonomous loop

Agent receives a goal, self-loops with periodic status reports to ainb. Maps to /loop + cloud-coding-agent patterns. Human can interrupt; agent owns the next-action decision.

Works

Long-running goals without human babysitting
Status visibility via periodic JSON heartbeats
Interrupt-able from ainb (kill, pause, send new instructions)
Cost-cap-able (ainb stops when budget exceeded)

Doesn't work

No native loop primitive in claude CLI — ainb has to define the protocol
Stall detection is hard (agent thinking vs agent stuck)
Cost runaway without a hard cap; /loop guardrails apply
Reproducibility low — same goal can produce different paths

Use when: fix-this-issue-end-to-end · long refactor · "build me a goal" / set-and-forget

Agents: ✓ claude · ✓ codex · ⚠ copilot
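Since the claude CLI has no loop primitive, the supervising side has to be ainb's own. Below is a sketch of the two checks the card calls out, a hard cost cap and stall detection. The heartbeat protocol, the LoopGuard name, and the threshold values are all assumptions; a silent agent is treated as stalled only after a configurable quiet period, which is a heuristic and cannot tell thinking from stuck.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    CONTINUE = "continue"
    STALLED = "stalled"
    OVER_BUDGET = "over_budget"


@dataclass
class LoopGuard:
    budget_usd: float          # hard cap enforced by ainb, not by the agent
    stall_after_s: float       # quiet period after which the loop counts as stalled
    last_heartbeat_at: float   # epoch seconds of the most recent heartbeat
    spent_usd: float = 0.0

    def on_heartbeat(self, now: float, spent_usd: float) -> None:
        """Record a status report (hypothetical protocol: cumulative spend so far)."""
        self.last_heartbeat_at = now
        self.spent_usd = spent_usd

    def check(self, now: float) -> Verdict:
        if self.spent_usd >= self.budget_usd:
            return Verdict.OVER_BUDGET
        if now - self.last_heartbeat_at > self.stall_after_s:
            return Verdict.STALLED
        return Verdict.CONTINUE
```

ainb would call check on a timer and kill or pause the container on anything other than CONTINUE.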

06 · Parallel tasks · one container

N concurrent docker exec <id> claude -p calls inside the same HolyClaude container. Matches HolyClaude's own design. Shared auth + workspace tree.

Works

Spawn extra tasks for free (no container cold start)
Lower memory footprint than N containers
Matches HolyClaude's own one-container-per-user model

Doesn't work

Workspace race conditions if two tasks edit the same file
Shared CLAUDE.md memory — tasks pollute each other
Auth rate-limits collapse onto one Max plan account
Kill one task ≠ kill the others — careful PID management

Use when: multiple read-only tasks · "review these 5 files in parallel" · independent sandboxes within one project

Agents: ✓ claude · ✓ codex · ⚠ copilot

07 · Multi-agent comparison

Same prompt → claude + codex + copilot in parallel (separate processes or separate containers). Diff the outputs. Useful for "which agent solves this better" + cross-validation.

Works

Side-by-side agent quality comparison
Pick best output without committing upfront to one agent
True isolation — separate containers, no cross-contamination

Doesn't work

3× cost (3 LLM calls + 3 containers)
3× workspace state — diff merge is a UX problem
No native cross-agent context sharing (each starts fresh)
Voting / consensus needs an orchestration layer ainb doesn't have

Use when: high-stakes change · benchmark / eval runs · "let's see what each thinks"

Agents: ✓ claude · ✓ codex · ✓ copilot
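The fan-out and diff step can be sketched without committing to how each agent is launched. Here each runner is a hypothetical callable that takes the prompt and returns that agent's output as text; in practice it would wrap a container run. The judge or merge step is deliberately absent, matching the gap noted above.

```python
import difflib
from concurrent.futures import ThreadPoolExecutor


def fan_out(prompt, runners):
    """runners: {agent_name: callable(prompt) -> str}. Runs all agents concurrently."""
    with ThreadPoolExecutor(max_workers=len(runners)) as pool:
        futures = {name: pool.submit(run, prompt) for name, run in runners.items()}
        return {name: future.result() for name, future in futures.items()}


def pairwise_diffs(outputs):
    """Unified diff for every pair of agents; an empty string means identical output."""
    names = sorted(outputs)
    diffs = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            diffs[(a, b)] = "".join(difflib.unified_diff(
                outputs[a].splitlines(keepends=True),
                outputs[b].splitlines(keepends=True),
                fromfile=a, tofile=b,
            ))
    return diffs
```

Three agents yield three pairwise diffs, which is where the 3-way diff UX problem starts.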

08 · Headless CI

No CloudCLI UI exposed, no TTY, no human in loop. claude -p runs · exit code is the signal · stdout is the artifact. Built for build agents.

Works

Reproducible, exit-code-driven automation
API-key auth via env (no browser OAuth needed)
No port mapping = smaller attack surface
Pin to a Claude model version for deterministic-ish CI

Doesn't work

Most CI runners can't grant SYS_ADMIN / seccomp=unconfined — HolyClaude's Chromium will fail
No recovery path if auth lapses mid-run
Live-dashboard stream parsing is harder when --verbose is dropped to limit log volume

Use when: PR-triggered tasks · scheduled rollouts · code-burn audits · machine workflows

Agents: ✓ claude · ✓ codex · ✓ copilot
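A sketch of the exit-code contract for a CI step. The image argument and function names are placeholders. The -e ANTHROPIC_API_KEY form (no value) passes the variable through from the runner's environment, and no port is published.

```python
import subprocess
import sys


def ci_argv(image: str, prompt: str) -> list[str]:
    """argv for a headless run: API-key auth via env, no port mapping, no TTY."""
    return [
        "docker", "run", "--rm",
        "-e", "ANTHROPIC_API_KEY",   # forwarded from the CI runner's environment
        image,                       # placeholder image reference
        "claude", "-p", prompt,
    ]


def run_step(argv) -> int:
    """Run one CI step: stdout is the artifact, the exit code is the signal."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    sys.stdout.write(proc.stdout)
    return proc.returncode
```

A CI job would typically end with sys.exit(run_step(ci_argv(...))) so the pipeline fails exactly when the agent process does.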

Capability matrix

What each mode supports across the dimensions Stevie's questions touch. Yes · Partial · No · N/A.

Mode	Resume previous conversation	Switch agent mid-flight	Live JSON stream	File-diff preview during exec	Port-forward dev server	CloudCLI UI accessible	Multi-prompt continuity	Survive container restart	Cost trackable
01 · Fire-and-forget (claude -p, ephemeral)	No	No	Yes	Partial	No	No	No	If jsonl persisted	Yes
02 · Sticky single-task (survive after exit)	Yes	No	Yes	Yes	Yes	Yes	Yes	Yes	Yes
03 · Interactive REPL (tty, long-lived)	Native	No	No (TTY)	Yes	Yes	Yes	Native	Yes	Partial
04 · Multi-turn programmatic (repeated -p --resume)	Yes	No	Yes	Yes	Yes	Yes	Yes	Yes	Yes
05 · Babysitter loop (self-driving)	Yes	No	Partial	Yes	Yes	Yes	Yes	Yes	Required (cap)
06 · Parallel · one container (shared workspace)	Per-task	No	Yes	Race risk	Single port	One UI	Per-task	Yes	Yes
07 · Multi-agent comparison (claude+codex+copilot)	Per-agent	Per-container	Yes	3-way diff	3 ports	3 UIs	Per-agent	Yes	3× cost
08 · Headless CI (no UI, exit-code)	If jsonl in artifact	No	Yes	No	No	N/A	No	Cache-dependent	Yes

Stevie's questions answered

Direct answers, with which modes apply. No ducking.

If we have multiple containers, do we get multiple CloudCLI UIs?

Yes — one CloudCLI per HolyClaude container. Each container ships a CloudCLI server bound to its internal :3001. To reach it from the host, you must map a unique host port per container — :3001, :3002, :3003, … — or rotate which container holds :3001 at any moment, or skip CloudCLI entirely and use docker exec.

For ainb: the cleanest default is no CloudCLI exposure (use docker exec + JSON streaming), with an opt-in flag (ainb container ui) that picks an unused host port and tells the user where to point their browser. CloudCLI becomes a forensic / interactive escape hatch, not the primary UX.

Modes affected: 02, 03, 06, 07 (UI-relevant). Modes 01, 04, 05, 08 typically skip the UI.
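The "pick an unused host port" step of the opt-in UI flag could look like the sketch below. Function names are hypothetical. Binding to port 0 asks the OS for a free port; there is a small window in which another process could take it before Docker binds it, so a real implementation would retry on failure.

```python
import socket


def pick_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port on host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))            # port 0 means "any free port"
        return s.getsockname()[1]


def publish_flag(host_port: int, container_port: int = 3001) -> list[str]:
    """docker run flag mapping a loopback-only host port to CloudCLI's internal port."""
    return ["-p", f"127.0.0.1:{host_port}:{container_port}"]
```

Docker fixes published ports when the container is created, so exposing the UI on an already-running container would need a recreate or a proxy rather than this flag alone.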

Can I issue something else in that worktree while a task runs?

Yes, but how depends on the mode. In mode 6 (parallel-in-one-container), just docker exec a second claude -p. In mode 7 (multi-container), spawn a second container with the same workspace bind mount. In mode 1 (fire-and-forget ephemeral), you'd have to wait or spawn a peer container in parallel. Mode 3 (REPL) blocks because the user is mid-interaction with one process.

Warning: concurrent tasks against the same worktree files will race. If both tasks edit src/foo.rs, last-writer-wins. Mitigation: per-task worktrees (git worktree branch-per-task) or read-only mounts for non-mutating tasks.

Modes that support it cleanly: 06 (parallel-in-one), 07 (multi-container). Mode 03 blocks (one REPL = one user attention).
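Both mitigations from the warning above can be sketched as argv builders. The task/<id> branch naming, the sibling-directory layout, and the /workspace mount target are assumptions for illustration.

```python
from pathlib import Path


def worktree_argv(repo: Path, task_id: str, base: str = "HEAD"):
    """Build `git worktree add` for a per-task branch in a sibling directory."""
    branch = f"task/{task_id}"                        # hypothetical naming scheme
    path = repo.parent / f"{repo.name}-{task_id}"     # hypothetical layout
    argv = ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]
    return argv, path


def workspace_mount(path: Path, read_only: bool = False) -> list[str]:
    """docker -v flag for the workspace; :ro for tasks that must not mutate files."""
    spec = f"{path}:/workspace" + (":ro" if read_only else "")
    return ["-v", spec]
```

Each mutating task gets its own worktree and branch, so two tasks never write to the same checkout; read-only review tasks can share one mount safely.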

How do I look through status / result / whole execution after the fact?

Three layers, depending on retention policy.

Live: the ainb TUI tails the container's stdout (current bossmode pattern, parses stream-json).
Post-hoc, container alive: docker logs <id>, or open the CloudCLI UI on the container's :3001 mapping.
Post-hoc, container gone: persisted logs at ~/.agents-in-a-box/holyclaude/logs/<session-id>.jsonl, if ainb saves them on container teardown. The session jsonl files at ~/.agents-in-a-box/holyclaude/claude/projects/<hash>/<uuid>.jsonl are always recoverable via the bind mount.

Interactive review: docker exec -it <id> claude -c opens the exact same session in a TTY for follow-up questions — works as long as the container is alive. Mode 2 (sticky) keeps this option open by design.

Best supported by: 02 (sticky) for live forensic access · 08 (CI) for artifact-driven review.
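For the "container gone" layer, a first-pass review tool can simply tally what a persisted jsonl log contains. This sketch assumes only that each line is a JSON object and that most carry a type field; anything else is counted rather than trusted.

```python
import json
from collections import Counter


def summarize_events(lines):
    """Count events by their `type` field. Returns (counts, skipped_line_count)."""
    counts = Counter()
    skipped = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1             # truncated or non-JSON line, e.g. after a kill
            continue
        if isinstance(event, dict):
            counts[str(event.get("type", "unknown"))] += 1
        else:
            skipped += 1
    return dict(counts), skipped
```

Passing an open file object works directly, since iterating a file yields lines.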

Can I continue conversation from a previous session?

Yes, with the right plumbing. claude CLI supports claude -c (continue most recent in cwd) and claude --resume <session-id> (specific session). Both read jsonl from ~/.claude/projects/<project-hash>/<uuid>.jsonl. Because HolyClaude bind-mounts ~/.claude from ./data/claude/ (in ainb's case: ~/.agents-in-a-box/holyclaude/claude/), these files persist across container restarts.

For ainb: surface a "resume this session" affordance in the TUI's session list. The trick is mapping a session UUID back to a human-readable label (current bossmode uses worktree branch + first prompt as the label — keep that).

Best supported by: 02 (sticky), 03 (REPL), 04 (programmatic), 05 (babysitter). Mode 01 can do it too if jsonl persisted.
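A sketch of how the TUI's session list might be populated from the bind-mounted tree. It relies only on the projects/<hash>/<uuid>.jsonl layout described above and takes the file stem as the session id; the function name is hypothetical, and the human-readable label still has to come from ainb's own records.

```python
from pathlib import Path


def list_sessions(claude_dir: Path):
    """Return (session_uuid, path) pairs from <claude_dir>/projects/*/*.jsonl, newest first."""
    files = sorted(
        (claude_dir / "projects").glob("*/*.jsonl"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return [(p.stem, p) for p in files]
```

On the host, claude_dir would be the bind-mount source (~/.agents-in-a-box/holyclaude/claude/ in ainb's case), so the list works even when no container is running.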

Can I resume the same context after fire-and-forget exits?

Yes, as long as the jsonl survives. In mode 1, the container is gone but the bind-mounted jsonl is on host disk. Spawn a new container with the same bind mount, claude --resume <uuid>, and you pick up exactly where the previous run left off.

This is the bridge from mode 1 → mode 4 (multi-turn programmatic). The very same session can be "fire-and-forget today, follow up tomorrow" without changing mode operationally — just spawn a fresh container against the persisted state.

Architectural impact: ainb must record the session-id from each run (it's in the stream-json output's init event) so it can pass it back on resume.
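Capturing the session id could be as small as the sketch below. The field names (type "system", subtype "init", session_id) are an assumption about the stream-json init event and should be checked against the pinned claude version.

```python
import json
from typing import Iterable, Optional


def session_id_from_stream(lines: Iterable[str]) -> Optional[str]:
    """Return the session id from the first init event, or None if there is none."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue                 # tolerate non-JSON noise on stdout
        if (isinstance(event, dict)
                and event.get("type") == "system"
                and event.get("subtype") == "init"):
            sid = event.get("session_id")
            if isinstance(sid, str):
                return sid
    return None
```

ainb would store the returned id alongside the task record so a later run can pass it to --resume.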

After fire-and-forget, can I start an interactive session on top to give more instructions?

Yes — two clean paths.

Path A (recommended for ainb): spawn a fresh container with the persisted bind mount, then run claude -c in a TTY via docker exec -it (mode 3 layered on top of mode 1). Context resumes, the user types a follow-up, the agent responds. The mode 3 session ends when the user exits the TTY.

Path B: Use mode 2 (sticky) from the start — never let the original container exit. Then docker exec -it <id> claude -c on the still-running container. Faster (no cold start) but commits to the "container keeps running" cost.

UX hook: a "Continue this task" button in the TUI's session list. It uses Path A by default and Path B when the original container is still alive.
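The button's decision can be sketched as a function that returns the commands to run in order. Names and the /workspace working directory are hypothetical. Since claude -c continues the most recent session in the current directory, the exec sets the working directory explicitly.

```python
from typing import List, Optional


def attach_argv(container: str, workdir: str) -> List[str]:
    # -w sets the working directory so `claude -c` finds that directory's latest session.
    return ["docker", "exec", "-it", "-w", workdir, container, "claude", "-c"]


def continue_plan(alive_container: Optional[str], start_fresh_argv: List[str],
                  fresh_name: str, workdir: str = "/workspace") -> List[List[str]]:
    """Commands for 'Continue this task', in execution order."""
    if alive_container:
        # Path B: the original container is still running, attach directly.
        return [attach_argv(alive_container, workdir)]
    # Path A: start a fresh container on the persisted bind mount, then attach.
    return [start_fresh_argv, attach_argv(fresh_name, workdir)]
```

start_fresh_argv is whatever docker run command ainb already uses to bring up a container with the persisted mounts; this sketch only sequences it.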

What we can't do at all

Honest constraints. These aren't trade-offs to optimise — they're walls. Don't promise these in the UI.

True conversation forking — branching a mid-conversation agent state into two divergent paths needs CRIU-style process snapshotting of the claude CLI's in-memory state. Docker checkpoints exist but are flaky and don't preserve LLM connection state. Closest available: clone the jsonl, restart in two parallel containers, accept that the two paths see the world from slightly different starting points.
Cross-agent session sharing — claude can't read codex's session file (and vice versa). Different schema, different storage location, different mental model. To use claude's context in codex, ainb would need a translation layer that exports claude's jsonl to plain markdown / chat-format and feeds it as a fresh prompt to codex. Lossy by definition.
Hot-swap container mid-task without losing state — Docker has no live-migrate for arbitrary processes. If you want to upgrade the HolyClaude image while a task runs, the task dies. Wait for idle, then swap.
Multi-agent voting / consensus — mode 7 fans out, but the "pick the best" or "merge into one" step needs an orchestrator ainb doesn't have. Could be a future plugin (mode 7 + a judge LLM); not free.
True streaming pause / resume mid-prompt — claude -p has no pause primitive. The closest: kill the process, lose the partial output, resume via --resume from the last committed turn. Interactive mode 3 lets the user interrupt with Ctrl-C, but that's a kill, not a pause.
Native cost cap inside claude CLI — claude doesn't expose a "stop after N tokens" hard limit. Ainb has to enforce caps externally by watching usage events from stream-json and killing the container if budget exceeded. Crude but workable.
Headless OAuth refresh inside a CI runner — if the bind-mounted token expires mid-run, the user (a human with a browser) is required to re-auth. CI uses ANTHROPIC_API_KEY for exactly this reason. Don't try to OAuth in CI.
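To make the "translation layer" constraint concrete, here is a sketch of the lossy export from a claude session jsonl to plain markdown. The schema is an assumption (user / assistant lines carrying a message whose content is either a string or a list of typed blocks); tool calls and anything non-textual are dropped, which is exactly the loss described above.

```python
import json


def _text_of(content) -> str:
    """Extract plain text from a message content field (string or list of blocks)."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = [block.get("text", "") for block in content
                 if isinstance(block, dict) and block.get("type") == "text"]
        return "\n".join(part for part in parts if part)
    return ""


def jsonl_to_markdown(lines) -> str:
    """Render user/assistant turns as markdown sections; everything else is skipped."""
    sections = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(event, dict) or event.get("type") not in ("user", "assistant"):
            continue
        message = event.get("message")
        if not isinstance(message, dict):
            continue
        text = _text_of(message.get("content"))
        if text:
            sections.append(f"## {event['type']}\n\n{text}\n")
    return "\n".join(sections)
```

The resulting markdown would be fed to the other agent as part of a fresh prompt; nothing here gives that agent a real session to resume.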