Captain

Multi-Agent Task Orchestration for LLMs


From Single to Multi-Agent

Single Agent

🤖

One LLM doing everything sequentially

📚 Reads entire codebase
📝 Plans implementation
💻 Writes all code
✅ Runs tests
🔧 Fixes issues
💾 Commits changes

⚠ Hits context limits on large projects

Multi-Agent (Captain)

🤖🤖🤖

Specialized agents working in parallel

🔮 Planner researches & decomposes
⚙ Worker 0 implements task A
⚙ Worker 1 implements task B (parallel!)
⚙ Worker 2 implements task C (parallel!)
🔍 Validator reviews & merges

✓ Smaller contexts • Parallel execution • Specialized focus

Understanding LLM Execution

LLMs operate in a turn-based execution model, similar to a game of chess or a D&D session.

♛

Chess

White moves → Black moves → White moves...
You can't move while your opponent is thinking

🎲

D&D

Player declares action → DM resolves → Next turn...
You act, then wait for the world to respond

🧠

Think

Analyze context

→

⚙

Act

Call a tool

→

👁

Observe

Process result

↺

Each cycle is one turn. The LLM is blocked while tools execute.


Anatomy of a Turn

1 LLM: "I need to read the user model file" → calls Read tool

2 Tool returns: 150 lines of TypeScript → LLM analyzes

3 LLM: "I'll add a validate() method at line 47" → calls Edit tool

4 Tool returns: "File updated successfully" → LLM continues

5 LLM: "Now let's run the test suite" → calls Bash tool

Complex tasks can require 30-50+ turns. Each turn adds to context and costs tokens.
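The think, act, observe cycle can be sketched as a plain loop. This is a minimal illustration, not Captain's implementation: the types, the scripted stand-in for the LLM, the file path, and the tool outputs are all made up for the example. It shows how each tool call blocks the model and how a turn limit cuts the loop short.

```typescript
// Hypothetical types for illustration; none of these names come from Captain or the Claude SDK.
type ToolCall = { tool: string; input: string };
type ModelStep = { thought: string; call?: ToolCall }; // no call means the task is finished
type Model = (history: string[]) => ModelStep;
type Tools = { [name: string]: (input: string) => string };

interface RunResult { finished: boolean; turns: number; history: string[]; }

function runTurns(model: Model, tools: Tools, maxTurns: number): RunResult {
  const history: string[] = [];
  let turns = 0;
  while (turns < maxTurns) {
    const step = model(history); // Think: the model sees everything so far
    history.push("llm: " + step.thought);
    if (!step.call) {
      return { finished: true, turns: turns, history: history };
    }
    const tool = tools[step.call.tool];
    // Act: the call is synchronous here, so the model cannot proceed until it returns
    const result = tool ? tool(step.call.input) : "error: unknown tool";
    history.push("tool: " + result); // Observe: the result joins the context
    turns += 1;
  }
  return { finished: false, turns: turns, history: history };
}

// Scripted stand-in for an LLM that replays the steps listed above.
const script: ModelStep[] = [
  { thought: "I need to read the user model file", call: { tool: "Read", input: "src/models/user.ts" } },
  { thought: "I'll add a validate() method at line 47", call: { tool: "Edit", input: "add validate()" } },
  { thought: "Now let's run the test suite", call: { tool: "Bash", input: "npm test" } },
  { thought: "Tests pass, the task is done" },
];
const scriptedModel: Model = (history) =>
  script[history.filter((h) => h.indexOf("tool: ") === 0).length];

const demoTools: Tools = {
  Read: () => "150 lines of TypeScript",
  Edit: () => "File updated successfully",
  Bash: () => "tests passed",
};

const full = runTurns(scriptedModel, demoTools, 10);  // finishes after 3 tool turns
const capped = runTurns(scriptedModel, demoTools, 2); // stops early: turn limit reached
```

Every entry pushed to history stays there for the rest of the run, which is why long tasks grow the context on each turn.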

Turn Limits & Session Resume

To prevent runaway agents and control costs, Captain enforces max turns per task:

Agent Type	Default Max Turns	Typical Use
Planner	30-35	Deep codebase exploration, task decomposition
Worker	50	Implementation, testing, iteration
Validator	10-15	Code review, merge conflict resolution

Session Resumption

When an agent hits max turns, it doesn't fail: it saves its session ID and requeues the task. On the next attempt, it resumes with the full conversation history intact.

if (err instanceof MaxTurnsError) {
  await taskQueue.updateMetadata(task.id, {
    resumeSessionId: err.sessionId,  // Claude SDK session
    lastTurnsUsed: err.turnsUsed,
  });
  await taskQueue.requeue(task.id, "max_turns", 5);
}
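The other half of this handoff is the next attempt reading that metadata back. The sketch below is one way it could look, under stated assumptions: the field names resumeSessionId and lastTurnsUsed mirror the snippet above, while planAttempt, AttemptPlan, the session ID value, and the idea that a resumed attempt gets a full turn budget are hypothetical.

```typescript
// Field names resumeSessionId and lastTurnsUsed mirror the snippet above;
// every other name here is hypothetical.
interface TaskMetadata {
  resumeSessionId?: string;
  lastTurnsUsed?: number;
}

interface AttemptPlan {
  mode: "fresh" | "resume";
  sessionId?: string; // handed back to the SDK so the history is restored
  turnBudget: number;
}

// Decide how the next attempt should start, based on what the last one saved.
function planAttempt(meta: TaskMetadata, maxTurns: number): AttemptPlan {
  if (meta.resumeSessionId) {
    // Assumption: a resumed attempt gets a full turn budget again.
    return { mode: "resume", sessionId: meta.resumeSessionId, turnBudget: maxTurns };
  }
  return { mode: "fresh", turnBudget: maxTurns };
}

// In-memory stand-in for the metadata that taskQueue.updateMetadata would persist.
const savedMeta: TaskMetadata = {};
const firstAttempt = planAttempt(savedMeta, 50); // mode: "fresh"

// The worker hits its limit and the handler records the session ID.
savedMeta.resumeSessionId = "session-123"; // made-up ID
savedMeta.lastTurnsUsed = 50;
const secondAttempt = planAttempt(savedMeta, 50); // mode: "resume"
```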

The Agent Fleet

Captain orchestrates four specialized agent types, each with distinct responsibilities:

Coordinator

Planner

Worker

Validator


Coordinator

The entry point for all user requests. Receives high-level tasks and creates the initial epic for planning.

Parses user intent from task description
Creates epic with appropriate metadata
Monitors session-level progress
Single instance per session

Example Flow

# User submits task
captain add "Add user authentication"

# Coordinator creates epic
{
  id: "task-abc123",
  type: "epic",
  title: "Add user authentication",
  status: "pending",
  priority: 2
}

Planner

The architect that researches the codebase and decomposes epics into concrete, implementable tasks.

Deep codebase exploration using semantic search
Identifies existing patterns and conventions
Creates tasks with clear acceptance criteria
Defines dependencies between tasks
Groups related tasks for worker affinity

Task Decomposition Output

{
  "tasks": [
    {
      "title": "Create User model",
      "type": "task",
      "group": "auth-models",
      "acceptance": "User model with email, hash"
    },
    {
      "title": "Add JWT middleware",
      "depends_on": ["Create User model"],
      "group": "auth-middleware"
    },
    {
      "title": "Create login endpoint",
      "depends_on": ["Add JWT middleware"]
    }
  ]
}
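Because depends_on refers to other tasks by title in this output, a misspelled title would leave a task blocked forever. The check below is a hedged sketch of how such output could be validated before it is queued. The PlannedTask interface is inferred from the example above, and findUnknownDependencies is a hypothetical helper, not part of Captain.

```typescript
// Shape inferred from the example output above; optional fields are assumptions.
interface PlannedTask {
  title: string;
  type?: string;
  group?: string;
  acceptance?: string;
  depends_on?: string[];
}

// Returns one "task -> missing dependency" entry per dangling reference.
function findUnknownDependencies(tasks: PlannedTask[]): string[] {
  const titles = tasks.map((t) => t.title);
  const problems: string[] = [];
  for (const task of tasks) {
    for (const dep of task.depends_on || []) {
      if (titles.indexOf(dep) === -1) {
        problems.push(task.title + " -> " + dep);
      }
    }
  }
  return problems;
}

const authPlan: PlannedTask[] = [
  { title: "Create User model", type: "task", group: "auth-models", acceptance: "User model with email, hash" },
  { title: "Add JWT middleware", depends_on: ["Create User model"], group: "auth-middleware" },
  { title: "Create login endpoint", depends_on: ["Add JWT middleware"] },
];
const okPlanProblems = findUnknownDependencies(authPlan); // []

// A plan with a dependency that names no existing task (made-up title).
const brokenPlan: PlannedTask[] = authPlan.concat([
  { title: "Add logout", depends_on: ["Add OAuth"] },
]);
const brokenPlanProblems = findUnknownDependencies(brokenPlan); // ["Add logout -> Add OAuth"]
```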

Worker

The implementation engine. Multiple workers run in parallel, each in an isolated git worktree.

Claims tasks from the queue
Works in isolated git worktree
Writes code, tests, documentation
Commits changes to worker branch
Submits work for validation

Related tasks share a groupId - they're claimed together by the same worker and batch-validated.
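Group claiming could work roughly like this. The sketch is an assumption about the mechanism, not Captain's queue code: QueuedTask and claimNext are hypothetical, the queue is an in-memory array rather than Redis, and dependency checks are left out to keep the grouping logic visible.

```typescript
// Hypothetical in-memory queue entry; Captain's real queue lives in Redis.
interface QueuedTask {
  id: string;
  group?: string;
  claimedBy?: string;
}

// Claim the first unclaimed task plus every unclaimed task in the same group.
function claimNext(queue: QueuedTask[], workerId: string): QueuedTask[] {
  const first = queue.filter((t) => !t.claimedBy)[0];
  if (!first) {
    return [];
  }
  const batch = first.group
    ? queue.filter((t) => !t.claimedBy && t.group === first.group)
    : [first];
  for (const t of batch) {
    t.claimedBy = workerId;
  }
  return batch;
}

const demoQueue: QueuedTask[] = [
  { id: "t1", group: "auth-models" },
  { id: "t2", group: "auth-middleware" },
  { id: "t3", group: "auth-models" },
  { id: "t4" },
];
const batchA = claimNext(demoQueue, "worker-0"); // t1 and t3 (same group)
const batchB = claimNext(demoQueue, "worker-1"); // t2
const batchC = claimNext(demoQueue, "worker-2"); // t4 (ungrouped, claimed alone)
const batchD = claimNext(demoQueue, "worker-0"); // nothing left
```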

Worktree Isolation

.captain/worktrees/
├── worker-0/   # Full repo checkout
├── worker-1/   # Full repo checkout
└── worker-2/   # Full repo checkout
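A layout like this can be produced with git's own worktree command. The helper below only builds the argument list, so it can be inspected without touching a repository. The branch naming scheme captain/worker-N is an assumption made for the example; only the .captain/worktrees/ path comes from the layout above.

```typescript
// Builds the argv for: git worktree add -b <branch> <path> <base>
// The "captain/worker-N" branch name is hypothetical.
function worktreeAddArgs(workerIndex: number, baseBranch: string): string[] {
  const workerName = "worker-" + workerIndex;
  return [
    "git", "worktree", "add",
    "-b", "captain/" + workerName,       // new branch for this worker's commits
    ".captain/worktrees/" + workerName,  // directory for the isolated checkout
    baseBranch,                          // commit-ish the new branch starts from
  ];
}

const worker0Command = worktreeAddArgs(0, "main").join(" ");
// "git worktree add -b captain/worker-0 .captain/worktrees/worker-0 main"
```

Each worktree has its own working directory and index while sharing the repository's object store, which is what lets workers edit the same files in parallel.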

Nested Workers

When a worker hits its turn limit, it spawns a nested worker to continue:

worker-0
└── worker-0-nested-1768862638663
    └── worker-0-nested-...-nested-...

Same worktree, fresh context. Task continues seamlessly with session resumption.
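The naming scheme in the tree above suggests each nested worker appends a marker and a numeric suffix to its parent's ID. A minimal sketch under that reading follows. Treating the suffix as a millisecond timestamp, and treating a depth limit such as maxNestedDepth as the bound on this chain, are both assumptions; nestingDepth and nestedWorkerId are hypothetical helpers.

```typescript
const NESTED_MARKER = "-nested-";

// Depth 0 is a top-level worker such as "worker-0".
function nestingDepth(workerId: string): number {
  return workerId.split(NESTED_MARKER).length - 1;
}

// Returns the child ID, or null when the parent is already at the depth limit.
function nestedWorkerId(parentId: string, maxNestedDepth: number, nowMs: number): string | null {
  if (nestingDepth(parentId) >= maxNestedDepth) {
    return null;
  }
  return parentId + NESTED_MARKER + nowMs;
}

const child = nestedWorkerId("worker-0", 2, 1768862638663);
// "worker-0-nested-1768862638663"
const grandchild = child ? nestedWorkerId(child, 2, 1768862639000) : null;
const tooDeep = grandchild ? nestedWorkerId(grandchild, 2, 1768862640000) : null; // null
```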

Validator

The quality guardian. Reviews completed work, handles merges, and extracts learnings.

Reviews code changes for correctness
Checks against acceptance criteria
Merges approved work to feature branch
Resolves merge conflicts intelligently
Extracts discoveries for ChunkHound
Batch validation for efficiency

Validation Decision

{
  "decision": "approve",
  "reasoning": "Implementation matches spec...",
  "discoveries": [
    {
      "type": "pattern",
      "content": "Auth middleware uses...",
      "files": ["src/middleware/auth.ts"]
    }
  ],
  "suggestions": []
}
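A consumer of this decision might branch on it as sketched below. The types are inferred from the JSON above; the "reject" value, the nextSteps helper, and the choice to index discoveries only on approval are assumptions for illustration.

```typescript
interface Discovery {
  type: string;
  content: string;
  files: string[];
}

interface ValidationResult {
  decision: "approve" | "reject"; // "reject" is assumed; only "approve" appears above
  reasoning: string;
  discoveries: Discovery[];
  suggestions: string[];
}

// Turn a validator decision into a list of follow-up actions.
function nextSteps(result: ValidationResult): string[] {
  const steps: string[] = [];
  if (result.decision === "approve") {
    steps.push("merge to feature branch");
    for (const d of result.discoveries) {
      steps.push("index discovery: " + d.type + " (" + d.files.join(", ") + ")");
    }
  } else {
    steps.push("return to worker: " + result.reasoning);
    for (const s of result.suggestions) {
      steps.push("suggestion: " + s);
    }
  }
  return steps;
}

const approved = nextSteps({
  decision: "approve",
  reasoning: "Implementation matches spec...",
  discoveries: [
    { type: "pattern", content: "Auth middleware uses...", files: ["src/middleware/auth.ts"] },
  ],
  suggestions: [],
});
// ["merge to feature branch", "index discovery: pattern (src/middleware/auth.ts)"]
```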

System Architecture

Task Flow

"Add user authentication"

Coordinator

Creates epic in queue

↓

Planner

Researches → creates 5 subtasks

↙ ↓ ↘

Parallel implementation in worktrees

↘ ↓ ↙

Validator

Reviews → merges to feature branch

↓

Complete

Ready for PR

Infrastructure

Redis

Task queue, pub/sub events, session state. All agents communicate through Redis.

ChunkHound

MCP server for semantic code search. The Planner queries it to understand codebase patterns.

Beads

Git-backed issue tracking. Persists task state across sessions and context resets.

Git Worktrees

Each worker gets an isolated checkout, so parallel file edits don't conflict.


Data Flow

1 User submits task → Coordinator creates epic in Redis queue

2 Planner claims epic → Queries ChunkHound → Creates subtasks with dependencies

3 Workers claim tasks → Each works in isolated worktree → Commits to worker branch

4 Worker publishes completion → PubSub notifies Validator → Queued for review

5 Validator reviews → Extracts discoveries → Merges to feature branch

6 Beads syncs → Issue status updated → State persisted to git

All state is recoverable: Redis for runtime, Beads for persistence, Git for code. A session can resume after a crash.

Task Dependencies

The planner creates a dependency graph. Tasks execute in parallel when their blockers are resolved.

Phase 1

A: Create Schema

worker-0

↙ ↘

Phase 2

B: User Model

C: Auth Service

worker-0, worker-1

↘ ↙

Phase 3

D: Login Endpoint

worker-0

↓

Phase 4

E: Integration Tests

worker-1


How Dependencies Work

Planner Output

{
  "tasks": [
    {
      "id": "A",
      "title": "Create Schema",
      "depends_on": []
    },
    {
      "id": "B",
      "title": "User Model",
      "depends_on": ["A"],
      "group": "models"
    },
    {
      "id": "C",
      "title": "Auth Service",
      "depends_on": ["A"],
      "group": "auth"
    },
    {
      "id": "D",
      "title": "Login Endpoint",
      "depends_on": ["B", "C"]
    }
  ]
}

Execution Rules

No Dependencies = Ready

Task A has no blockers, immediately claimable by any worker

Parallel When Unblocked

B and C both wait for A. Once A completes, both become ready simultaneously

Multiple Dependencies = AND

D requires both B AND C. Waits for the slower one to finish

Groups = Same Worker

Tasks with the same group are claimed together and share context
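The first three rules amount to a readiness check over the dependency graph. The sketch below applies them to the planner output above and groups tasks into phases. It is an illustration of the rules, not Captain's scheduler: readyTasks and phases are hypothetical names, and group affinity is left out.

```typescript
interface DepTask {
  id: string;
  depends_on: string[];
}

// A task is ready when it is not done and every dependency is done (AND semantics).
function readyTasks(tasks: DepTask[], done: string[]): string[] {
  return tasks
    .filter((t) => done.indexOf(t.id) === -1)
    .filter((t) => t.depends_on.every((d) => done.indexOf(d) !== -1))
    .map((t) => t.id);
}

// Group tasks into phases: everything in a phase can run in parallel.
function phases(tasks: DepTask[]): string[][] {
  const done: string[] = [];
  const result: string[][] = [];
  while (done.length < tasks.length) {
    const ready = readyTasks(tasks, done);
    if (ready.length === 0) {
      throw new Error("cycle or unknown dependency in task graph");
    }
    result.push(ready);
    for (const id of ready) {
      done.push(id);
    }
  }
  return result;
}

const schemaPlan: DepTask[] = [
  { id: "A", depends_on: [] },
  { id: "B", depends_on: ["A"] },
  { id: "C", depends_on: ["A"] },
  { id: "D", depends_on: ["B", "C"] },
];
const planPhases = phases(schemaPlan); // [["A"], ["B", "C"], ["D"]]
```

A real scheduler would release each task as soon as its own blockers finish rather than waiting for a whole phase, but the phase view matches the diagram and makes the available parallelism easy to see.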

Beads Integration

📜

Git-Backed Issue Tracking

Beads provides persistent issue tracking that survives conversation compaction and context resets.

What Beads Does

Tracks tasks, bugs, features as issues
Maintains dependency graph
Stores in .beads/issues.jsonl
Git-versioned for history & sync
Survives LLM context compaction
Labels for categorization & filtering

Captain + Beads

Tasks auto-create beads issues
Status syncs: in_progress, completed
Dependencies tracked in both systems
Subtasks linked to parent epics
Labels propagate: backend, auth
Session state persists across runs


Beads: Creating Issues

# Captain creates a task with labels
captain add "Implement user authentication" --labels=backend,security

# Beads issue is created automatically with full metadata
bd show captain-42

# Output:
# ┌─────────────────────────────────────────────────────────────┐
# │ captain-42: Implement user authentication                   │
# ├─────────────────────────────────────────────────────────────┤
# │ Status:    in_progress                                      │
# │ Type:      epic                                             │
# │ Priority:  P2 (medium)                                      │
# │ Labels:    backend, security                                │
# │ Created:   2024-01-15 10:30:00                             │
# │                                                             │
# │ Blocks:    captain-50, captain-51 (downstream tasks)        │
# │ Subtasks:  captain-43, captain-44, captain-45               │
# └─────────────────────────────────────────────────────────────┘

Issues support priority levels (P0-P4), labels, dependencies, and rich metadata.

Beads: Viewing Dependencies

# View all issues with dependency tree
bd list --all --pretty

# Output with visual dependency graph:
# ┌──────────────────────────────────────────────────────────────────────────────┐
# │ ID           Status        Title                              Labels        │
# ├──────────────────────────────────────────────────────────────────────────────┤
# │ captain-42   in_progress   Implement user authentication      backend       │
# │ ├─ captain-43   done       Create User model                  backend,db    │
# │ ├─ captain-44   done       Add password hashing               backend       │
# │ ├─ captain-45   working    Implement JWT middleware           backend,auth  │
# │ │  └─ captain-46 blocked   Create login endpoint              backend,api   │
# │ │     └─ captain-47 pending  Add session management           backend       │
# │ └─ captain-48   pending    Write auth tests                   test          │
# │                                                                              │
# │ captain-50   blocked       Add user profile page              frontend      │
# │   └─ (blocked by captain-42)                                                │
# └──────────────────────────────────────────────────────────────────────────────┘
#
# Legend: done=green, working=cyan, pending=gray, blocked=yellow

The tree view shows task hierarchy and blocking relationships at a glance.

Beads: Workflow Commands

Finding Work

# What's ready to work on?
bd ready
# captain-48: Write auth tests
#   (all blockers resolved)

# Filter by label
bd list --labels=backend --status=open

# See what's blocked
bd blocked
# captain-46: blocked by captain-45
# captain-50: blocked by captain-42

Managing Dependencies

# Add a dependency
bd dep add captain-47 captain-46
# captain-47 now depends on captain-46

# Close completed work
bd close captain-45 --reason="JWT impl done"

# Sync to git (persists state)
bd sync --flush-only
# Exported 8 issues to .beads/issues.jsonl

Beads persists to git. Even if Captain crashes or the LLM context resets, issue state is preserved.

ChunkHound Integration

🐕

Semantic Code Search via MCP

ChunkHound is an MCP server that provides embedding-based code search, enabling agents to find code by meaning.

MCP Architecture

Runs as MCP (Model Context Protocol) server
Exposes tools to Claude SDK agents
Indexes codebase into semantic chunks
Creates embeddings via OpenAI/Voyage
Stores in local SQLite + vector DB

Available MCP Tools

search_semantic - find by meaning
search_regex - exact pattern match
code_research - deep analysis
get_stats - index statistics
health_check - server status


MCP Server: Tool Calls

When an agent needs to understand existing code, the Claude SDK makes MCP tool calls:

// Agent: "I need to find how authentication is implemented"
{
  "tool": "mcp__ChunkHound__search_semantic",
  "input": { "query": "user authentication middleware JWT", "page_size": 5 }
}

// ChunkHound responds with ranked results:
{
  "results": [
    { "file": "src/middleware/auth.ts", "score": 0.92, "lines": "45-52",
      "chunk": "export async function verifyToken(req)..." },
    { "file": "src/utils/tokens.ts", "score": 0.87,
      "chunk": "export function generateAccessToken(user)..." }
  ]
}

How semantic search works

Query Embedding

Convert query to vector using OpenAI/Voyage

Vector Search

Find nearest neighbors in code index

Rank by Score

Order by cosine similarity, highest first (scores shown on a 0-1 scale)

Return Chunks

Code snippets with file locations
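The ranking step can be illustrated with toy vectors. This is a sketch of cosine-similarity ranking in general, not ChunkHound's code: the three-dimensional vectors are made up (real embeddings have hundreds or thousands of dimensions and come from the embedding provider), a linear scan stands in for the vector index, and docs/README.md is a hypothetical file added for contrast.

```typescript
// Cosine similarity: dot product divided by the product of the vector lengths.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface IndexedChunk {
  file: string;
  vector: number[]; // toy embedding, made up for the example
}

// Score every chunk against the query, sort best-first, keep the top pageSize.
function rankChunks(queryVector: number[], chunks: IndexedChunk[], pageSize: number) {
  return chunks
    .map((c) => ({ file: c.file, score: cosine(queryVector, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, pageSize);
}

const toyIndex: IndexedChunk[] = [
  { file: "docs/README.md", vector: [0, 1, 0] },
  { file: "src/utils/tokens.ts", vector: [1, 1, 1] },
  { file: "src/middleware/auth.ts", vector: [1, 0, 1] },
];
const topTwo = rankChunks([1, 0, 1], toyIndex, 2);
// auth.ts first (same direction as the query), then tokens.ts; README.md is dropped
```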

MCP Server: Research Mode

For complex questions, agents use the code_research tool for deep analysis:

// Planner needs to understand the entire auth system architecture
{
  "tool": "mcp__ChunkHound__code_research",
  "input": {
    "query": "How does the authentication system work? What are the main components?"
  }
}

// ChunkHound performs multi-step analysis and returns markdown report:
{
  "analysis": "## Authentication System Architecture\n\n### Components\n1. **JWT Middleware** - Validates Bearer tokens...\n2. **Token Service** - Generates access/refresh tokens...\n3. **Login Endpoint** - POST /api/login...\n\n### Data Flow\nRequest → auth middleware → verify JWT → attach user → handler\n\n### Related Files\n- src/models/User.ts, src/config/jwt.ts, tests/auth.test.ts"
}

What happens under the hood

Semantic Search

Query embedded, top chunks retrieved

Context Expansion

Related files and imports followed

LLM Synthesis

Chunks analyzed, report generated

Markdown Output

Structured answer with file refs

MCP Server: Regex Search

Exact Pattern Matching

// Find all usages of a specific function
{
  "tool": "mcp__ChunkHound__search_regex",
  "input": {
    "pattern": "verifyToken\\(",
    "path": "src/",
    "output_mode": "content"
  }
}

// Response:
{
  "matches": [
    "src/middleware/auth.ts:47: verifyToken(req)",
    "src/routes/profile.ts:12: verifyToken(ctx.req)",
    "src/routes/settings.ts:8: verifyToken(ctx.req)"
  ],
  "count": 3
}
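Note the doubled backslash in the pattern: inside a JSON (or TypeScript) string, "verifyToken\\(" decodes to the regex verifyToken\(, which matches a literal opening parenthesis. The sketch below shows the same pattern applied to in-memory text. searchLines and the file contents are hypothetical and only mimic the file:line output format shown above.

```typescript
// Scan each file line by line and report "path:line: text" for every match.
function searchLines(pattern: string, files: { [path: string]: string }): string[] {
  const re = new RegExp(pattern); // no "g" flag, so test() keeps no state between lines
  const matches: string[] = [];
  for (const path of Object.keys(files)) {
    files[path].split("\n").forEach((line, i) => {
      if (re.test(line)) {
        matches.push(path + ":" + (i + 1) + ": " + line.trim());
      }
    });
  }
  return matches;
}

// Made-up file contents for the demonstration.
const demoFiles: { [path: string]: string } = {
  "src/middleware/auth.ts": "export async function handler(req) {\n  return verifyToken(req)\n}",
  "src/utils/tokens.ts": "export function verifyTokenFormat(t) {\n  return t.length > 0\n}",
};
const hits = searchLines("verifyToken\\(", demoFiles);
// ["src/middleware/auth.ts:2: return verifyToken(req)"]
```

The second file is skipped because verifyTokenFormat( has extra characters between the name and the parenthesis, which is the precision that exact pattern matching adds over semantic search.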

Discovery Persistence

// Validator extracts patterns for indexing
{
  "discovery": {
    "type": "pattern",
    "content": "Auth uses Bearer tokens with 15min expiry. Refresh tokens stored in httpOnly cookies.",
    "files": [
      "src/middleware/auth.ts",
      "src/utils/tokens.ts"
    ]
  }
}

// Future agents can find this via:
// "how does token refresh work?"

Discoveries are indexed as embeddings, so future agents can find them via semantic search.

Configuration

# captain.yaml
project:
  name: my-project
  baseBranch: main

redis:
  url: redis://localhost:6379

planners:
  count: 1          # Usually 1 is enough

workers:
  count: 3          # Parallel workers
  maxNestedDepth: 2 # Subtask depth limit

validators:
  count: 1          # Usually 1 is enough

# Per-agent LLM configuration
planner:
  model: claude-sonnet-4-20250514
  maxTurns: 35

worker:
  model: claude-sonnet-4-20250514
  maxTurns: 50

# Integrations
beads:
  enabled: true
  syncOnComplete: true

chunkhound:
  enabled: true
  persistDiscoveries: true

CLI Usage

Starting a Session

# Start with initial task
captain start \
  --tui \
  --task "Add user auth"

# Start without initial task
captain start --tui

Adding Tasks

# Add to running session
captain add "Implement logout"
captain add "Add password reset"

Monitoring

# Attach to running session
captain attach

# View session status
captain status

TUI Controls

Tab      - Switch panels
Enter    - Fullscreen panel
j/k      - Scroll
a/w/v/p  - Filter logs (fullscreen)
q        - Quit

Real-Time Dashboard

Captain Dashboard | Session: a1b2c3d4 | Branch: feat/user-auth | Uptime: 12m 34s

[1] Agents

5 active, 2 working

C coordinator-0

P planner-0

W worker-0 *

W worker-1 *

W worker-2

V validator-0

[2] Tasks 2P 2W 3D

● Create User model

● Add password hashing

◔ Implement JWT middleware

◔ Create login endpoint

○ Add session management

○ Write auth tests

Active Work

[worker-0]

JWT middleware

[worker-1]

[3] Logs

[worker-0] [Turn 12] Reading src/middleware/auth.ts

[worker-1] [Turn 8] Creating POST /api/login endpoint

[validator] Reviewing batch: worker-2 (Create User model)

[validator] Approved: Create User model - merging

Tab panels | Enter fullscreen | j/k scroll | a/w/v/p filter logs | q quit

Live Session Replay

Captain implementing its own features from a PRD gap assessment

5.5 Hours

179 Tasks Completed

Commits

$133 Total Cost

+24,920 lines added

-866 lines removed

2,534 turns


Session Breakdown

Agent Fleet

Coordinator: 1 instance
Planner: 1 instance
Workers: 3 parallel instances
Validators: 1 instance

Validation Stats

Approvals: 19 batches
Rejections: 4 batches
Auto-approvals: 96 (no changes)
Merges: 11 to main branch

Task Distribution

Completed: 179

Failed: 2

Remaining: 2

Efficiency

Avg cost/task: $0.74
Avg turns/task: 14.2
Tasks/hour: 32.5
Success rate: 98.9%
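These figures follow from the session totals reported earlier. The sketch below reproduces the arithmetic. The success-rate formula, completed divided by completed plus failed, is an assumption that happens to match the reported 98.9%, and the totals themselves may be rounded.

```typescript
// Session totals as reported on the replay slides.
const replayStats = { hours: 5.5, completed: 179, failed: 2, costUsd: 133, turns: 2534 };

function efficiency(s: typeof replayStats) {
  return {
    costPerTask: (s.costUsd / s.completed).toFixed(2),   // 133 / 179
    turnsPerTask: (s.turns / s.completed).toFixed(1),    // 2534 / 179
    tasksPerHour: (s.completed / s.hours).toFixed(1),    // 179 / 5.5
    // Assumed formula: completed / (completed + failed)
    successRatePct: ((100 * s.completed) / (s.completed + s.failed)).toFixed(1),
  };
}

const derived = efficiency(replayStats);
// { costPerTask: "0.74", turnsPerTask: "14.2", tasksPerHour: "32.5", successRatePct: "98.9" }
```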

Interactive Timeline

Explore the full 5.5-hour session with real-time visualization

▶

Playback Control

1x to 500x speed

🔍

Event Filtering

By type: failed, merged, etc.

⌨

Keyboard Shortcuts

Space, arrows, R

📊

Live Stats

Per-worker metrics

■ Task starts

■ Completions

■ Failures with reasons

■ Branch merges

■ Validations


Tip: Click events to expand details • Drag timeline to scrub • Filter by event type

The Cybernetic Foundation

"Cybernetics: the study of control and communication in the animal and the machine"

— Norbert Wiener, 1948

From Greek κυβερνήτης (kybernētēs): steersman, governor, pilot

Core principle: feedback loops that allow systems to self-regulate and achieve goals despite changing conditions.

The Ship Metaphor

⛵

In steering a ship, the rudder position is adjusted in continual response to observed effects—forming a feedback loop through which a steady course is maintained despite cross winds and tide.

The steersman doesn't fight the sea—they respond to it.

Captain as a Cybernetic System

↺ Feedback Loops

Validator approves/rejects → Worker learns
Task fails → Retry with fresh context
Turn limit hit → Spawn nested worker
Usage limit → Pause and resume

⚙ Self-Regulation

Automatic crash recovery
Session state persistence
Worker respawning
Queue rebalancing

🎯 Goal-Seeking

Tasks have acceptance criteria
System converges on completion
Dependencies unlock progress
Coordinator steers toward done

Like Wiener's steersman, the Coordinator doesn't fight failures—it responds to them.

Error Handling & Resilience

Captain embraces failure as feedback. From the live session: 93% recovery rate from failures.

Failure Types Handled

⚠ Usage Limits: API rate limits, token exhaustion
✖ Process Crashes: worker unexpectedly exits
↻ Stale Sessions: conversation context lost
⚠ Git Conflicts: merge failures
✔ Max Turns: task needs more work

Recovery Strategy

Detect failure type from exit code/error
Persist current state to Beads
Requeue task with attempt counter
Spawn fresh worker with session resume
Continue from last checkpoint

Live Session Stats

Retry Events: 63

Recovered: 18 tasks

Permanent Failures: 13 tasks

4 retry attempts per task before marking as failed
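The requeue-with-attempt-counter step might look like the sketch below. It is a simplified reading of the recovery strategy, not Captain's code: the failure kind names, decideRecovery, and the rule that a stale session restarts without resuming are assumptions, and every failure type is counted against the same limit of 4.

```typescript
// Hypothetical labels for the failure types listed above.
type FailureKind = "usage_limit" | "process_crash" | "stale_session" | "git_conflict" | "max_turns";

interface RecoveryDecision {
  action: "requeue" | "mark_failed";
  attempt: number;        // attempts used so far, including the one that just failed
  resumeSession: boolean; // whether the next worker should resume the saved session
}

const MAX_ATTEMPTS = 4; // from the slide: 4 attempts before marking as failed

function decideRecovery(previousAttempts: number, kind: FailureKind): RecoveryDecision {
  const attempt = previousAttempts + 1;
  if (attempt >= MAX_ATTEMPTS) {
    return { action: "mark_failed", attempt: attempt, resumeSession: false };
  }
  // Assumption: a stale session has lost its history, so the retry starts fresh.
  const resumeSession = kind !== "stale_session";
  return { action: "requeue", attempt: attempt, resumeSession: resumeSession };
}

const afterThrottle = decideRecovery(0, "usage_limit");  // requeue, attempt 1, resume
const afterStale = decideRecovery(1, "stale_session");   // requeue, attempt 2, fresh start
const afterFourth = decideRecovery(3, "process_crash");  // mark_failed, attempt 4
```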

Most failures were usage limits (API throttling). System waited and resumed automatically.

Key Concepts Summary

Turn-Based

LLMs work in turns: think, act, observe. Captain manages turn limits and enables session resumption.

Parallel Workers

Multiple workers in isolated worktrees. Tasks without dependencies run simultaneously.

Smart Planning

Planner uses semantic search to understand codebase before decomposing epics into tasks.

Task Dependencies

Explicit dependency graph ensures correct ordering while maximizing parallelism.

Quality Gates

Validator reviews all work before merge. Batch validation for efficiency.

Persistence

Beads tracks issues in git. ChunkHound indexes discoveries. State survives restarts.

Captain

Multi-Agent Task Orchestration

Redis + Claude SDK + Git Worktrees + Beads + ChunkHound

Questions?