Captain

Multi-Agent Task Orchestration for LLMs

From Single to Multi-Agent

Single Agent

One LLM instance handling everything:

  • Reads entire codebase
  • Plans the implementation
  • Writes all the code
  • Runs tests
  • Fixes issues
  • Commits changes

Works great for focused tasks. Hits limits on large projects.

Multi-Agent (Captain)

Specialized agents with focused responsibilities:

  • Planner researches & decomposes
  • Workers implement in parallel
  • Validator reviews & merges

Smaller contexts. Parallel execution. Specialized focus.

Understanding LLM Execution

LLMs operate in a turn-based execution model, similar to a game of chess or a D&D session.

Chess

White moves → Black moves → White moves...
You can't move while your opponent is thinking

🎲

D&D

Player declares action → DM resolves → Next turn...
You act, then wait for the world to respond

  • Think: analyze context
  • Act: call a tool
  • Observe: process the result

Each cycle is one turn. The LLM is blocked while tools execute.
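
The think, act, observe cycle can be sketched as a plain loop. This is a minimal illustration, not Captain's implementation: `runAgent`, `Decision`, `ToolCall`, and the `think`/`act` callbacks are hypothetical names, `MaxTurnsError` is simplified to carry only a turn count, and the loop is kept synchronous so that the blocking on each tool call is explicit.

```typescript
// Hypothetical types: none of these names come from Captain or the Claude SDK.
type ToolCall = { tool: string; input: string };
type Decision =
  | { done: true; answer: string }
  | { done: false; call: ToolCall };

class MaxTurnsError extends Error {
  constructor(public readonly turnsUsed: number) {
    super(`Stopped after ${turnsUsed} turns`);
    this.name = "MaxTurnsError";
  }
}

// think: looks at everything observed so far and decides the next step.
// act:   runs one tool call; the loop does nothing else until it returns.
function runAgent(
  think: (context: string[]) => Decision,
  act: (call: ToolCall) => string,
  maxTurns: number,
): { answer: string; turnsUsed: number } {
  const context: string[] = [];
  for (let turn = 1; turn <= maxTurns; turn++) {
    const decision = think(context); // Think
    if (decision.done) {
      return { answer: decision.answer, turnsUsed: turn };
    }
    const result = act(decision.call); // Act (blocking)
    context.push(`${decision.call.tool} -> ${result}`); // Observe
  }
  // The turn budget ran out before the agent declared itself done.
  throw new MaxTurnsError(maxTurns);
}
```

Because every observation is appended to `context`, the input to `think` grows with each turn, which is why long tasks get more expensive as they go.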

Anatomy of a Turn

  1. LLM: "I need to read the user model file" → calls Read tool
  2. Tool returns: 150 lines of TypeScript → LLM analyzes
  3. LLM: "I'll add a validate() method at line 47" → calls Edit tool
  4. Tool returns: "File updated successfully" → LLM continues
  5. LLM: "Now let's run the test suite" → calls Bash tool

Complex tasks can require 30-50+ turns. Each turn adds to context and costs tokens.

Turn Limits & Session Resume

To prevent runaway agents and control costs, Captain enforces max turns per task:

Agent Type   Default Max Turns   Typical Use
Planner      30-35               Deep codebase exploration, task decomposition
Worker       50                  Implementation, testing, iteration
Validator    10-15               Code review, merge conflict resolution

Session Resumption

When an agent hits its max-turn limit, the task does not fail. The agent saves its session ID and requeues the task; the next attempt resumes with the full conversation history intact.

if (err instanceof MaxTurnsError) {
  await taskQueue.updateMetadata(task.id, {
    resumeSessionId: err.sessionId,  // Claude SDK session
    lastTurnsUsed: err.turnsUsed,
  });
  await taskQueue.requeue(task.id, "max_turns", 5);
}
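
The handler above covers the saving half of resumption. A minimal sketch of the other half, turning saved metadata into options for the next attempt, might look like the following. `TaskRecord`, `RunOptions`, `optionsForAttempt`, and `recordMaxTurns` are hypothetical, and the `resume` field name is an assumption that may not match the option Captain actually passes to the SDK.

```typescript
// Hypothetical shapes; Captain's real queue records and SDK options may differ.
interface TaskRecord {
  id: string;
  attempts: number;
  metadata: { resumeSessionId?: string; lastTurnsUsed?: number };
}

interface RunOptions {
  maxTurns: number;
  resume?: string; // session to continue, if a previous attempt saved one
}

// Build the options for the next attempt from what the previous one saved.
function optionsForAttempt(task: TaskRecord, maxTurns: number): RunOptions {
  const sessionId = task.metadata.resumeSessionId;
  return sessionId === undefined
    ? { maxTurns } // first attempt: start a fresh session
    : { maxTurns, resume: sessionId }; // later attempt: continue the old one
}

// In-memory stand-in for the updateMetadata + requeue calls shown above.
function recordMaxTurns(
  task: TaskRecord,
  sessionId: string,
  turnsUsed: number,
): TaskRecord {
  return {
    ...task,
    attempts: task.attempts + 1,
    metadata: { resumeSessionId: sessionId, lastTurnsUsed: turnsUsed },
  };
}
```

Keeping the session ID in task metadata rather than in worker memory means any worker that later claims the task can continue it.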

The Agent Fleet

Captain orchestrates four specialized agent types, each with distinct responsibilities:

Coordinator
Planner
Worker
Validator

Coordinator

The entry point for all user requests. Receives high-level tasks and creates the initial epic for planning.

  • Parses user intent from task description
  • Creates epic with appropriate metadata
  • Monitors session-level progress
  • Single instance per session

Example Flow

# User submits task
captain add "Add user authentication"

# Coordinator creates epic
{
  id: "task-abc123",
  type: "epic",
  title: "Add user authentication",
  status: "pending",
  priority: 2
}

Planner

The architect that researches the codebase and decomposes epics into concrete, implementable tasks.

  • Deep codebase exploration using semantic search
  • Identifies existing patterns and conventions
  • Creates tasks with clear acceptance criteria
  • Defines dependencies between tasks
  • Groups related tasks for worker affinity

Task Decomposition Output

{
  "tasks": [
    {
      "title": "Create User model",
      "type": "task",
      "group": "auth-models",
      "acceptance": "User model with email, hash"
    },
    {
      "title": "Add JWT middleware",
      "depends_on": ["Create User model"],
      "group": "auth-middleware"
    },
    {
      "title": "Create login endpoint",
      "depends_on": ["Add JWT middleware"]
    }
  ]
}
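
A consumer of this output would likely want to batch tasks by `group` and check that every `depends_on` entry names a task in the plan. The sketch below assumes the field names shown in the JSON above; `PlannedTask`, `batchByGroup`, `missingDependencies`, and the `solo:` key prefix are hypothetical and not part of Captain.

```typescript
// Field names follow the planner output above; everything else is illustrative.
interface PlannedTask {
  title: string;
  group?: string;
  depends_on?: string[];
}

// Batch tasks by their `group`; an ungrouped task forms a batch of its own.
function batchByGroup(tasks: PlannedTask[]): Map<string, PlannedTask[]> {
  const batches = new Map<string, PlannedTask[]>();
  for (const task of tasks) {
    const key = task.group ?? `solo:${task.title}`;
    const batch = batches.get(key) ?? [];
    batch.push(task);
    batches.set(key, batch);
  }
  return batches;
}

// Collect depends_on entries that do not match any task title in the plan.
function missingDependencies(tasks: PlannedTask[]): string[] {
  const titles = new Set(tasks.map((task) => task.title));
  const missing: string[] = [];
  for (const task of tasks) {
    for (const dep of task.depends_on ?? []) {
      if (!titles.has(dep)) missing.push(dep);
    }
  }
  return missing;
}
```

Running the dependency check before queuing anything catches a misspelled title early, rather than leaving a task blocked on something that will never complete.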

Worker

The implementation engine. Multiple workers run in parallel, each in an isolated git worktree.

  • Claims tasks from the queue
  • Works in isolated git worktree
  • Writes code, tests, documentation
  • Commits changes to worker branch
  • Submits work for validation

Related tasks share a groupId: the same worker claims them together, and they are validated as a batch.

Worktree Isolation

.captain/
├── worktrees/
│   ├── worker-0/   # Full repo checkout
│   │   └── src/
│   ├── worker-1/   # Full repo checkout
│   │   └── src/
│   └── worker-2/   # Full repo checkout
│       └── src/
└── session-state/

Each worker edits its own checkout, so parallel edits never collide on disk. Conflicts can only surface later, when the Validator merges.
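
One way to set up a layout like this is `git worktree add`, which checks out a branch into its own directory while sharing the repository's object store. The sketch below only builds the argument lists; the `captain/worker-N` branch naming and the `planWorktrees` helper are assumptions for illustration, not Captain's actual scheme.

```typescript
// Hypothetical helper: paths follow the tree above, branch names are assumed.
interface WorktreePlan {
  path: string;
  branch: string;
  args: string[]; // arguments to pass to `git`
}

function planWorktrees(workerCount: number, baseBranch: string): WorktreePlan[] {
  const plans: WorktreePlan[] = [];
  for (let i = 0; i < workerCount; i++) {
    const path = `.captain/worktrees/worker-${i}`;
    const branch = `captain/worker-${i}`; // assumed naming scheme
    // `git worktree add -b <branch> <path> <commit-ish>` creates a new branch
    // starting at the base and checks it out in its own directory.
    plans.push({
      path,
      branch,
      args: ["worktree", "add", "-b", branch, path, baseBranch],
    });
  }
  return plans;
}
```

Each `args` array can be handed to a process runner such as Node's `child_process.execFileSync("git", plan.args)`. Building the arguments separately from running them keeps the layout logic easy to test.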

Validator

The quality guardian. Reviews completed work, handles merges, and extracts learnings.

  • Reviews code changes for correctness
  • Checks against acceptance criteria
  • Merges approved work to feature branch
  • Resolves merge conflicts intelligently
  • Extracts discoveries for ChunkHound
  • Batch validation for efficiency

Validation Decision

{
  "decision": "approve",
  "reasoning": "Implementation matches spec...",
  "discoveries": [
    {
      "type": "pattern",
      "content": "Auth middleware uses...",
      "files": ["src/middleware/auth.ts"]
    }
  ],
  "suggestions": []
}

System Architecture

Task Flow

"Add user authentication"
    ↓
Coordinator: creates epic in queue
    ↓
Planner: researches → creates 5 subtasks
  ↙ ↓ ↘
W0  W1  W2: parallel implementation in worktrees
  ↘ ↓ ↙
Validator: reviews → merges to feature branch
    ↓
Complete: ready for PR

Infrastructure

  • Redis: task queue, pub/sub events, session state. All agents communicate through Redis.
  • ChunkHound: MCP server for semantic code search. The Planner queries it to understand codebase patterns.
  • Beads: Git-backed issue tracking. Persists task state across sessions and context resets.
  • Git Worktrees: each worker gets an isolated checkout, so parallel file edits do not conflict.

Data Flow

  1. User submits task → Coordinator creates epic in Redis queue
  2. Planner claims epic → Queries ChunkHound → Creates subtasks with dependencies
  3. Workers claim tasks → Each works in isolated worktree → Commits to worker branch
  4. Worker publishes completion → PubSub notifies Validator → Queued for review
  5. Validator reviews → Extracts discoveries → Merges to feature branch
  6. Beads syncs → Issue status updated → State persisted to git

All state is recoverable: Redis for runtime, Beads for persistence, Git for code. Session can resume after crashes.

Task Dependencies

The planner creates a dependency graph. Tasks execute in parallel when their blockers are resolved.

Phase     Tasks                            Workers
Phase 1   A: Create Schema                 worker-0
Phase 2   B: User Model, C: Auth Service   worker-0, worker-1
Phase 3   D: Login Endpoint                worker-0
Phase 4   E: Integration Tests             worker-1

How Dependencies Work

Planner Output

{
  "tasks": [
    {
      "id": "A",
      "title": "Create Schema",
      "depends_on": []
    },
    {
      "id": "B",
      "title": "User Model",
      "depends_on": ["A"],
      "group": "models"
    },
    {
      "id": "C",
      "title": "Auth Service",
      "depends_on": ["A"],
      "group": "auth"
    },
    {
      "id": "D",
      "title": "Login Endpoint",
      "depends_on": ["B", "C"]
    }
  ]
}

Execution Rules

  • No Dependencies = Ready: Task A has no blockers and is immediately claimable by any worker.
  • Parallel When Unblocked: B and C both wait for A. Once A completes, both become ready simultaneously.
  • Multiple Dependencies = AND: D requires both B AND C, so it waits for the slower one to finish.
  • Groups = Same Worker: tasks with the same group are claimed together and share context.
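
The first three rules amount to a layered topological sort: repeatedly take every task whose dependencies are complete and treat that set as one phase. A minimal sketch using the A-D plan above, with hypothetical names (`DagTask`, `readyTasks`, `executionPhases`) and the assumption that task ids are unique:

```typescript
interface DagTask {
  id: string;
  depends_on: string[];
}

// A task is ready when it is not complete and all of its dependencies are.
function readyTasks(tasks: DagTask[], completed: Set<string>): string[] {
  return tasks
    .filter((t) => !completed.has(t.id) && t.depends_on.every((d) => completed.has(d)))
    .map((t) => t.id);
}

// Take every ready task as one phase, mark the phase complete, and repeat.
function executionPhases(tasks: DagTask[]): string[][] {
  const completed = new Set<string>();
  const phases: string[][] = [];
  while (completed.size < tasks.length) {
    const phase = readyTasks(tasks, completed);
    if (phase.length === 0) {
      // Nothing can start: the remaining tasks wait on each other
      // or on an id that is not in the plan.
      throw new Error("Dependency cycle or unknown dependency");
    }
    phases.push(phase);
    for (const id of phase) completed.add(id);
  }
  return phases;
}

const plan: DagTask[] = [
  { id: "A", depends_on: [] },
  { id: "B", depends_on: ["A"] },
  { id: "C", depends_on: ["A"] },
  { id: "D", depends_on: ["B", "C"] },
];
// Phases for this plan: A, then B and C together, then D.
console.log(executionPhases(plan));
```

A live scheduler would not wait for a whole phase to finish. It would call something like `readyTasks` each time a task completes, so D could start as soon as the slower of B and C is done.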

Beads Integration

📜

Git-Backed Issue Tracking

Beads provides persistent issue tracking that survives conversation compaction and context resets.

What Beads Does

  • Tracks tasks, bugs, features as issues
  • Maintains dependency graph
  • Stores in .beads/issues.jsonl
  • Git-versioned for history & sync
  • Survives LLM context compaction
  • Labels for categorization & filtering

Captain + Beads

  • Tasks auto-create beads issues
  • Status syncs: in_progress, completed
  • Dependencies tracked in both systems
  • Subtasks linked to parent epics
  • Labels propagate: backend, auth
  • Session state persists across runs

Beads: Creating Issues

# Captain creates a task with labels
captain add "Implement user authentication" --labels=backend,security

# Beads issue is created automatically with full metadata
bd show captain-42

# Output:
# ┌─────────────────────────────────────────────────────────────┐
# │ captain-42: Implement user authentication                   │
# ├─────────────────────────────────────────────────────────────┤
# │ Status:    in_progress                                      │
# │ Type:      epic                                             │
# │ Priority:  P2 (medium)                                      │
# │ Labels:    backend, security                                │
# │ Created:   2024-01-15 10:30:00                              │
# │                                                             │
# │ Blocks:    captain-50, captain-51 (downstream tasks)        │
# │ Subtasks:  captain-43, captain-44, captain-45               │
# └─────────────────────────────────────────────────────────────┘

Issues support priority levels (P0-P4), labels, dependencies, and rich metadata.

Beads: Viewing Dependencies

# View all issues with dependency tree
bd list --all --pretty

# Output with visual dependency graph:
# ┌──────────────────────────────────────────────────────────────────────────────┐
# │ ID           Status        Title                              Labels        │
# ├──────────────────────────────────────────────────────────────────────────────┤
# │ captain-42   in_progress   Implement user authentication      backend       │
# │ ├─ captain-43   done       Create User model                  backend,db    │
# │ ├─ captain-44   done       Add password hashing               backend       │
# │ ├─ captain-45   working    Implement JWT middleware           backend,auth  │
# │ │  └─ captain-46 blocked   Create login endpoint              backend,api   │
# │ │     └─ captain-47 pending  Add session management           backend       │
# │ └─ captain-48   pending    Write auth tests                   test          │
# │                                                                              │
# │ captain-50   blocked       Add user profile page              frontend      │
# │   └─ (blocked by captain-42)                                                │
# └──────────────────────────────────────────────────────────────────────────────┘
#
# Legend: done=green, working=cyan, pending=gray, blocked=yellow

The tree view shows task hierarchy and blocking relationships at a glance.

Beads: Workflow Commands

Finding Work

# What's ready to work on?
bd ready
# captain-48: Write auth tests
#   (all blockers resolved)

# Filter by label
bd list --labels=backend --status=open

# See what's blocked
bd blocked
# captain-46: blocked by captain-45
# captain-50: blocked by captain-42

Managing Dependencies

# Add a dependency
bd dep add captain-47 captain-46
# captain-47 now depends on captain-46

# Close completed work
bd close captain-45 --reason="JWT impl done"

# Sync to git (persists state)
bd sync --flush-only
# Exported 8 issues to .beads/issues.jsonl

Beads persists to git. Even if Captain crashes or the LLM context resets, issue state is preserved.
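
The `.jsonl` extension suggests JSON Lines: one JSON object per line, so a changed issue shows up as a one-line change in a git diff. The sketch below illustrates that round trip only. `IssueRecord` and its fields are hypothetical, since the real Beads schema is not shown in this deck.

```typescript
// Hypothetical issue shape, for illustration only.
interface IssueRecord {
  id: string;
  title: string;
  status: string;
  blockedBy: string[];
}

// Serialize one issue per line, with a trailing newline.
function toJsonl(issues: IssueRecord[]): string {
  return issues.map((issue) => JSON.stringify(issue)).join("\n") + "\n";
}

// Parse the file back, skipping blank lines.
function fromJsonl(text: string): IssueRecord[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as IssueRecord);
}
```

Because each line parses on its own, a reader can recover every intact issue even if the file was truncated mid-write, which suits a store meant to survive crashes.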

ChunkHound Integration

🐕

Semantic Code Search via MCP

ChunkHound is an MCP server that provides embedding-based code search, enabling agents to find code by meaning.

MCP Architecture

  • Runs as MCP (Model Context Protocol) server
  • Exposes tools to Claude SDK agents
  • Indexes codebase into semantic chunks
  • Creates embeddings via OpenAI/Voyage
  • Stores in local SQLite + vector DB

Available MCP Tools

  • search_semantic - find by meaning
  • search_regex - exact pattern match
  • code_research - deep analysis
  • get_stats - index statistics
  • health_check - server status

MCP Server: Tool Calls

When an agent needs to understand existing code, the Claude SDK makes MCP tool calls:

// Agent: "I need to find how authentication is implemented"
{
  "tool": "mcp__ChunkHound__search_semantic",
  "input": { "query": "user authentication middleware JWT", "page_size": 5 }
}

// ChunkHound responds with ranked results:
{
  "results": [
    { "file": "src/middleware/auth.ts", "score": 0.92, "lines": "45-52",
      "chunk": "export async function verifyToken(req)..." },
    { "file": "src/utils/tokens.ts", "score": 0.87,
      "chunk": "export function generateAccessToken(user)..." }
  ]
}

How semantic search works

  1. Query Embedding: convert the query to a vector using OpenAI/Voyage
  2. Vector Search: find nearest neighbors in the code index
  3. Rank by Score: order by cosine similarity, highest first (scores typically fall in 0-1)
  4. Return Chunks: code snippets with file locations
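
Steps 2 and 3 can be sketched with a brute-force scan. This is not how ChunkHound is implemented (a real index would use a vector store rather than comparing against every chunk), and `IndexedChunk`, `cosineSimilarity`, and `topK` are hypothetical names; the embeddings are assumed to be precomputed.

```typescript
interface IndexedChunk {
  file: string;
  embedding: number[]; // assumed to come from an embedding model
}

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vector lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query and keep the k best.
function topK(
  query: number[],
  chunks: IndexedChunk[],
  k: number,
): { file: string; score: number }[] {
  return chunks
    .map((chunk) => ({ file: chunk.file, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Cosine similarity compares direction rather than magnitude, so a long file and a short snippet about the same topic can still score close together.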

MCP Server: Research Mode

For complex questions, agents use the code_research tool for deep analysis:

// Planner needs to understand the entire auth system architecture
{
  "tool": "mcp__ChunkHound__code_research",
  "input": {
    "query": "How does the authentication system work? What are the main components?"
  }
}

// ChunkHound performs multi-step analysis and returns markdown report:
{
  "analysis": "## Authentication System Architecture\n\n### Components\n1. **JWT Middleware** - Validates Bearer tokens...\n2. **Token Service** - Generates access/refresh tokens...\n3. **Login Endpoint** - POST /api/login...\n\n### Data Flow\nRequest → auth middleware → verify JWT → attach user → handler\n\n### Related Files\n- src/models/User.ts, src/config/jwt.ts, tests/auth.test.ts"
}

What happens under the hood

  1. Semantic Search: query embedded, top chunks retrieved
  2. Context Expansion: related files and imports followed
  3. LLM Synthesis: chunks analyzed, report generated
  4. Markdown Output: structured answer with file refs

MCP Server: Regex Search

Exact Pattern Matching

// Find all usages of a specific function
{
  "tool": "mcp__ChunkHound__search_regex",
  "input": {
    "pattern": "verifyToken\\(",
    "path": "src/",
    "output_mode": "content"
  }
}

// Response:
{
  "matches": [
    "src/middleware/auth.ts:47: verifyToken(req)",
    "src/routes/profile.ts:12: verifyToken(ctx.req)",
    "src/routes/settings.ts:8: verifyToken(ctx.req)"
  ],
  "count": 3
}
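
For contrast with semantic search, exact matching needs no embeddings at all. The sketch below scans in-memory file contents and reports matches in the same `path:line: text` shape as the response above. `searchRegex` and the `files` map are hypothetical stand-ins and do not reflect how ChunkHound stores or scans code.

```typescript
// files maps a path to that file's full text (a stand-in for a real index).
function searchRegex(
  files: Record<string, string>,
  pattern: string,
  pathPrefix: string,
): string[] {
  const regex = new RegExp(pattern);
  const matches: string[] = [];
  for (const path of Object.keys(files).sort()) {
    if (!path.startsWith(pathPrefix)) continue; // restrict to the requested path
    files[path].split("\n").forEach((line, index) => {
      if (regex.test(line)) {
        matches.push(`${path}:${index + 1}: ${line.trim()}`); // 1-based line numbers
      }
    });
  }
  return matches;
}
```

The pattern arrives as a string, so a literal parenthesis needs a regex escape, and inside JSON that backslash is itself escaped. That is why the request above spells it `verifyToken\\(`.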

Discovery Persistence

// Validator extracts patterns for indexing
{
  "discovery": {
    "type": "pattern",
    "content": "Auth uses Bearer tokens with 15min expiry. Refresh tokens stored in httpOnly cookies.",
    "files": [
      "src/middleware/auth.ts",
      "src/utils/tokens.ts"
    ]
  }
}

// Future agents can find this via:
// "how does token refresh work?"

Discoveries are indexed as embeddings - future agents find them via semantic search.

Configuration

# captain.yaml
project:
  name: my-project
  baseBranch: main

redis:
  url: redis://localhost:6379

planners:
  count: 1          # Usually 1 is enough

workers:
  count: 3          # Parallel workers
  maxNestedDepth: 2 # Subtask depth limit

validators:
  count: 1          # Usually 1 is enough

# Per-agent LLM configuration
planner:
  model: claude-sonnet-4-20250514
  maxTurns: 35

worker:
  model: claude-sonnet-4-20250514
  maxTurns: 50

# Integrations
beads:
  enabled: true
  syncOnComplete: true

chunkhound:
  enabled: true
  persistDiscoveries: true
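
Once a file like this has been parsed into an object, a typed shape plus a few sanity checks can catch mistakes before any agent starts. The interface below mirrors only some of the keys shown above, and the validation rules are assumptions made for illustration; Captain's actual schema and checks are not shown in this deck.

```typescript
interface AgentLlmConfig {
  model: string;
  maxTurns: number;
}

// Subset of the keys from captain.yaml above.
interface CaptainConfig {
  project: { name: string; baseBranch: string };
  redis: { url: string };
  workers: { count: number; maxNestedDepth: number };
  planner: AgentLlmConfig;
  worker: AgentLlmConfig;
}

// Returns human-readable problems; an empty list means every check passed.
function validateConfig(config: CaptainConfig): string[] {
  const problems: string[] = [];
  if (!Number.isInteger(config.workers.count) || config.workers.count < 1) {
    problems.push("workers.count must be a positive integer");
  }
  for (const role of ["planner", "worker"] as const) {
    const { maxTurns } = config[role];
    if (!Number.isInteger(maxTurns) || maxTurns < 1) {
      problems.push(`${role}.maxTurns must be a positive integer`);
    }
  }
  return problems;
}
```

Returning a list instead of throwing on the first failure lets a CLI report every problem in one pass.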

CLI Usage

Starting a Session

# Start with initial task
captain start \
  --tui \
  --task "Add user auth"

# Start without initial task
captain start --tui

Adding Tasks

# Add to running session
captain add "Implement logout"
captain add "Add password reset"

Monitoring

# Attach to running session
captain attach

# View session status
captain status

TUI Controls

Tab      - Switch panels
Enter    - Fullscreen panel
j/k      - Scroll
a/w/v/p  - Filter logs (fullscreen)
q        - Quit

Real-Time Dashboard

Captain Dashboard | Session: a1b2c3d4 | Branch: feat/user-auth | Uptime: 12m 34s

[1] Agents (5 active, 2 working)
  C coordinator-0
  P planner-0
  W worker-0 *
  W worker-1 *
  W worker-2
  V validator-0

[2] Tasks (2P 2W 3D)
  Create User model
  Add password hashing
  Implement JWT middleware
  Create login endpoint
  Add session management
  Write auth tests

Active Work
  [worker-0] JWT middleware
  [worker-1] Login endpoint

[3] Logs
  [worker-0] [Turn 12] Reading src/middleware/auth.ts
  [worker-1] [Turn 8] Creating POST /api/login endpoint
  [validator] Reviewing batch: worker-2 (Create User model)
  [validator] Approved: Create User model - merging

Tab panels | Enter fullscreen | j/k scroll | a/w/v/p filter logs | q quit

Key Concepts Summary

Turn-Based

LLMs work in turns: think, act, observe. Captain manages turn limits and enables session resumption.

Parallel Workers

Multiple workers in isolated worktrees. Tasks without dependencies run simultaneously.

Smart Planning

Planner uses semantic search to understand codebase before decomposing epics into tasks.

Task Dependencies

Explicit dependency graph ensures correct ordering while maximizing parallelism.

Quality Gates

Validator reviews all work before merge. Batch validation for efficiency.

Persistence

Beads tracks issues in git. ChunkHound indexes discoveries. State survives restarts.

Captain

Multi-Agent Task Orchestration

Redis + Claude SDK + Git Worktrees + Beads + ChunkHound

Questions?