Captain vs Build: Why We Split the AI Agent in Two

Most AI coding tools treat planning and execution as one move: you prompt, it writes. That feels fast, until it is not. Capy splits the work into two specialized agents because planning is where quality is won or lost, and a crisp spec is usually the difference between shipping once and rewriting three times.

A dedicated planner forces clarity up front, cuts down wasted iterations, and gives the executor everything it needs to ship clean code in one pass.

"Every ambiguity in your specification becomes a problem in implementation."

Capy Team, from the Captain Capy prompt

TL;DR

Single-agent systems make humans do the orchestration work.
Captain plans, Build executes. The split forces real specs.
Better specs mean fewer loops and higher quality output.

The bottleneck in single-agent coding

Figure: Single agent vs Captain + Build. In the single-agent loop you prompt, review, and fix 3-5 times; in Capy's two-agent flow you describe the task, Captain writes the spec, and Build ships in one pass.

The loop is familiar. You prompt an agent, it writes code, you scan the diff, and you ask for changes. The agent is quick, but it can only follow what you asked for, not what you meant.

Experienced engineers already know the workaround: pause and plan. They skim the codebase, map the constraints, and jot down a spec. In other words, the human does the architecture work before the agent starts typing.

We asked a simple question: what if the system did that orchestration for you?

Meet Captain and Build

Capy uses two distinct agents that work in sequence.

Captain is the technical architect. It reads codebases, researches, asks questions, and writes exhaustive specs, but it never writes production code.

"You are a technical architect who PLANS but never IMPLEMENTS."

Build is the executor. It receives the spec, spins up an Ubuntu VM, edits files, runs commands, and ships the work.

"You are an autonomous AI agent with access to an Ubuntu virtual machine to complete the user's coding and research tasks independently and asynchronously."

Aspect        Captain          Build
Role          Architect        Executor
Code          Reads code       Writes code
Research      Explores         Full VM access
Interaction   Asks questions   Ships PRs
Output        Creates specs    Runs commands

The planning step that changes outcomes

Before Build touches a file, Captain runs the same routine a good staff engineer uses at the start of a project: explore the codebase, clarify ambiguities, write a concrete spec, and hand off a zero-guesswork brief.

Figure: Captain's planning flow. 1. Explore: scan the codebase and find conventions. 2. Clarify: surface ambiguity and ask questions. 3. Specify: files, edge cases, success criteria. 4. Handoff: a zero-guesswork brief to Build.

Think of the spec as a short PRD, not a loose prompt. A representative spec might read: "Add Google and GitHub OAuth. Sessions live in src/auth/session.ts, follow the existing token refresh pattern, rate-limit failed logins, and make sure existing sessions survive re-auth." It names the files, the constraints, and the success criteria so Build has nothing left to guess.

If something is missing from the spec, it shows up later as churn. Captain's job is to make the intent explicit so Build can move straight to execution.
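One way to picture a spec that "names the files, the constraints, and the success criteria" is as a small data structure with a completeness check. This is only a sketch: the `Spec` class, its field names, and `missing_sections` are hypothetical and are not Capy's actual spec format.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """Hypothetical spec shape for illustration; not Capy's real schema."""
    goal: str
    files: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    success_criteria: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are still empty."""
        sections = {
            "files": self.files,
            "constraints": self.constraints,
            "success_criteria": self.success_criteria,
        }
        return [name for name, items in sections.items() if not items]


# The OAuth example from the text, split into its parts.
oauth_spec = Spec(
    goal="Add Google and GitHub OAuth",
    files=["src/auth/session.ts"],
    constraints=[
        "follow the existing token refresh pattern",
        "rate-limit failed logins",
    ],
    success_criteria=["existing sessions survive re-auth"],
)

# A loose prompt carries a goal and nothing else.
vague_prompt = Spec(goal="Add OAuth")

print(oauth_spec.missing_sections())    # []
print(vague_prompt.missing_sections())  # ['files', 'constraints', 'success_criteria']
```

Every name returned by `missing_sections` is a gap the executor would otherwise have to guess at.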

The handoff in practice

Figure: The three-phase handoff, You → Captain → Build. You describe what you want in plain language. Captain explores the codebase, asks questions, writes the spec, and monitors progress. Build edits files, runs commands, commits, pushes, and creates PRs.

You describe the task in plain language; there is no need to polish it. Captain reads your codebase, identifies the patterns and constraints, and closes the gaps. If something is ambiguous, it will ask: "Should failed logins return 401 or 403?" or "Do you want rate limiting on auth endpoints?"

Once the spec is tight, Build takes over. It edits files, installs dependencies, runs tests, and opens a pull request, all autonomously. If Build drifts off track, you can step in through Captain or let Captain follow up on its own.

Guardrails: what each agent can and cannot do

Figure: What each agent can and cannot do. Captain can read files, search the web, ask questions, create tasks, and start tasks; it cannot edit files, run a terminal, git push, or create PRs. Build can read files, edit files, run a terminal, git push and create PRs, and search the web; it cannot ask questions or create tasks.

Captain is intentionally read-only. It can scan any file and search the web, but it never edits code, runs terminal commands, or pushes commits. It plans, hands off, and gets out of the way. We keep its system prompt and toolset tightly curated for reliability.

Build is the opposite: it gets a full Ubuntu VM with sudo, whatever runtimes you need, and full git access to branch, commit, push, and open PRs. It often touches many files in a single pass and will run the tests itself, see what fails, and fix it. The trade-off is that Build never asks you clarifying questions mid-task. It only sees the spec and the codebase, which is exactly why the spec needs to be good.
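The guardrails described above amount to a per-agent tool allowlist. The sketch below models that idea with a deny-by-default lookup. The `Tool` enum, the `ALLOWED` table, and `can_use` are hypothetical names chosen for illustration; how Capy actually enforces these limits is not described here.

```python
from enum import Enum


class Tool(Enum):
    READ_FILES = "read files"
    WEB_SEARCH = "web search"
    ASK_QUESTIONS = "ask questions"
    CREATE_TASKS = "create tasks"
    START_TASKS = "start tasks"
    EDIT_FILES = "edit files"
    RUN_TERMINAL = "run terminal"
    GIT_PUSH = "git push"
    CREATE_PRS = "create PRs"


# Allowlists taken from the capabilities listed in the article.
# Anything not listed for an agent is denied (assumption: deny by default).
ALLOWED = {
    "captain": frozenset({
        Tool.READ_FILES,
        Tool.WEB_SEARCH,
        Tool.ASK_QUESTIONS,
        Tool.CREATE_TASKS,
        Tool.START_TASKS,
    }),
    "build": frozenset({
        Tool.READ_FILES,
        Tool.EDIT_FILES,
        Tool.RUN_TERMINAL,
        Tool.GIT_PUSH,
        Tool.CREATE_PRS,
        Tool.WEB_SEARCH,
    }),
}


def can_use(agent: str, tool: Tool) -> bool:
    """Return True only if the tool is on the agent's allowlist."""
    return tool in ALLOWED.get(agent, frozenset())


print(can_use("captain", Tool.EDIT_FILES))   # False: Captain is read-only
print(can_use("build", Tool.ASK_QUESTIONS))  # False: Build never asks mid-task
print(can_use("build", Tool.GIT_PUSH))       # True
```

Keeping the table explicit makes the asymmetry easy to see: the only tools both agents share are reading files and searching the web.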

Where the split pays off most

The two-agent split pays off most when the work is wide or messy. Feature development like "add real-time notifications" touches the database, backend, WebSocket layer, and UI all at once. Large refactors like migrating from REST to GraphQL require mapping every endpoint and test before any code changes. Bug investigations like "users see stale data" could live anywhere from caching to queries to UI state. In each case Captain narrows the surface area first so Build can move in a straight line.

For tiny edits, such as renaming a variable, fixing a typo, or tweaking a string, you can skip Captain and go straight to Build. The planning step only pays off once tasks are bigger than a quick human edit.
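The choice between planning first and going straight to Build can be pictured as a small routing rule. Everything in this sketch is an assumption made for illustration: the `Task` fields, the one-file threshold, and the `route` function are invented, and the article does not say how (or whether) Capy automates this decision.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    files_touched: int        # rough estimate of how many files will change
    has_open_questions: bool  # is anything about the intent still ambiguous?


def route(task: Task, max_quick_edit_files: int = 1) -> str:
    """Send wide or ambiguous work to Captain, tiny edits straight to Build."""
    if task.has_open_questions or task.files_touched > max_quick_edit_files:
        return "captain"
    return "build"


print(route(Task("fix typo in README", 1, False)))           # build
print(route(Task("add real-time notifications", 12, True)))  # captain
```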

The counterintuitive latency trade

Two agents sound slower on paper.

Figure: A well-specced task finishes faster than a vague prompt that needs multiple corrections. The vague prompt goes through v1, fix, v2, fix, v3, fix, v4 before it is accepted; the specced task goes Captain plans, Build executes, ship.

In practice, the back-and-forth costs more than a single planning pass. A well-specced task finishes faster than a vague prompt that needs multiple rounds of correction, and because Build runs asynchronously, your hands-on time stays low even when the task itself is long.
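The trade can be made concrete with a back-of-the-envelope model. The minute values below are made-up inputs chosen only to show the arithmetic; they are not measurements from Capy. The four rounds mirror the v1 through v4 sequence in the timeline figure.

```python
def iterative_minutes(rounds: int, generate: int, review: int) -> int:
    """Total time when every round costs one generation plus one human review."""
    return rounds * (generate + review)


def planned_minutes(plan: int, generate: int, review: int) -> int:
    """Total time for one planning pass followed by a single generate-and-review."""
    return plan + generate + review


def break_even_plan_minutes(rounds: int, generate: int, review: int) -> int:
    """Planning pays off whenever it takes less than the rounds it replaces."""
    return (rounds - 1) * (generate + review)


# Illustrative inputs only (minutes).
ROUNDS, GENERATE, REVIEW, PLAN = 4, 5, 10, 15

print(iterative_minutes(ROUNDS, GENERATE, REVIEW))        # 60
print(planned_minutes(PLAN, GENERATE, REVIEW))            # 30
print(break_even_plan_minutes(ROUNDS, GENERATE, REVIEW))  # 45
```

Under these assumed numbers, planning wins as long as the planning pass takes less than 45 minutes; with only one round needed, the break-even is zero and skipping the planner is the faster path.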

Try it

Start a chat, describe what you want, and let Captain explore and spec it. When the plan looks right, start Build and let it ship.

You may find you enjoy the planning step more than you expected. We do.