A living document by TheCrux — Last updated: January 2026
Tool-specific details (Claude Code commands, flags, file locations) are volatile. Where we state specifics, we add a Last verified date. If your installed version differs, prefer the official docs.
What Is This?
A practical guide to programming with AI CLI tools — Claude Code, Codex CLI, Gemini CLI, and others. These terminal-based agents don't just suggest code; they read your codebase, write files, run commands, and iterate based on results.
This guide is written by AI with human editorial oversight. It's maintained using TheCrux, which sources and summarizes practitioner content—blog posts, threads, documentation—and uses multiple AI models to synthesize updates. The goal: surface emerging best practices, highlight where approaches diverge, and help you develop your own informed workflow.
Quick Start by Experience Level
If You're New to AI CLI Tools
Start here: The shift from IDE-based AI (Copilot, Cursor) to terminal-based agents (Claude Code, Codex) represents a fundamental change. You're not asking for code suggestions—you're having a conversation with a partner that can read, write, and execute code in your repo.
Your first session:
- Install your tool of choice (Claude Code, Codex CLI, or Gemini CLI)
- Navigate to a project you know well (familiarity helps you evaluate the output)
- Start with a small, well-defined task: "Add a function that validates email addresses"
- Watch what it does—read the file, make changes, maybe run tests
- Review the diff before accepting
If you start with Claude Code: generate a sane baseline CLAUDE.md first.
/init
Then prune it aggressively (see Create a Project Context File).
What to expect: You'll be slower at first. That's normal. The learning curve is about developing intuition for what tasks agents handle well vs. where they struggle.
These tools are high-leverage but not “push-button magic.” The fastest teams tend to be the ones who adopt a more disciplined loop: plan, work in small steps, verify, commit.
If You're Already Using These Tools
Become more productive by:
- Creating a `CLAUDE.md`/`AGENTS.md` (or equivalent) that encodes your project's conventions
- Building custom slash commands for repeated workflows (`/pr`, `/test`, `/catchup`)
- Experimenting with parallel agents on independent tasks
- Adding tests as a feedback mechanism for the agent
Common plateau: Many developers get stuck at "one agent, one task, manual review." The next level is developing workflows where agents can iterate autonomously (via tests) and where you can run multiple agents on parallel work streams.
Steve Yegge's 8 Stages of AI-Assisted Coding
From Welcome to Gas Town:
| Stage | Description |
|---|---|
| 1 | Zero or near-zero AI |
| 2 | Coding agent in IDE with permission prompts |
| 3 | IDE agent, YOLO mode (trust increasing) |
| 4 | IDE agent, using it mainly for reviewing diffs |
| 5 | CLI, single agent |
| 6 | CLI, multi-agent YOLO — regularly use 3-5 parallel instances. "You are very fast" |
| 7 | 10+ agents, hand-managed |
| 8 | Building your own orchestrator |
Most developers reading this guide are probably at stages 2-4. The goal is to help you reach stage 5-6, where productivity gains become substantial.
Why Now: The 2026 Moment
Key developments that changed the game:
- Context windows expanded — up to 200k tokens reduces paging, but agents still rely on selective reading/retrieval; most repos won’t fit end-to-end
- Agentic loops matured — Write → test → fix → repeat, without human intervention
- Terminal-first tools emerged — Claude Code, Codex CLI, Gemini CLI moved beyond IDE plugins
Senior engineers benefit most. They know how to review AI output, catch subtle bugs, and steer toward good architecture. (Thorsten Ball)
Even longtime skeptics converted: Kent Beck, DHH, Thorsten Ball all shifted from skepticism to advocacy once models improved. (Pragmatic Engineer)
What's gaining value: System design, product-mindedness, knowing when to trust/reject AI. What's declining: Prototyping-from-scratch expertise, language polyglot specialization.
Practitioner snapshot: Peter Steinberger describes his output as increasingly limited by inference time and "hard thinking", not typing speed—especially for the large class of software that is "move data around and present it." A recurring implication for CLI tools: starting with a CLI makes verification trivial (agents can run it and check output), which tightens the iteration loop. (See: Shipping at Inference-Speed)
January 2026: Agents go mainstream. Anthropic launched Cowork—a desktop agent research preview in the Claude macOS app (Claude Max tier), built on Claude’s agent foundations rather than a literal repackaging of Claude Code. Simon Willison called it "a general agent that looks well positioned to bring the wildly powerful capabilities of Claude Code to a wider audience." The implication: if agents work for file organization and expense reports, the CLI tools developers use are just the beginning.
Foundational Concepts
What Is Agentic Coding?
"An LLM agent is something that runs tools in a loop to achieve a goal. — Simon Willison
Traditional AI assistants suggest code. Agents are different:
- They read your codebase (files, git history, documentation)
- They write and modify files directly
- They execute commands (tests, builds, linters)
- They iterate based on results (fixing errors, retrying)
"One way to think about coding agents is that they are brute force tools for finding solutions to coding problems. If you can reduce your problem to a clear goal and a set of tools that can iterate towards that goal, a coding agent can often brute force its way to an effective solution. — Simon Willison
The Terminal Renaissance
"Agentic coding is console based, 1970s-style, in a text terminal window with little to no UI... The developer workflow is simple: You ask the AI for code changes, then you review the diffs and test the output, in a loop, lather, rinse, repeat. — Steve Yegge
Why terminal over IDE? The agent needs to execute commands, not just suggest code. A terminal is the natural environment for this.
Key Terminology
| Term | Definition |
|---|---|
| Agentic loop | Tool use → Result → Reasoning → Next tool use |
| Context window | How much text the model can "see" (up to ~200k tokens; practical coverage depends on retrieval) |
| MCP | Model Context Protocol — standard for integrating external tools |
| Subagent | An agent spawned by another agent for subtasks |
| YOLO mode | Running with permissions disabled (risky but fast) |
Context Is the Scarce Resource
Practitioner reality: the main constraint isn’t “intelligence”, it’s context management.
Claude Code’s official best-practices guide is unusually explicit about this: the context window includes conversation + files read + command outputs, and performance can degrade as it fills.
Practical implications:
- Prefer short sessions per task over one mega-session.
- Keep investigations scoped; move heavy reading into subagents.
- Treat long command output as toxic waste: capture the key lines, then clear/compact.
Source: Claude Code docs, Best Practices. Last verified: 2026-01-23.
The Right Mental Model
"The LLM is an assistant, not an autonomously reliable coder. — Addy Osmani
Treat AI-generated code like a junior developer's contribution. It needs code review before merging, testing to verify it works, architectural guidance to fit your patterns, and supervision to catch mistakes.
You remain the senior dev. AI amplifies your expertise—it doesn't replace your judgment. This framing helps set expectations: fast output, but verification required.
In practice: Thorsten Ball describes moving from skepticism to productive use once he stopped expecting the AI to "get it right" and started treating it as a fast, tireless pair programmer who needs guidance. The shift: from "why doesn't it understand?" to "how do I steer it effectively?"
Cross-check with a second model: Some developers use one AI to generate code, then a separate session (or different model) to review it. Fresh context catches issues the original session might miss. Kent Beck uses this for complex refactors—one model proposes, another critiques.
Common Practices
These patterns work regardless of which tool you use.
1. Create a Project Context File
Every CLI tool supports a markdown file that shapes agent behavior:
| Tool | File Name |
|---|---|
| Claude Code | CLAUDE.md |
| Codex CLI | AGENTS.md |
| Gemini CLI | GEMINI.md |
What to include:
# CLAUDE.md
## Project
Brief description: what this does, who it's for.
## Stack
- Next.js 14 (App Router)
- TypeScript (strict mode)
- Supabase (Postgres + Auth)
- Tailwind CSS
## Commands
- `npm run dev` — dev server on :3000
- `npm test` — run vitest
- `npm run lint` — eslint check
## Conventions
- Prefer small, focused commits
- Write tests for new `lib/` functions
- Use early returns over nested conditionals
What NOT to include (emerging consensus):
Listing "known issues" is debatable. Some teams do it; others find the agent references stale issues inappropriately. A better pattern: link to your issue tracker rather than duplicating state in CLAUDE.md.
Keep it concise: 100-200 lines maximum. If longer, create per-folder context files.
Claude Code specific (official): start by generating it, then edit.
- Run `/init` to generate a first draft based on detected tools and repo structure.
- Treat the result as a starting point; keep only lines that measurably reduce mistakes.
Source: Claude Code docs, Write an effective CLAUDE.md. Last verified: 2026-01-23.
Use @file imports instead of bloating the root file (Claude Code)
Claude Code supports pulling in additional files via @path references inside CLAUDE.md.
See @README.md for project overview and @package.json for available npm commands.
# Additional instructions
- Git workflow: @docs/git-instructions.md
This pattern is high leverage because it:
- Keeps the “always loaded” file small.
- Lets you version deeper docs without turning `CLAUDE.md` into a junk drawer.
Source: Claude Code docs, Write an effective CLAUDE.md. Last verified: 2026-01-23.
CLAUDE.md is not the only place for durable guidance
Claude Code draws a sharp line between:
- CLAUDE.md: global, always-loaded rules (keep short)
- Skills: domain/workflow knowledge loaded on demand
- Hooks: deterministic automation (no “did it follow instructions?” ambiguity)
This separation is worth adopting even if you’re not on Claude Code, because it mirrors how real teams manage rules: a small constitution + opt-in playbooks + enforced automation.
Context bundling tools: For larger codebases, tools like gitingest or repo2txt can bundle your codebase into a single file the agent can consume. Useful when you need the agent to understand the full picture.
"The single most important file in your codebase for using Claude Code effectively is the root CLAUDE.md. This file is the agent's 'constitution.' — Anthropic Engineering
2. Plan Before You Code
"One common mistake is diving straight into code generation with a vague prompt. — Addy Osmani
When to plan first: Multi-file changes, architectural decisions, anything where you'd hesitate if a junior dev proposed "let me just start coding." If you catch yourself thinking "wait, which approach?"—that's a planning signal.
When to skip planning: Single-file bug fixes, adding a field to a form, tasks where you already know exactly what the diff should look like. The overhead of planning exceeds the benefit.
The planning prompt pattern:
"I want to add user authentication. Before writing any code:
1. List the files that need to change
2. Propose a 3-step implementation plan
3. Note any architectural decisions I need to make
4. Identify risks or things that could go wrong"
Review the plan. Adjust it. Then proceed step by step.
The “Explore → Plan → Implement → Commit” loop (Claude Code official)
Claude Code’s docs recommend an explicit four-phase loop for non-trivial changes:
- Explore (read code, answer questions, no changes)
- Plan (write a plan you approve)
- Implement (execute, verifying as you go)
- Commit (checkpoint the result)
Even if your tool doesn’t have a dedicated “Plan Mode”, you can implement this as a discipline:
- Start with: “Read X and explain how it works. Don’t change anything.”
- Then: “Propose a 3-step plan. Wait for approval.”
- Then: “Implement step 1 only; run tests; stop.”
Source: Claude Code docs, Explore first, then plan, then code. Last verified: 2026-01-23.
The spec.md approach (Addy Osmani):
Before writing code, use AI to rapidly iterate on a specification:
- Describe your idea to the AI
- Ask clarifying questions back and forth
- Have the AI compile findings into a `spec.md`
- Review and refine the spec
- Only then start implementation
This is "waterfall in 15 minutes" — you get the benefits of upfront planning without the overhead.
Emerging practice: Some developers have the agent write the plan to a markdown file, then start a fresh session with "/clear" and have the agent read the plan file to continue. This keeps context clean.
Break work into small, iterative chunks
"A crucial lesson I’ve learned is to avoid asking the AI for large, monolithic outputs. Instead, we break the project into iterative steps or tickets and tackle them one by one. — Addy Osmani
Agents are most reliable when they can finish a small unit of work, validate it (tests/lint/build), and then move on.
Chunking prompt pattern:
We have the plan in plan.md. Implement ONLY Step 1.
Constraints:
- Touch the minimum number of files.
- After changes: run tests + lint.
- Stop once Step 1 is complete and report:
- what changed
- commands run and results
- any follow-ups for Step 2
Practitioner rationale: big, all-at-once generation tends to produce inconsistent architecture and duplicated logic ("like multiple devs worked on it without coordinating"). Small steps make it easier to review diffs, keep context accurate, and roll back when needed.
3. Use Version Control as Your Safety Net
Git is your undo button. This is even more critical with agents because:
- Agents can make changes you didn't expect
- Large refactors happen quickly (faster than you can track mentally)
- Reverting is faster than re-prompting
Workflow:
# Before starting
git checkout -b feature/add-auth
# Let the agent work (most tools auto-commit)
# Review what happened
git log --oneline -10
git diff main
# If something went wrong
git reset --hard HEAD~3
Opinion varies on commit frequency. Some prefer auto-commits (easier to undo, clear history). Others batch commits (cleaner git log). Emerging practice: auto-commit during work, squash before merging to main.
Practitioner pattern (useful with agents): commit like save points. After each chunk that leaves the repo in a good state (tests passing), make a small commit. This is less about "perfect history" and more about cheap reversibility when the next agent step goes sideways.
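The save-point discipline can be scripted. Below is a minimal sketch, demoed in a throwaway repo so it runs as-is; `save_point` and the `true`/`false` verify commands are stand-ins invented for this example — substitute your real test command.

```shell
# Demo in a throwaway repo so the sketch is self-contained.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email demo@example.com && git config user.name demo

# save_point MESSAGE VERIFY_CMD — commit only if verification passes.
save_point() {
  if sh -c "$2"; then
    git add -A && git commit -q -m "save: $1"
    echo "committed: $1"
  else
    echo "verify failed, not committing: $1"
  fi
}

echo "v1" > app.txt
save_point "add app.txt" "true"    # stand-in for a passing test run
echo "v2" > app.txt
save_point "edit app.txt" "false"  # stand-in for a failing test run
git rev-list --count HEAD          # only the green state was committed
```

The point is the shape: verification gates the commit, so every commit in history is a known-good state you can reset to.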
Make git part of the agent’s working memory:
- Paste diffs/commit logs into the session when context is stale.
- Ask the agent to explain why each hunk exists.
- Use `git bisect` with the agent when something regresses (LLMs are unusually good at reading diffs and being patient).
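`git bisect run` is the piece that pairs best with agents, because it is fully scriptable: you supply a deterministic check, git does the searching. A self-contained sketch — the history, the "bug", and `check.sh` are all fabricated for the demo:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email demo@example.com && git config user.name demo

# Fabricated history: commits 1-3 are good, commit 4 introduces the "bug".
for i in 1 2 3 4 5; do
  if [ "$i" -ge 4 ]; then echo "broken $i" > state.txt; else echo "ok $i" > state.txt; fi
  git add -A && git commit -q -m "commit $i"
done

# check.sh stands in for whatever proves the bug's presence:
# a failing test, a grep on output — anything deterministic (exit 0 = good).
cat > check.sh <<'EOF'
grep -q "^ok" state.txt
EOF

git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)" >/dev/null 2>&1
git bisect run sh check.sh >/dev/null 2>&1
first_bad=$(git show -s --format=%s refs/bisect/bad)
echo "first bad: $first_bad"
git bisect reset >/dev/null 2>&1
```

Asking the agent to write `check.sh` from the bug report, then letting `bisect run` do the search, is often faster than having it reason about the regression directly.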
Guardrail (opinion, but widely repeated): never commit code you can’t explain. If an agent produces a complex fix, require a walkthrough (or simplify) before you merge.
Isolate experiments: branches and git worktree reduce blast radius and let you run parallel sessions safely. (See Running Parallel Agents.)
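A sketch of the worktree setup (branch names and paths here are made up): each parallel session gets its own checkout of the same repository, so agents cannot clobber each other's working tree.

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.email demo@example.com && git config user.name demo
echo "hello" > main.txt && git add -A && git commit -q -m "init"

# One branch + one worktree per agent/workstream.
git branch feature/add-auth
git branch feature/add-tests
git worktree add ../repo-auth feature/add-auth >/dev/null 2>&1
git worktree add ../repo-tests feature/add-tests >/dev/null 2>&1

git worktree list   # three checkouts of one repo; each agent runs in its own
```

When a workstream goes green, merge its branch back and remove the worktree with `git worktree remove`.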
4. Manage Context (Tool-Dependent)
This principle varies significantly by tool.
Claude Code benefits from aggressive context clearing:
- Use `/clear` when switching tasks
- Context accumulates and can mislead the agent
- The `/compact` command summarizes history to save tokens
Codex CLI persists sessions locally:
- Use `codex resume` to continue previous work
- Less need to clear because sessions are stored
Reality check (opinion, but common in high-velocity workflows): newer models can remain effective deep into long contexts, but tool UIs don’t always surface “the repo changed” as a first-class event. If you run long sessions, periodically force a refresh:
- ask the agent to re-open the specific files it is about to edit
- use `git diff` / `git status` as the source of truth
- use a `/catchup`-style summary after compaction or breaks
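The refresh itself is only a few commands. A runnable sketch, with a throwaway repo standing in for a session whose context has gone stale:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email demo@example.com && git config user.name demo
echo "v1" > app.txt && git add -A && git commit -q -m "initial"

# Simulate the repo changing underneath a long-running session.
echo "v2" > app.txt

# The refresh the agent should run before editing anything:
git status --short       # uncommitted changes
git diff --stat          # one-line summary of what moved
git log --oneline -5     # recent history for grounding
```

Having the agent run these (rather than pasting stale file contents) keeps its picture of the repo anchored to the actual working tree.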
Rewind beats arguing (Claude Code)
Claude Code checkpoints every action and supports rewinding conversation and/or code state.
Use this when the agent:
- Took a wrong approach and “fixed the fix” three times.
- Made a wide diff when you wanted a narrow one.
- Pulled in a dependency or refactor you don’t want.
Commands/UX:
- `/rewind` (or double-ESC) to restore an earlier checkpoint
This is not a replacement for git, but it’s faster than prompt wrestling.
Source: Claude Code docs, Rewind with checkpoints. Last verified: 2026-01-23.
Persisting sessions: treat them like branches
If your tool supports resuming, use it deliberately:
- Name sessions by workstream (feature/refactor/incident)
- Resume only when you’re continuing the same task
- Start fresh for review (“writer/reviewer” pattern)
Claude Code examples:
claude --continue
claude --resume
Source: Claude Code docs, Resume conversations. Last verified: 2026-01-23.
Tell the tool what to preserve during compaction (Claude Code)
Compaction is often helpful, but it can delete the exact things you needed (modified file list, test commands, current failing output). Claude Code supports instructing compaction behavior via CLAUDE.md.
Practical pattern to add (edit to fit your repo):
## Context management
When compacting, preserve:
- the list of modified files
- the commands we ran + pass/fail
- the current "next step" plan
Source: Claude Code docs, Manage context aggressively. Last verified: 2026-01-23.
The "catchup" pattern (useful across tools):
After clearing or starting fresh, have the agent read recent changes:
"Read the last 5 commits in this branch and summarize what changed.
Then continue with [next task]."
Or create a custom /catchup command that does this automatically.
5. Provide Feedback Loops (Tests Are Critical)
Agents work best when they can verify their own work. Tests provide this:
"Add a calculateDiscount function.
Write a failing test first, then implement the function.
Run the test. If it fails, fix and retry."
"Test driven development (TDD) is a 'superpower' when working with AI agents. — Kent Beck
More on this in the Testing with AI Agents section.
Verification isn’t just tests
Agents improve dramatically when they can check work against any deterministic signal:
- tests (unit/integration/e2e)
- typecheck/build
- expected output from a CLI command
- screenshots / visual diffs (especially UI)
The key is to give the agent a way to know it’s done without relying on your vibes.
Source: Claude Code docs, Give Claude a way to verify its work. Last verified: 2026-01-23.
6. Update Docs and Run Improvement Loops
Often overlooked: As you work with agents, you learn what prompts work, what conventions help, what the agent struggles with. Encode these learnings.
Practices to adopt:
- After completing a feature, update your context file (`CLAUDE.md`/`AGENTS.md`) with any new conventions you established
- Create slash commands for patterns you repeat
- Periodically review: "What did the agent do well? What did it struggle with? How can I help it next time?"
"We realized we needed to teach our agent a little more about our development philosophy and steer it away from bad behaviors. The agent now understands our values around Test Driven Development and minimal changes. — Martin Fowler's team
7. Engineer for “Inference-Speed” Iteration (Practitioner Patterns)
This section is a practitioner synthesis (not universal law), largely reflecting workflows described in Peter Steinberger’s Shipping at Inference-Speed.
Start with a CLI to close the loop
Claim (practitioner): “Whatever you build, start with the model and a CLI first.”
Why this often works:
- Fast verification: an agent can run a CLI, parse stdout/stderr, and iterate without you “being the UI.”
- Lower surface area: fewer moving parts than a UI during early exploration.
- Natural automation seam: today’s agents are already excellent at shell loops.
Guide connection: this reinforces Provide Feedback Loops and complements Running Parallel Agents because CLI-driven verification is easy to parallelize.
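As a toy illustration of the CLI-first seam: a tool with deterministic output is something an agent can run and check without a human "being the UI". `slugify.sh` below is entirely made up for the example:

```shell
# A tiny, deterministic CLI: args in, predictable stdout out — trivially checkable.
cat > slugify.sh <<'EOF'
#!/bin/sh
printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-' | sed 's/^-//;s/-$//'
EOF

sh slugify.sh "Hello, World!"   # -> hello-world
```

An agent iterating on this tool can verify every change by running it and comparing stdout — no screenshots, no human in the loop.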
Optimize repo ergonomics for agents (not just humans)
Opinion (practitioner): design folder structure and docs so the model can navigate “obvious” shapes.
Concrete patterns:
- Keep a `docs/` folder per project for subsystem notes, invariants, and how-to-run commands.
- When finishing a chunk, ask the agent to write/update a doc (e.g. "write docs to `docs/<topic>.md`").
- Prefer conventions that are easy for tools to infer: predictable filenames, consistent command names, repeatable scripts.
This is compatible with the guide’s emphasis on CLAUDE.md / AGENTS.md, but pushes it further: documentation is durable memory; chat history is not.
Shorter prompts, more grounding
Observation (practitioner): as models improve, prompts often get shorter—especially when you can ground the agent with:
- the exact command to run
- the failing output
- a screenshot or snippet (“fix padding” / “this output is wrong”)
Practical takeaway: spend effort on inputs that constrain the space (tests, repro commands, fixtures, screenshots), not elaborate prose.
Model behavior differs: “slow readers” vs “eager writers”
Opinion (practitioner): some models/tools spend a long time silently reading files before writing; others produce edits quickly but may miss context.
How to use this:
- For large refactors: tolerate slower startup if it reduces “fix-the-fix” iterations.
- For small edits: faster, more eager models can win end-to-end.
Guide connection: this is an applied version of Who Plans: Human or Agent? and Single Agent vs. Multiple Agents.
Dependency and ecosystem choice matters more than ever
Claim (practitioner): the biggest high-leverage decisions shift toward language/ecosystem and dependencies.
Rationale:
- agents are better when the ecosystem is popular (more examples in training data)
- fewer, well-understood dependencies reduce failure modes
This aligns with the guide’s Dependencies review rubric and Dependency Safety.
Tool Landscape
The Big Three CLI Agents
| Aspect | Claude Code | Codex CLI | Gemini CLI |
|---|---|---|---|
| Maker | Anthropic | OpenAI | Google |
| Open Source | No | Yes (Rust) | Yes |
| Context | 200k tokens | Varies by model | Very large |
| Strength | Deep reasoning, complex refactors | Fast iteration, cost-effective | Large context, strong ecosystem |
| MCP Support | Native | stdio-based | Native |
Note on pricing/limits: Token allowances vary by plan and change frequently. Check official pricing pages before deciding:
Last verified: 2026-01-08
Features That Are Now Universal
All major CLI agents have converged on these capabilities:
Image input: Paste screenshots, error dialogs, or UI mockups directly into the conversation. Boris Cherny describes using the Claude Chrome extension to screenshot a broken UI, paste it, and say "fix this"—the agent sees the visual problem and fixes the code. Useful for CSS bugs, design implementation, and debugging visual regressions.
Multiple instances: Run 3-5+ agents in separate terminals on independent tasks. Simon Willison runs parallel agents for tasks like "write tests for module A" and "add feature B" simultaneously. The key constraint: they must touch different files (see Running Parallel Agents).
MCP integration: Connect agents to external tools—Linear for issues, Slack for context, Sentry for error reports. Teams check .mcp.json into git so everyone shares the same tool wiring.
Session persistence: Resume where you left off. Codex has explicit codex resume; Claude Code compacts context automatically. Useful when you hit rate limits or need to step away.
Custom commands: Define shortcuts like /pr (lint, test, prepare description) or /catchup (read recent commits, summarize state). Store them in .claude/commands/ or equivalent and commit them—they're team infrastructure. For Claude, include a description frontmatter so commands are invocable by tools:
---
description: "Run tests and summarize failures"
---
# /test
Claude Code Specifics
Boris Cherny's Workflow (Creator of Claude Code)
From Boris Cherny’s thread on how he uses Claude Code day-to-day (sources: x.com, Thread Reader). This is a practitioner snapshot, not a universal prescription (Boris explicitly emphasizes that the tool is meant to support many styles).
Parallel execution (terminal + web + phone): (practitioner report)
- Runs ~5 Claude sessions locally in terminal tabs (numbered), using system notifications to know when one needs input.
- Runs ~5–10 additional sessions on claude.ai/code in parallel.
- Hands sessions back and forth between local and web (e.g., using Claude Code’s handoff/teleport capabilities) and sometimes starts sessions from his phone, then checks in later.
Model choice (opinion, with rationale): (practitioner opinion)
"“I use Opus 4.5 with thinking for everything… even though it’s bigger & slower… since you have to steer it less and it’s better at tool use, it is almost always faster… in the end.”
This is a useful counterpoint to “always pick the cheapest/fastest model”: for some workflows, less steering + better tool use wins on end-to-end time.
Plan mode → execution mode: (practitioner workflow)
"“Most sessions start in Plan mode (shift+tab twice)… go back and forth… until I like its plan. From there, I switch into auto-accept edits mode and Claude can usually 1-shot it.”
This aligns with the guide’s Plan Before You Code principle: you’re buying correctness and fewer iterations.
Compounding team memory via CLAUDE.md: (practitioner workflow)
- Their team shares a single, versioned `CLAUDE.md` for the repo.
- "Anytime we see Claude do something incorrectly we add it to the `CLAUDE.md`, so Claude knows not to do it next time."
- In code review, they'll tag @.claude on PRs to propose additions/edits to `CLAUDE.md` (via the Claude Code GitHub action: `/install-github-action`).
Guide takeaway: treat CLAUDE.md as institutional memory and update it from real failures—similar to how teams evolve lint rules or review checklists.
Slash commands for the inner loop: (practitioner workflow)
- Boris uses slash commands for repeated workflows and checks them into git under `.claude/commands/`.
- Example: a `/commit-push-pr` command used many times daily.
- Notably, the command uses inline bash to precompute git status and other context, reducing back-and-forth with the model.
This is a strong pattern: push computation into tools (shell, scripts) and keep the model focused on decisions.
Subagents for repeatable “PR-shaped” work: (practitioner workflow)
- Uses subagents like `code-simplifier` (cleanup after changes) and `verify-app` (end-to-end verification instructions).
Hooks to automate the last 10%: (practitioner workflow)
- Uses a PostToolUse hook to format generated code (“the last 10%”) to avoid CI formatting failures.
Permissions: avoid YOLO by default, allowlist instead: (practitioner workflow)
"“I don’t use
--dangerously-skip-permissions. Instead, I use/permissionsto pre-allow common bash commands…”
- Many of these permissions are shared via
.claude/settings.json.
Tool use beyond the repo (MCP): (practitioner workflow)
- Claude Code uses their tools: posts/searches Slack (via MCP), runs BigQuery queries (via `bq`), grabs Sentry logs, etc.
- Their Slack MCP configuration is checked into `.mcp.json` and shared.
Long-running tasks: unblock progress, then verify: (practitioner workflow)
- For tasks that take a long time, Boris either:
- prompts Claude to verify with a background agent when done,
- uses an agent Stop hook to make verification deterministic, or
- uses the “ralph-wiggum” plugin (attributed to Geoffrey Huntley).
- In a sandbox, he may use `--permission-mode=dontAsk` or even `--dangerously-skip-permissions` to avoid prompt deadlocks.
Most important tip: build a verification loop: (practitioner claim)
"“Give Claude a way to verify its work… [it] will 2–3x the quality of the final result.”
For web work, he describes using the Claude Chrome extension to open a browser, test the UI, and iterate. For other domains, verification might be tests, a CLI command, or a simulator. This connects directly to Testing with AI Agents.
Key Commands
| Command | Purpose |
|---|---|
| /clear | Reset context |
| /compact | Summarize to save tokens |
| /cost | Show token usage |
| /permissions | Pre-allow safe bash commands |
| /doctor | Diagnose environment |
Custom slash commands: Create .claude/commands/yourcommand.md with a description frontmatter so Claude can invoke it programmatically. Boris uses /commit-push-pr dozens of times daily.
Subagents: Boris uses code-simplifier (cleans up after Claude) and verify-app (end-to-end testing) regularly.
Codex CLI Specifics
Interactive vs. Exec mode:
# Interactive (full TUI)
codex
# Non-interactive (for automation)
codex exec "Add error handling to api/users.ts"
# Resume previous session
codex resume
Parallel tool calling: Codex benefits from batching file reads:
"Before any tool call, decide ALL files/resources you will need. Batch everything. — OpenAI Codex Guide
Gemini CLI Specifics
Evidence level: Official docs are solid; public practitioner writeups are thinner than Claude/Codex. Treat this as reliable basics plus known sharp edges.
Official starting points:
- Repo, releases, issues: https://github.com/google-gemini/gemini-cli
- Configuration: https://github.com/google-gemini/gemini-cli/blob/main/docs/cli/configuration.md
- Troubleshooting: https://github.com/google-gemini/gemini-cli/blob/main/docs/troubleshooting.md
Day-to-day usage that translates well across teams:
- Keep project context in `GEMINI.md` and keep it short.
- Put repeatable prompts into versioned slash commands.
- Use a plan-first, step-by-step loop with verification after each step.
Custom slash commands:
- Docs: https://cloud.google.com/blog/topics/developers-practitioners/gemini-cli-custom-slash-commands
- Store them under `.gemini/commands/` and commit them.
Known sharp edges from issues:
- `/restore` failures reported in some environments: https://github.com/google-gemini/gemini-cli/issues/1761
- `/restore` path errors on Windows: https://github.com/google-gemini/gemini-cli/issues/8162
- Save/resume rough edges: https://github.com/google-gemini/gemini-cli/issues/5307
Guidance: treat checkpoint/restore as a convenience, not your only safety net. Git branches or worktrees plus time-to-green checks remain the backbone.
Advanced Patterns
Custom Command Libraries
Build reusable commands for your workflow:
.claude/commands/
├── catchup.md # Read recent changes, summarize state
├── pr.md # Lint, test, prepare PR description
├── test.md # Run tests, fix failures iteratively
└── improve.md # Review code for improvements
Example catchup.md:
Read all files changed in this branch compared to main.
Summarize what has changed and the current state of this work.
MCP Integration
Connect agents to external tools via Model Context Protocol:
Common integrations:
- Linear — Read issues, create tasks
- Slack — Post updates, read channel context
- Sentry — Access error reports
- Figma — Read design specs
Example setup (Claude Code):
// .mcp.json
{
"servers": {
"linear": { "command": "npx", "args": ["@anthropic/mcp-linear"] }
}
}
Note: You may see MCP config documented as .mcp.json (repo root) or under a tool-specific directory (e.g. .claude/…) depending on tool/version. Prefer the official docs for your installed version; the key practice is checking the configuration into git so the team shares the same tool wiring.
Hooks: Formatting and Verification Automation
Hooks let you automate “always do X after Y” without re-prompting.
Two common hook shapes (practitioner-inspired):
- PostToolUse formatting: after Claude edits files, run formatter(s) so CI doesn’t fail on style. Boris describes this as catching “the last 10%.”
- Stop-hook verification: when an agent finishes a long task, automatically run a verification step (tests, e2e script, smoke check) so work doesn’t stall waiting for you.
Treat hooks like code: keep them deterministic, fast, and safe. Pair with the guide’s permissions progression and Testing with AI Agents.
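As a concrete sketch, a PostToolUse formatting hook can be a small script that reads the tool event, pulls out the edited file, and runs your formatter. Everything here is hypothetical: the payload shape (`tool_input.file_path`), the script wiring, and the prettier invocation are assumptions, so check your tool's hook documentation for the real contract.

```python
import json
import subprocess
import sys

def edited_path(payload: dict):
    """Pull the edited file's path out of a hook payload, if present.
    The tool_input.file_path shape is an assumption, not a documented contract."""
    return payload.get("tool_input", {}).get("file_path")

def run_hook(stdin=sys.stdin):
    """Read the hook event from stdin and format the touched file."""
    payload = json.load(stdin)
    path = edited_path(payload)
    if path and path.endswith((".js", ".ts", ".tsx")):
        # Run the project formatter; never fail the hook on formatter errors.
        subprocess.run(["npx", "prettier", "--write", path], check=False)

# Wiring (hypothetical): register this script as a PostToolUse hook in your
# tool's settings, e.g. "command": "python3 format_hook.py".
```

Keeping the hook to one deterministic action (format the file that was just edited, nothing else) is what makes it safe to run on every edit.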
Testing with AI Agents
This deserves special attention. Testing is emerging as the critical enabler for effective agentic coding.
Why Tests Matter More Now
"Those who get the most out of coding agents tend to be those with strong testing practices. An agent like Claude can 'fly' through a project with a good test suite as safety net." — Addy Osmani
Tests provide:
- Clear success criteria — The agent knows when it's done
- Autonomous iteration — Write → test → fix → test, without human intervention
- Safety net — Catch regressions from AI-generated code
- Confidence for larger changes — You can let the agent refactor knowing tests will catch breaks
TDD with Agents
"By writing a test before you write any code, you are essentially 'prompting' the AI code generator with exactly the functionality you want." — Engineering Harmony
The pattern:
"Add a calculateDiscount function that:
- Takes a price and discount percentage
- Returns the discounted price
- Throws if percentage > 100
Write a test file first with cases for each requirement.
Run the test (it should fail).
Implement the function.
Run the test again. Fix until it passes."
The agent enters a tight feedback loop, iterating until tests pass.
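In Python terms, a hypothetical analogue of the `calculateDiscount` spec above (names and cases are illustrative, not from the source), the tests the agent writes first read like executable requirements:

```python
def calculate_discount(price: float, percentage: float) -> float:
    """Return the discounted price; reject percentages over 100."""
    if percentage > 100:
        raise ValueError("discount percentage cannot exceed 100")
    return price * (1 - percentage / 100)

# The spec-as-tests the agent writes first. They fail until the
# implementation above exists, then the loop iterates to green.
assert calculate_discount(100, 10) == 90
assert calculate_discount(80, 0) == 80
try:
    calculate_discount(50, 150)
    raise AssertionError("expected ValueError for percentage > 100")
except ValueError:
    pass
```

Each assertion maps to one bullet of the prompt, which is why the test file doubles as the task specification.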
Building Test Coverage with Agents
If you have a codebase with low coverage, agents can help:
"Look at lib/utils.ts. For each exported function that lacks tests,
write a test file. Start with the simplest functions. Run tests after each."
Emerging practice: Define a coverage threshold in CI. New PRs can't decrease coverage. This prevents regression and encourages incremental improvement.
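A minimal sketch of such a gate follows; the tolerance value is a placeholder, and you would feed in the real percentages from your coverage tool's report:

```python
def coverage_gate(base_pct: float, pr_pct: float, tolerance: float = 0.1) -> bool:
    """Pass only if the PR does not decrease coverage.

    base_pct: coverage on the target branch.
    pr_pct: coverage with the PR applied.
    tolerance: absorbs tiny fluctuations from moved or deleted lines.
    """
    return pr_pct >= base_pct - tolerance

# In CI you would exit non-zero on failure, e.g.:
# sys.exit(0 if coverage_gate(base, pr) else 1)
```

The point is the ratchet: coverage can only go up, so agent-written tests accumulate instead of eroding.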
Caveats
"AI code assistants can generate plausible-looking test cases and code, but you don't know where it learned the semantics from. What AI offers may be incorrect, inefficient, or overly complex." — testRigor
Review AI-generated tests carefully. They can:
- Test the wrong thing (testing implementation, not behavior)
- Have false positives (tests that pass but don't verify what you think)
- Mirror bugs in the implementation
Bleeding Edge: Orchestration & Memory
These tools are experimental but point to where things are heading.
Beads: Memory for Coding Agents
Problem: Agents wake up with no memory of previous sessions. You re-explain context every time.
Solution: Beads by Steve Yegge provides persistent, git-backed memory.
"The experiment that triggered Beads was simple: move the plan into an issue tracker and give agents a way to query 'ready work.' Within minutes, the behavior shifted from meandering to disciplined: compute the ready set, pick a task, work it, record discovered work, repeat." — Steve Yegge
How it works:
- Issues stored as JSONL in `.beads/`
- Git-backed (versioned, branched, merged like code)
- Agents query for "ready work" rather than relying on you to specify
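The "ready work" idea is easy to picture: an issue is ready when it is open and every dependency is closed. A toy sketch, assuming a hypothetical JSONL schema (these field names are illustrative, not Beads' actual format):

```python
import json

# Hypothetical bead records; field names are invented for illustration.
issues_jsonl = """
{"id": "b-1", "title": "Set up schema", "status": "closed", "deps": []}
{"id": "b-2", "title": "Add auth", "status": "open", "deps": ["b-1"]}
{"id": "b-3", "title": "Add billing", "status": "open", "deps": ["b-2"]}
"""

def ready_work(jsonl: str) -> list:
    """An issue is 'ready' when it is open and every dependency is closed."""
    issues = [json.loads(line) for line in jsonl.strip().splitlines()]
    closed = {i["id"] for i in issues if i["status"] == "closed"}
    return [i["id"] for i in issues
            if i["status"] == "open" and all(d in closed for d in i["deps"])]
```

Here only `b-2` is ready: its dependency is closed, while `b-3` still waits on `b-2`. That "compute the ready set, pick a task" loop is exactly the disciplined behavior Yegge describes.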
Gas Town: Multi-Agent Orchestration
Problem: Managing 10+ parallel agents manually is chaotic.
Solution: Gas Town by Steve Yegge orchestrates multiple agents.
"Gas Town is a Go-based orchestrator enabling developers to manage 20-30 parallel AI coding agents productively using tmux."
Features:
- Manages agent roles (Mayor, Crew, Witness, etc.)
- Handles merge queue and work swarming
- Built on Beads for persistent state
Caveat: Yegge himself says "You need to be at least level 6 [of his 8-stage model] before you'll appreciate Gas Town." This is advanced tooling.
Agent Mail (MCP-Agent-Mail)
Problem: When multiple agents work on the same codebase, they can conflict.
Solution: Agent Mail provides message routing and file reservations for coordinating agents.
Features:
- Agents can send messages to each other
- File reservation system prevents edit conflicts
- Git-backed communication archive
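The file-reservation idea can be sketched in a few lines: the first agent to claim a path holds it until release. This in-memory toy is illustrative only, not Agent Mail's actual mechanism (which persists reservations so separate agent processes can see them):

```python
reservations: dict[str, str] = {}  # path -> agent currently holding it

def reserve(path: str, agent: str) -> bool:
    """Claim a file; returns False if another agent already holds it."""
    holder = reservations.setdefault(path, agent)
    return holder == agent

def release(path: str, agent: str) -> None:
    """Release a reservation, but only by the agent that holds it."""
    if reservations.get(path) == agent:
        del reservations[path]
```

Advisory locks like this don't prevent edits by a misbehaving agent; they only give cooperating agents a protocol to avoid stepping on each other.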
Recipes: Copy-Paste Workflows
Practical prompts and command sequences for common tasks.
Recipe 1: New Feature with TDD
Setup:
git checkout -b feature/your-feature-name
Prompt:
I want to add [FEATURE DESCRIPTION].
Before writing implementation:
1. Create a test file at [path/to/tests/feature.test.ts]
2. Write failing tests that specify the behavior:
- [Test case 1]
- [Test case 2]
- [Edge case]
3. Run the tests to confirm they fail
4. Implement the minimum code to pass
5. Run tests again, fix until green
6. Refactor if needed, keeping tests green
Recipe 2: Safe Refactoring
Setup:
git checkout -b refactor/component-name
git stash # save any uncommitted work
Prompt:
I need to refactor [FILE/COMPONENT].
Before making changes:
1. Read the existing code and understand its behavior
2. Identify all callers/consumers of this code
3. Write characterization tests if none exist (tests that capture current behavior)
4. Run tests to establish baseline
Then refactor in small steps:
- Make ONE logical change
- Commit with descriptive message
- Run tests
- If green, continue. If red, fix or revert.
After each commit, I'll review before you continue.
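Step 3 above, characterization tests, just pins down what the code does today, quirks included. A hypothetical sketch (the `slugify` helper and its quirk are invented for illustration):

```python
# Hypothetical legacy helper we want to refactor safely.
def slugify(title: str) -> str:
    return title.lower().replace(" ", "-")

# Characterization tests: assert what the code DOES today, quirks and
# all, so a refactor cannot silently change observable behavior.
assert slugify("Hello World") == "hello-world"
assert slugify("  padded  ") == "--padded--"  # quirk preserved on purpose
```

Note the second assertion captures a behavior you might consider a bug; characterization tests document it anyway, and you decide separately whether to change it.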
Recipe 3: Bug Fix with Regression Test
Prompt:
Bug: [DESCRIPTION OF BUG]
Expected: [WHAT SHOULD HAPPEN]
Actual: [WHAT HAPPENS NOW]
1. First, write a failing test that reproduces this bug
2. Run it to confirm it fails as expected
3. Find the root cause (don't guess—trace the code)
4. Fix the minimal code needed
5. Run test to confirm it passes
6. Run full test suite to check for regressions
Recipe 4: Code Review Prep
Prompt:
Prepare this branch for review:
1. Run linter and fix issues
2. Run tests and ensure all pass
3. Check for:
- Console.logs or debug statements to remove
- Commented-out code to delete
- TODOs that should be addressed or ticketed
4. Generate a PR description summarizing:
- What changed and why
- How to test
- Any migration/deployment notes
Recipe 5: Catch-Up After Context Clear
Prompt:
/catchup
Read the last 5 commits on this branch.
Summarize:
- What has been implemented
- What's currently broken or incomplete
- What the next logical step is
Then wait for my instruction.
Checklists
Make ownership explicit: some checks are human decisions, others are agent verification.
Human checklist (decisions)
- I can explain the diff without guesswork.
- Scope is tight: one job, one PR.
- Risk is understood: auth, data, migrations, dependencies.
- Rollback is easy or planned.
- The agent’s evidence is real: commands + outputs, not claims.
Agent checklist (verification + receipts)
Default mode is verification-only. If it fails, run a separate fix step.
Prompt pattern:
Run this checklist and report PASS/FAIL for each item.
Rules:
- Include the exact command you ran and the key output snippet.
- If you didn’t run a command, mark FAIL and say what you need.
- Do not change files (verification only).
Checklist:
1) git status (show it)
2) tests (run: <your test command>)
3) lint/format (run: <your lint command>)
4) build/typecheck (run: <your build command>)
5) secrets scan: confirm no .env / keys touched (show grep/ripgrep result)
6) PR size sanity: list changed files + total LOC changed
7) summary: what/why/how to test + risks + rollback note
Operational tip: turn this into a saved command so it runs the same way every time.
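Item 6 (PR size sanity) can be computed from `git diff --numstat`, which emits tab-separated added/deleted/path lines, with `-` in place of counts for binary files. A small parser sketch:

```python
def pr_size(numstat: str) -> tuple:
    """Return (files changed, total lines added + deleted) from
    `git diff --numstat` output; binary files count as 0 lines."""
    files = loc = 0
    for line in numstat.strip().splitlines():
        added, deleted, _path = line.split("\t", 2)
        files += 1
        loc += int(added) if added != "-" else 0
        loc += int(deleted) if deleted != "-" else 0
    return files, loc

# Usage sketch: feed it the output of
#   git diff --numstat main...HEAD
```

A hard number ("2 files, 15 lines") makes the "PR size sanity" check objective instead of a vibe.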
Review Rubric: What to Check in AI-Generated Diffs
When reviewing agent output, check these categories:
Correctness
- Does the code do what was asked?
- Are edge cases handled?
- Is the logic sound (not just syntactically valid)?
Security
- No secrets hardcoded
- Input validation on user data
- SQL queries parameterized
- No `eval()`, `dangerouslySetInnerHTML`, or equivalent
- Auth/authz checks in place
Performance
- No N+1 queries introduced
- Large loops have appropriate limits
- Expensive operations not in hot paths
- Memory: no obvious leaks (unclosed handles, growing arrays)
Dependencies
- New packages justified and trustworthy
- Versions pinned (not `latest`)
- No unnecessary dependencies added
- License compatibility checked
Style & Maintainability
- Follows project conventions (check `CLAUDE.md`/`AGENTS.md`)
- Names are clear and consistent
- Complex logic has comments
- No dead code or commented-out blocks
Migrations & Data
- Migrations are reversible where possible
- Data changes are idempotent
- Backfill scripts handle large datasets safely
Safety Patterns for Real Repos
Secrets and Environment
Rule: Never let the agent read or print secrets.
Add to your CLAUDE.md / AGENTS.md:
## Security
- NEVER read, print, or include contents of .env files
- NEVER commit files containing API keys, tokens, or passwords
- If you need an env var value, ask me to provide it
Patterns:
- Use `.env.example` with placeholder values
- Reference env vars by name, not value: `process.env.DATABASE_URL`
- If the agent suggests hardcoding a secret, reject the diff
Dependency Safety
Before accepting new dependencies:
Before adding [PACKAGE], tell me:
1. Weekly downloads on npm/pypi
2. Last publish date
3. Number of open security advisories
4. What it does that we can't do with existing deps
Lock versions explicitly:
// Good
"lodash": "4.17.21"
// Bad
"lodash": "^4.17.21"
"lodash": "latest"
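Checking for unpinned specifiers is mechanical enough to automate. A rough sketch over a package.json-style dependency map (the heuristics are simplistic; npm ranges have more forms than this covers):

```python
def unpinned(deps: dict) -> list:
    """Flag dependency specs that are ranges or floating tags
    rather than exact versions."""
    flagged = []
    for name, spec in deps.items():
        if spec in ("latest", "*") or spec[:1] in ("^", "~", ">", "<"):
            flagged.append(name)
    return flagged
```

Wire this into the same pre-review checklist as the secrets scan, so an agent-added `"^18.2.0"` gets caught before it reaches the PR.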
Dangerous Operations
Add guardrails to CLAUDE.md / AGENTS.md:
## Forbidden Operations
- Never run `rm -rf` on directories outside the project
- Never force-push to main/master
- Never run database migrations in production without explicit approval
- Never modify .git directory directly
Running Parallel Agents
"I began to embrace the parallel coding agent lifestyle." — Simon Willison
Running multiple agents simultaneously can multiply your throughput — but only if you prevent conflicts. Follow these rules.
When Parallel Works
- Tasks are independent (different files/modules)
- Well-defined scope upfront
- You're comfortable context-switching between reviews
When to Avoid
- Tightly coupled changes
- Exploratory work where scope is unclear
- You need deep focus on one complex problem
Rule 1: One Worktree Per Agent
# Main repo stays clean
/project/main # Human review and integration
# Each agent gets its own worktree
/project/agent-auth # Agent working on auth feature
/project/agent-api # Agent working on API changes
/project/agent-tests # Agent expanding test coverage
Setup:
git worktree add -b feature/auth ../project-agent-auth
git worktree add -b feature/api ../project-agent-api
Agent-managed approach: You can instruct the agent to set up its own worktree:
"Create a git worktree for this feature at ../project-auth.
Work in that directory. When done, let me know so I can review."
Rule 2: Non-Overlapping File Ownership
Before starting parallel agents, define boundaries:
| Agent | Owns | Hands Off |
|---|---|---|
| Auth Agent | src/auth/*, src/middleware/auth.ts | Everything else |
| API Agent | src/api/*, src/routes/* | src/auth/* |
| Test Agent | tests/* | Source files (read-only) |
Tell each agent explicitly:
You are working on [SCOPE].
Do NOT modify files outside: [FILE PATTERNS]
If you need changes elsewhere, stop and tell me.
Rule 3: Integration Branch
Never merge agents directly to main. Use an integration branch:
main
└── integration
├── feature/auth (Agent 1)
├── feature/api (Agent 2)
└── feature/tests (Agent 3)
Merge order:
- Merge lowest-risk changes first (tests, docs)
- Merge foundational changes before dependent ones
- Run full test suite after each merge
- Only merge integration → main after all conflicts resolved
Rule 4: Coordination Protocol
For teams using Agent Mail or similar:
- Agents reserve files before editing
- Release reservations after committing
- Send status messages when hitting blockers
- Check inbox before starting new work
Troubleshooting
Agent Is Stuck in a Loop
Symptoms: Agent keeps trying same fix, not making progress
Real example: An agent trying to fix a TypeScript error keeps adding type annotations, but the real issue is a missing dependency import. Each iteration adds more complexity without solving the root cause.
Solutions (try in order):
- Reduce scope: "Stop. Focus only on [specific small thing]"
- Reset context: `/clear`, then re-explain with fresh context
- Switch models: Try a different model for a fresh perspective (Opus → Sonnet, or vice versa)
- Add tests: "Write a test that fails, then fix just that"
- Provide hints: Give the agent a specific approach to try
- Bisect: "The code worked at commit X. Find what broke it."
Practitioner insight: Simon Willison notes that when an agent loops, it's often because the problem is underspecified. Giving a concrete failing test case or exact error output often breaks the loop immediately.
Agent Misunderstands the Codebase
Symptoms: Agent makes changes that don't fit project patterns
Solutions:
- Update your context file (`CLAUDE.md`/`AGENTS.md`) with the patterns it's missing
- Point to specific example files: "Look at how we do this in src/auth/login.ts"
- Explain the "why" not just the "what"
Agent Output Is Too Verbose/Complex
Symptoms: Over-engineered solutions, unnecessary abstractions
Real example: Asked to "add email validation," the agent creates an abstract Validator base class, a ValidationResult type, a ValidationRegistry, and finally an EmailValidator extending the base—when const isValidEmail = (s) => /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(s) would suffice.
Solutions:
- Be explicit: "Use the simplest possible implementation. No new classes or abstractions."
- Set constraints: "This should be under 50 lines"
- Reject and retry: "This is too complex. Simpler approach: just a regex helper function"
- Reference existing patterns: "Look at how we did `isValidUrl` in utils.ts and follow that style"
Agent Keeps Suggesting Same Wrong Fix
Symptoms: Suggesting approach you've already rejected
Solutions:
- `/clear` and start fresh
- Explicitly state what NOT to do
- Provide the correct approach directly
Tests Pass But Code Is Wrong
Symptoms: Tests green, but manual testing reveals bugs
Solutions:
- Tests are testing wrong thing—rewrite with clear assertions
- Add integration/e2e tests, not just unit tests
- Review test coverage: are edge cases covered?
Staying Honest About Productivity
AI makes code cheaper. The bottleneck shifts to verification, review, and fixing mistakes.
Questions to ask yourself periodically:
- Am I actually shipping faster, or just generating more code to review?
- Are my PRs getting harder to review (bigger, more AI-generated churn)?
- How often do I revert AI-generated changes?
- Is review becoming a bottleneck for the team?
The trap: AI can flood the pipe with PRs that take longer to review than they saved to write.
"I can only focus on reviewing and landing one significant change at a time." — Simon Willison
If you're a team lead tracking metrics formally: time-to-green (prompt → tests pass), rework rate (fixup commits), and rollback rate tell you whether AI is actually helping.
Where Practices Diverge
Not everyone agrees. Here are the key debates.
Permission Skipping
| Approach | Argument |
|---|---|
| Skip permissions (`--dangerously-skip-permissions`) | "Unlocks huge productivity" — no confirmation dialogs for every action |
| Keep permissions | Confirmation prompts catch mistakes before they happen |
| Sandbox (Docker/VM) | Run unrestricted but limit blast radius |
Recommended progression:
Level 1: Safe Mode (default)
├── All permissions prompts enabled
├── Good for: learning the tool, unfamiliar codebases
└── Cost: slower, but maximum safety
Level 2: Allowlist Common Commands
├── Use `/permissions` to pre-allow safe patterns
│ - `npm test`, `npm run lint`
│ - `git add`, `git commit` (not push)
│ - File reads (not writes initially)
├── Good for: familiar projects, trusted workflows
└── Claude Code: /permissions add "npm test"
Level 3: Sandbox for YOLO
├── Run in Docker container or disposable VM
├── Network restricted, no access to credentials
├── Skip all permissions inside the sandbox
├── Good for: experimental work, high-velocity prototyping
└── Cost: setup overhead, can't test integrations
Level 4: Full YOLO (use sparingly)
├── --dangerously-skip-permissions
├── Only for: throwaway code, personal experiments
├── Never for: production access, client work
└── Always on a fresh branch with ability to revert
"Running [agents] with permission checks disabled is dangerous and stupid, and you should only do it if you are willing to take dangerous and stupid risks." — Steve Yegge (despite using it himself)
Mainline-by-Default vs. Branch/Worktree Safety
There’s a sharp divergence between solo, high-velocity workflows and team/production workflows.
| Approach | Why people do it | When it breaks down |
|---|---|---|
| Commit directly to main (practitioner opinion; common among solo builders) | Lowest cognitive overhead; fewer merge conflicts; treats git history as a linear "walk up the mountain" | Teams, CI gates, release branches, or any environment where main must remain deployable |
| Feature branches / PRs (industry default) | Reviewability, CI policy enforcement, safer collaboration | Slightly slower inner loop; more state to manage |
| `git worktree` per agent (agent-heavy teams) | Parallelism without clobbering; clean separation of contexts | More setup; still needs an integration strategy |
Guide stance: for anything shared or production-facing, keep the branch/worktree safety patterns in this guide. If you adopt mainline-by-default for solo work, compensate with strong verification loops (tests, formatters, smoke scripts) and “save point” commits.
Who Plans: Human or Agent?
| Approach | When it Works |
|---|---|
| Agent plans first | Well-defined tasks, agent knows the codebase |
| Human plans, agent executes | Complex architecture, novel problems |
| Iterative co-planning | Exploratory work, you're learning alongside the agent |
Emerging practice: Have the agent propose a plan, review it, adjust, then approve. This combines agent knowledge with human judgment.
Single Agent vs. Multiple Agents
| Approach | Best For |
|---|---|
| Single agent, deep focus | Complex refactoring, architectural work |
| Parallel agents | Independent features, test coverage expansion |
"I can only focus on reviewing and landing one significant change at a time, but I'm finding an increasing number of tasks that can be fired off in parallel without adding too much cognitive overhead." — Simon Willison
One Model vs. Multiple Models ("second opinion" workflows)
Not everyone agrees on whether you should stick to one model/tool or treat models as interchangeable.
| Approach | Why people choose it | Main failure mode |
|---|---|---|
| Single model for everything | Consistent style, fewer moving parts, easier team standardization | You can get stuck in a model’s blind spots; repeated wrong suggestions |
| Swap models when stuck ("model musical chairs") | Fresh perspective; different models are better at different tasks | Context transfer overhead; inconsistent conventions |
| Two-model loop (generator + reviewer) | One model writes, another critiques; catches subtle mistakes | Can create "review theater" if you don’t also run real tests |
Guide stance: any of the above can work. If you choose multi-model, keep the workflow grounded in external verification (tests, linters, builds) and your Review Rubric—not just agreement between models.
Community Pulse (January 2026)
Synthesized from practitioner discussions on X. Updated monthly.
Hybrid workflows are emerging: Rather than picking one tool, practitioners combine them:
"My vibe coding combo: Claude Code with Opus 4.5 to build a functional full stack foundation... Gemini CLI with Gemini 3.0 to polish... Codex CLI with GPT-5.2-codex xhigh to scan" — @rafaelobitten
Hybrid setups like CCG-Workflow use Claude Code as supervisor with Codex and Gemini for collaborative development. (source, zh)
Tool strengths by task:
- Claude Code: Faster for code generation, but "has more fine mistakes" (source)
- Codex CLI: "Better for thinking-based work" and more usage hours per $20 (source, source)
- Gemini CLI: "Performs better at frontend design", nearly free, similar UX to Claude Code (source, source)
The closing gap:
"Gemini CLI has caught up with Claude Code in terms of effectiveness. Lot less tool call failures." — @championswimmer
Bottom line: No clear winner—personal workflow matters more than picking "the best" tool.
Sources
Official Documentation
Thought Leaders
- Simon Willison — Parallel agents, agentic loops
- Steve Yegge — Beads, Gas Town, 8 stages of AI dev
- Boris Cherny — Creator of Claude Code (see also Thread Reader mirror)
- Kent Beck — TDD as superpower with AI
- Thorsten Ball — From skeptic to believer
- Addy Osmani — Disciplined human-AI collaboration, planning-first workflows
- Addy Osmani (Substack) — “My LLM coding workflow going into 2026” (spec-first, chunking, verification, commit-as-save-points)
- Gergely Orosz — Industry analysis
- Peter Steinberger — Practitioner workflow: CLI-first verification loops, “inference-speed” iteration
- Martin Fowler's team — Development philosophy
Emerging Tools
- Beads — Memory for coding agents
- Gas Town — Multi-agent orchestration
- Clawdbot — Experimental CLI automation bot (early-stage)
- Agent Mail — Multi-agent coordination
Contributing
We welcome input! The best way to improve this guide is to share a link to a credible article, blog post, or X thread with practical advice. Send us a suggestion and we'll review it for inclusion.
This guide is maintained by TheCrux and updated as practices evolve.