Lessons from not reading code
There’s been a lot of discourse around coding agents, how they’re being used, and what the future holds for software development. I wanted to pen down some interim thoughts after using coding agents and building harnesses for them over the last ~6-7 months. Essentially — why not reading code can be a rational, high-leverage choice.
tl;dr - once agents got reliable at making changes, the limiting factor shifted from code production to verification and direction. If you keep treating agents like a fast pair programmer and insist on auditing every line, you never unlock the real advantage. Invest in task verifiability.
The shift
Agents are now good enough at generating and applying changes that the bottleneck is usually the human: setting the right direction, defining what “correct” means, and verifying outcomes. The core shift is moving from “write the code” to “plan the specs and design the checks”.
This is why it’s viable to not read code in detail: you can understand what is happening without reading every line if your system is built for verifiability. It requires letting go, but it’s also the only way to scale the workflow.
Here’s the core idea: focus on and invest in tools, processes, and automation around verifiability.
In other words: unit, integration, and E2E tests, plus tooling that gives agents actionable feedback. These should be the same tools engineers use for development, not a separate agent-only stack.
I’ll enumerate practical things you can do in your team to prepare your codebase for agentic development. I’m going to skip the more obvious things like AGENTS.md/CLAUDE.md, skills, adopting spec-driven approaches, prototyping, etc.
1. Audit your codebase for tests that involve mocks
Mocks have their place, especially in systems that rely heavily on third-party dependencies (e.g. payments). But mocks are often abused because they’re easy to wire in and they keep CI green.
Start by auditing unit tests that rely on mocks. A practical rule: if a unit test needs mocks to be written, it may be better off as an integration or E2E test with actual dependencies. If that isn’t possible, the mock lives to see another day (damn!). But then you need to audit the mock for correctness vs. the real system — extremely painful to enforce and maintain (often misleading too), which is exactly why mocks should be the exception.
Historically, wiring integration/E2E tests has been a PITA because it requires infrastructure and provisioning real resources. And writing those tests was brittle and slow because you had to debug every step. With today’s models, I’d argue that setting up these tests is extremely cheap. Most effort shifts to test design and resource provisioning; the actual test writing can be done by the agent.
2. Invest in integration and E2E tests
Do not put this off. Every feature/fix/refactor should come with integration and/or E2E coverage unless you have real constraints around provisioning dependencies. As much as possible, this is where your implementation-time thinking should go: test design, rather than the exact code to write.
Think about:
- What needs to be tested (as close to the actual experience as possible)
- What needs to exist for these tests to run
One good integration/E2E test suite is 1000x better than a million unit tests that pass because you mocked a bunch of dependencies.
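To make that contrast concrete, here is a sketch of the same test written both ways. The function under test, `count_active_users`, is hypothetical, and SQLite stands in for whatever real dependency you would actually provision:

```python
import sqlite3

def count_active_users(conn: sqlite3.Connection) -> int:
    """Hypothetical function under test: counts users flagged active."""
    row = conn.execute("SELECT COUNT(*) FROM users WHERE active = 1").fetchone()
    return row[0]

# Mocked version: passes even if the SQL is wrong, because the fake
# short-circuits the real query entirely.
def test_count_active_users_mocked():
    from unittest.mock import MagicMock
    conn = MagicMock()
    conn.execute.return_value.fetchone.return_value = (2,)
    assert count_active_users(conn) == 2  # proves nothing about the SQL

# Integration version: runs the actual query against a real (in-memory)
# database, so a typo in the SQL or schema drift fails the test.
def test_count_active_users_real():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, active INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, 1), (2, 0), (3, 1)])
    assert count_active_users(conn) == 2
```

In a real suite you would point the integration version at a provisioned Postgres/MySQL instance rather than in-memory SQLite, but the principle is the same: the query actually runs.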
A few examples:
- Have a magic auth flow that sends a verification code to the user with Resend and WorkOS? Set up Playwright to launch AuthKit, a Resend webhook to receive the emails, and a regex to extract the code and drive the auth flow.
- Have a deployment flow for a durable execution engine task handler that needs to drain previous tasks before shutting down the old version? Run a test deployment script against a provisioned instance of the engine.
- Have a payment flow on Stripe that needs to apply a discount code for a given set of users that fit a specific condition? Set up Stripe sandboxes and write the damn test.
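For the magic-auth case, the fiddly part is usually extracting the verification code from the captured email; the browser side is standard Playwright. A minimal sketch of the extraction step, assuming a six-digit code and a webhook handler that hands you the email body as text (the payload shape is an assumption; check your email provider's webhook docs):

```python
import re

# Assumes a six-digit numeric code; adjust the pattern to your email template.
CODE_RE = re.compile(r"\b(\d{6})\b")

def extract_verification_code(email_body: str) -> str:
    """Pull the one-time code out of a verification email body."""
    match = CODE_RE.search(email_body)
    if match is None:
        raise ValueError("no verification code found in email body")
    return match.group(1)
```

In the E2E test, Playwright fills the login form, the webhook receiver captures the outbound email, and this helper feeds the code back into the form to complete the flow.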
Invest in the test design and infrastructure required. This investment compounds.
Note: at the end of the day, there will be real constraints and not everything can be reliably provisioned. Do your best :)
3. Use the GitHub CLI
gh is one of the most powerful CLI tools around. It gives programmatic access to virtually every piece of GitHub. In the age of coding agents, this is a godsend. With gh CLI, you can:
- respond to comments on issues, PRs, etc.
- open PRs
- check CI statuses
- retrieve logs from GitHub Actions
Do you see where we’re going with this?
“Help me open a PR, then monitor it with `gh` (there’s a useful `--watch` flag!) until status checks are done. If there are any issues, retrieve the logs and fix the issues. Repeat until CI is green.”
This allows you to come back only when the PR is deemed ready.
I can already hear your complaint coming, “But the agent can game the CI!!!” That’s why points 1 and 2 exist. Invest in verifiability.
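The loop that prompt describes can be sketched programmatically. `gh pr checks --watch` and `gh run view --log-failed` are real `gh` subcommands; the retry structure and injectable runner below are my own scaffolding, not part of `gh`:

```python
import subprocess

def monitor_pr(pr: str, max_rounds: int = 5, run=subprocess.run) -> bool:
    """Block until a PR's checks pass, or give up after max_rounds.

    `run` is injectable so the loop can be exercised without GitHub access.
    In a real agent loop, each failed round would be followed by the agent
    reading the logs and pushing a fix before re-checking.
    """
    for _ in range(max_rounds):
        # `gh pr checks --watch` blocks until checks finish and exits
        # non-zero if any of them failed.
        if run(["gh", "pr", "checks", pr, "--watch"]).returncode == 0:
            return True
        # Fetch failing logs so the agent has actionable feedback
        # (in practice you would pass the specific run id here).
        run(["gh", "run", "view", "--log-failed"])
    return False
```

The point is not this exact script but the shape: open, watch, read failures, fix, repeat, and only surface to the human when the loop converges or genuinely needs intervention.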
4. Treat AGENTS.md as a self-improving instructions doc
I know I said I wouldn’t really talk about it, but it’s still worth mentioning: you should keep evolving AGENTS.md according to the first three points. Contextualize it to the workflow your team needs and adapt it as the team gets more familiar with fully agentic development cycles.
For example, define how the agent should open a PR, monitor CI, and address issues until CI passes. Here’s something we use today:
Pull Requests
- GH CLI is available; you can open a PR with it when needed.
- If you open a PR, you must monitor its CI and address issues until the PR is green, unless the failure requires human intervention (e.g. missing GitHub secrets). Do not hack or workaround CI failures.
- Title: use conventional commit format summarizing the PR.
- Description: each section uses a H2 markdown header.
- The PR description must include a clear, per-test-case breakdown for any new/updated tests, describing what is being tested and how the test is implemented to validate it.
## What was changed

A summary no longer than one paragraph describing what was changed. After this summary, provide details including file references (a bullet list with a brief explanation of what was touched and why).

## What the implication was

Based on the changes, describe the exact implications. Where relevant, use Mermaid diagrams to help illustrate flows or other visual concepts.

## Checks and tests performed

Self-explanatory.
These four points go a long way toward setting your team up for a self-improving codebase that agents can work with over time. My thoughts are still evolving, but I’m excited to see how teams adapt to this shift in the coming months.