THE NUMBER
First attempt: 95% garbage. Second: 50%. Third: finally workable. That is one staff engineer's honest scoreboard after six weeks working alongside an AI agent, and he still ships two to three times faster than before. Two throwaway tries plus one good one, still cheaper than doing the job once yourself. That trade is what this newsletter is about.
3 THINGS HAPPENING RIGHT NOW
They felt 20% faster. They were 19% slower.
METR, an independent AI research nonprofit (not a vendor), ran a real head-to-head: 16 experienced people doing the same work with AI and without. The ones using AI felt 20% faster and were actually 19% slower. The kicker: METR later flagged its own methodology and walked the finding back, so even this much-cited number is shakier than it looks. The lesson holds regardless. If you do not measure your own output, you can pay for an agent for months and never notice it cost you time.
10,000 people produced more, and the work piled up downstream
Faros AI, an engineering-analytics company tracking 10,000+ workers across 1,255 teams, found the ones using AI produced 47% more work per day. But each piece took 91% longer to check, ran 154% bigger, and carried 9% more mistakes. The agent made the front of the line faster and the back of the line longer. Speed up writing the proposal but still read every one twice, and you have not saved the afternoon. You have moved it.
A consultant's rule for which jobs to hand an agent
Sadie St Lawrence, an AI consultant at HMCI, deploys agents for clients and published a blunt breakdown. Agents that go find and summarize things work, as long as the task is tight and the sources are reachable. Agents that run a multi-step process break the moment anything unexpected happens. Agents you want to make the call for you she flatly calls "not ready"; the best one in her benchmarks got it right on the first try 24% of the time. The pattern: hand an agent the gathering, keep the deciding.
THE DEEP DIVE
Six Weeks With An Agent, By A Staff Engineer Who Counted
Vincent Quigley is a staff engineer at Sanity, a content management company that does not sell AI tools. He went all-in on an AI agent for six weeks and kept a daily scoreboard, because he wanted the real number, not the vibe.
The number is three. First attempt: 95% garbage. Second: 50%. Third: finally workable. By his own count the agent writes 80% of his first drafts and he steers the rest, like a junior who needs direction, not a teammate who finishes things. He pays $1,000 to $1,500 a month and keeps paying. The places it confidently hands him wrong work: anything fiddly, anything where a mistake is expensive.
The real math is in that three. Two throwaway tries plus one good one still cost less than a single careful pass from him, so he ships features two to three times faster and the cost per feature drops. The win is not a perfect first try. It is volume.
That arc fits anyone who iterates on what they make: proposals, marketing copy, client emails, reports. The agent does the first pass. You do the third. You are not buying a finished answer. You are buying a faster start.
ONE THING TO TRY THIS WEEK
Every honest account above came from someone who kept count. They knew what their agent was good at and what to never trust it with. Here's how to start your own scoreboard.
Open Claude Code. Pick one short task you do most days: a morning email triage, a weekly status writeup, a daily competitor scan.
Type this:
Set up an agent that does [your task] every weekday morning.
After each run, append a row to scoreboard.md with the date,
what the agent tried, and a blank space for me to mark
Y / N / partial. Schedule it yourself - I don't want to
touch cron. Tell me when it's ready.
Approve once. Claude builds the agent, creates the scoreboard file, and schedules it. From tomorrow, it runs on its own.
Each morning, take 30 seconds to mark Y / N / partial in scoreboard.md and add one line if the agent failed.
After two weeks, ask Claude to read scoreboard.md and tell you what your agent is good at, where it fails, and whether to keep it.
You will not have to guess. You will have your own data.
Stuck? Reply to this email. I'll help.
WHAT'S COMING
Next issue: the 5-minute guardrails that would have stopped the 9-second database wipe - how to give your agents real access without handing them keys to break things.
Manu
