The Numbers Are Damning
A 2025 METR randomized controlled trial found that experienced open-source developers took 19% longer to complete tasks when they used AI tools than when they worked without them. More striking: those same developers had predicted a 24% speedup, and after the study they still believed AI had made them about 20% faster. Wrong on both direction and magnitude.
Only 5% of companies are creating substantial AI value at scale. The other 95% are bolting AI onto the way they already work — and wondering why nothing improves.
What the 5% Are Doing Differently
A three-person engineering team at StrongDM ships production software without humans writing or reviewing code. Their workflow: humans write specifications; AI builds, tests, and deploys.
FutureHouse used AI to autonomously identify a drug candidate for macular degeneration — from concept to submission in 2.5 months.
The pattern is the same: humans specify, machines execute, humans evaluate. The work isn't automated out of existence — the cognitive load of execution is mechanized so humans can focus on specification and evaluation.
When This Works (and When It Doesn't)
| Works When | Fails When |
|---|---|
| Success criteria are clear and testable | Defining what to specify is itself the hard problem |
| Work draws on structured knowledge | Work depends on relationships and trust |
| Tasks decompose into sequential steps | Judgment is subjective or culturally specific |
| Fast feedback loops exist | Single mistakes cause irreversible harm |
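One way to apply the table is as a quick screen before picking a workflow. The sketch below is illustrative only: the `WorkflowProfile` fields and the `screen` function are hypothetical names, and the all-or-nothing rule (every "works when" signal present, no "fails when" signal present) is an assumption rather than something the table prescribes.

```python
from dataclasses import dataclass


@dataclass
class WorkflowProfile:
    """Hypothetical yes/no answers about one workflow, mirroring the table rows."""
    # "Works when" signals
    testable_success_criteria: bool
    structured_knowledge: bool
    sequential_steps: bool
    fast_feedback: bool
    # "Fails when" signals
    spec_is_the_hard_part: bool
    depends_on_relationships: bool
    subjective_judgment: bool
    irreversible_mistakes: bool


def screen(p: WorkflowProfile) -> tuple[bool, list[str]]:
    """Return (is_candidate, reasons_against). Assumed rule: any gap or blocker rules it out."""
    works_when = {
        "clear, testable success criteria": p.testable_success_criteria,
        "structured knowledge": p.structured_knowledge,
        "sequential, decomposable steps": p.sequential_steps,
        "fast feedback loops": p.fast_feedback,
    }
    fails_when = {
        "deciding what to specify is the hard problem": p.spec_is_the_hard_part,
        "depends on relationships and trust": p.depends_on_relationships,
        "subjective or culturally specific judgment": p.subjective_judgment,
        "single mistakes cause irreversible harm": p.irreversible_mistakes,
    }
    reasons = [f"missing: {name}" for name, present in works_when.items() if not present]
    reasons += [f"blocker: {name}" for name, present in fails_when.items() if present]
    return (not reasons, reasons)


# Two made-up workflows for illustration.
invoice_matching = WorkflowProfile(
    testable_success_criteria=True, structured_knowledge=True,
    sequential_steps=True, fast_feedback=True,
    spec_is_the_hard_part=False, depends_on_relationships=False,
    subjective_judgment=False, irreversible_mistakes=False,
)
key_account_renewal = WorkflowProfile(
    testable_success_criteria=False, structured_knowledge=True,
    sequential_steps=True, fast_feedback=False,
    spec_is_the_hard_part=False, depends_on_relationships=True,
    subjective_judgment=True, irreversible_mistakes=False,
)

for name, profile in [("invoice matching", invoice_matching),
                      ("key account renewal", key_account_renewal)]:
    ok, reasons = screen(profile)
    print(name, "->", "candidate" if ok else "not a candidate", reasons)
```

A real team would likely weight these signals rather than treat them as pass/fail, but even a blunt screen forces the "fails when" questions to be asked before the pilot starts.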
The Cognitive Atrophy Risk
There's a legitimate danger here: if your team stops doing the work, who develops the expertise to write quality specifications? The solution is an apprenticeship model — juniors learn by writing specs alongside seniors, not by watching AI execute.
The goal isn't to remove humans from the loop. It's to move humans upstream to where their judgment is irreplaceable.
The Five-Step Framework
1. Select one repetitive workflow
2. Write detailed specifications as if onboarding a new hire
3. Let AI execute the entire workflow end-to-end
4. Evaluate outcomes against specifications, not process
5. Measure honestly using actual metrics, not perception
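The steps above can be made concrete with a small sketch. It assumes a specification can be written as a goal plus named, machine-checkable acceptance criteria, and that "measuring honestly" means comparing timed results with what the team reports feeling. `Spec`, `evaluate`, `time_change`, and the sample report data are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Spec:
    """A written specification with checkable acceptance criteria."""
    goal: str
    # Maps a human-readable criterion name to a check on the final output.
    criteria: dict[str, Callable[[dict], bool]]


def evaluate(spec: Spec, output: dict) -> dict[str, bool]:
    """Judge the outcome against the spec only; how it was produced is ignored."""
    return {name: check(output) for name, check in spec.criteria.items()}


def time_change(baseline_minutes: float, with_ai_minutes: float) -> float:
    """Fractional change in completion time: positive means the work took longer."""
    return with_ai_minutes / baseline_minutes - 1.0


# Step 2: a spec written as if onboarding a new hire (made-up example).
spec = Spec(
    goal="Produce the weekly sales report",
    criteria={
        "covers all regions": lambda out: set(out["regions"]) == {"NA", "EMEA", "APAC"},
        "totals reconcile": lambda out: abs(sum(out["regional_totals"]) - out["grand_total"]) < 0.01,
    },
)

# Step 3 happens elsewhere; this dict stands in for whatever the AI produced.
output = {
    "regions": ["NA", "EMEA", "APAC"],
    "regional_totals": [120.0, 80.5, 45.25],
    "grand_total": 245.75,
}

# Step 4: evaluate outcomes against the specification.
results = evaluate(spec, output)
print(results)

# Step 5: compare a timed measurement with a survey answer (both numbers invented).
measured = time_change(baseline_minutes=100, with_ai_minutes=119)
perceived = -0.20  # the team's own estimate: "about 20% less time"
print(f"measured {measured:+.0%}, perceived {perceived:+.0%}")
```

The design choice worth noting is that `evaluate` never sees the process, only the output. If a criterion cannot be written as a check, that is a sign the specification is not yet detailed enough.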
The Gap Is Structural
The gap isn't between companies that use AI and companies that don't. It's between companies that redesigned their workflows around AI execution and companies that bolted AI onto the way they already work.
One of those groups will be 40% more productive in two years. The other group will still be running the same pilot they started this year.