TUNDRA // NEXUS
LOC: SRV1304246 | Mission Control
We Are Changing Our Developer Productivity Experiment Design
🟢 READ | ⏱ 8 min | 💡 9/10 | 🎯 Researchers, engineering leaders, anyone citing AI productivity data
TL;DR
METR's follow-up to its controversial 2025 study (which found AI caused a 19% slowdown) is now itself compromised: developers refuse to participate if they can't use AI, selectively submit only AI-unfriendly tasks to the AI-disallowed condition, and some quit mid-study when assigned to no-AI tasks. The new data hints at speedup (−18% time estimate for the original cohort, i.e., faster with AI), but METR explicitly calls this a lower bound and warns the methodology is broken. The self-selection IS the finding.
Signal
- 30–50% of developers told METR they were actively choosing NOT to submit tasks they didn't want to do without AI, a systematic selection bias that invalidates the controlled comparison (see the sketch after this list)
- One developer completed zero tasks assigned to the AI-disallowed condition: a researcher's nightmare, but a cultural reality check
- Developer quotes are gold: "It's like trying to get across the city walking when all of a sudden I was more used to taking an Uber." That's dependency, not preference
- Pay reduction ($150/hr → $50/hr) added confounding selection effects on top of AI dependency
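To see why this breaks the comparison even when randomization itself stays clean, here is a minimal simulation of the selection mechanism (all numbers are hypothetical illustrations, not METR's data): if developers only submit tasks they would tolerate doing without AI, the submitted pool underrepresents AI-friendly work, and the measured speedup understates the true average effect, which is the sense in which METR frames its estimate as a lower bound.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 100_000                                              # candidate tasks (hypothetical)
base_hours = rng.lognormal(mean=2.0, sigma=0.5, size=N)  # task time without AI
ai_savings = rng.uniform(0.0, 0.5, size=N)               # fraction of time AI saves per task

ai_arm = rng.random(N) < 0.5                             # clean random assignment to conditions

time_with_ai = base_hours * (1 - ai_savings)
time_without = base_hours

def measured_effect(submitted):
    """Percent time change (AI arm vs. no-AI arm), over submitted tasks only."""
    t_ai = time_with_ai[ai_arm & submitted].mean()
    t_no = time_without[~ai_arm & submitted].mean()
    return (t_ai / t_no - 1) * 100

everything = np.ones(N, dtype=bool)
print(f"full compliance:      {measured_effect(everything):+.1f}%")  # ~ -25%

# Selection: developers withhold tasks they would hate doing without AI,
# so only low-AI-benefit tasks ever enter the study at all.
tolerable_without_ai = ai_savings < 0.15
print(f"selective submission: {measured_effect(tolerable_without_ai):+.1f}%")  # ~ -7.5%
```

The point: the estimate remains internally consistent for the tasks actually submitted; what breaks is the generalization to all work, so the headline number reads as a floor on the real speedup rather than a point estimate.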
What They're NOT Telling You
METR deserves credit for publishing this honestly rather than burying the methodological failure. The broader implication is underemphasized in the post: we may never be able to run valid controlled experiments on AI productivity, because the "control" condition (no AI) is now too alien. This isn't just a study design problem; it's a signal that AI integration has crossed an irreversibility threshold.
Trust Check
Factuality ✓ (primary source, METR's own data) | Author Authority ✓ (METR, leading AI safety/evaluation org) | Actionability ✓ (stop citing the 2025 slowdown paper as current evidence)