AI and Developer Productivity: A Reality Check

Despite widespread belief among software engineers that artificial intelligence (AI) can significantly accelerate their work, a recent study suggests that the perceived productivity gains may be exaggerated. Researchers from the Model Evaluation & Threat Research (METR) organization ran an experiment with 16 developers who contribute to large, mature GitHub projects. Each of the 246 issues in the study was randomly assigned to one of two conditions: “AI allowed” or “AI disallowed.”

When AI use was permitted, participants primarily relied on the Cursor Pro IDE and Anthropic’s Claude 3.5/3.7 models. Over a period of roughly two months between February and June 2025, the developers logged their time and screen activity. Before starting each task, they estimated how much time AI would help them save. On average, they believed AI would allow them to complete tasks 24% faster. However, the results showed the opposite: tasks took 19% longer with AI assistance than without.
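
To make the gap between expectation and outcome concrete, here is a minimal arithmetic sketch. It assumes that “24% faster” means 24% less time per task and uses a hypothetical 100-minute baseline task; neither the baseline nor the variable names come from the study.

```python
# Hypothetical baseline: a task that takes 100 minutes without AI.
# This figure is illustrative only and does not come from the study.
baseline_minutes = 100.0

# Assumption: "24% faster" is read as "24% less time per task".
expected_with_ai = baseline_minutes * (1 - 24 / 100)

# Reported result: tasks took 19% longer with AI assistance.
observed_with_ai = baseline_minutes * (1 + 19 / 100)

# How far the observed time overshoots the developers' expectation.
overshoot_ratio = observed_with_ai / expected_with_ai

print(f"Expected with AI: {expected_with_ai:.0f} minutes")
print(f"Observed with AI: {observed_with_ai:.0f} minutes")
print(f"Observed / expected: {overshoot_ratio:.2f}x")
```

Under these assumptions, a task that developers expected to finish in about 76 minutes would instead take about 119 minutes, roughly 1.57 times their estimate.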

“Our primary motivation was to develop methods to understand whether AI actually speeds up software developers,” explained Nate Rush, a co-author of the study from METR. “We expected an obvious speed-up—maybe 20%, 50%, or even double the speed. But that’s not what we found.”

Polarized Reactions

The study’s findings have sparked predictable reactions from both AI enthusiasts and skeptics. For AI boosters, the results seem implausible and possibly flawed. For skeptics, the study is seen as evidence that AI’s productivity benefits have been overhyped.

Steve Newman, co-founder of Writely (which later became Google Docs), initially thought the results were “too bad to be true.” However, after reviewing the study’s methodology, he concluded that the findings were legitimate. “The response to the paper shows how hungry people are for solid information about AI’s real-world impact,” Rush noted.

Simon Willison, co-creator of the Django web framework and a regular user of AI coding tools, described the research as “very credible,” despite its small sample size. “The absolute truth in it is how bad people are at estimating their own productivity,” he said.

Milan Milanović, CTO at 3MD with over 20 years of industry experience, added: “The AI productivity myth just got some real data behind it.” He pointed out that experienced developers often know their codebases better than AI can, especially in complex, million-line repositories. “In that environment, AI can become a liability rather than an asset,” he observed.

Slop Bothering Developers

Recently, cURL, a widely used command-line tool, has been overwhelmed by a flood of AI-generated bug reports—many of which are inaccurate or nonsensical. The influx became so unmanageable that the project ended its bug bounty program in early 2026. Project founder Daniel Stenberg described the situation as a “DDoS” on open source maintainers, with AI “slop” reports harming productivity and morale.

In response, cURL introduced a screening checkbox to filter low-effort submissions, but the damage had already been done. While some AI tools have found real vulnerabilities, the current wave of reports is mostly noise. The episode highlights the need for better validation and more trustworthy signals in open source bug reporting.

Sources: arXiv, cURL