Software engineering workflows have changed significantly in recent years, driven by the emergence of AI coding tools such as Cursor and GitHub Copilot. Powered by advanced AI models from OpenAI, Google DeepMind, Anthropic, and xAI, these tools aim to boost productivity by automating code writing, bug fixing, and the testing of changes, and they have posted substantial gains on software engineering benchmarks.
A recent study from the non-profit AI research group METR, however, challenges the notion that current AI coding tools universally enhance productivity for experienced developers; its findings point to a more nuanced impact on development workflows.
METR’s study was a randomized controlled trial in which 16 experienced open-source developers completed 246 real tasks on large code repositories. Each task was randomly assigned to one of two conditions: “AI-allowed,” where developers could use state-of-the-art AI coding tools such as Cursor Pro, or “AI-disallowed,” where AI tools were prohibited. This design let the researchers compare how efficiently developers worked with and without AI assistance.
Before starting, developers predicted that AI coding tools would cut their task completion time by 24%. The measured outcome was the opposite: using AI tools increased completion time by 19%, meaning developers worked more slowly with AI assistance than without it.
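To put those figures in perspective, here is a minimal back-of-the-envelope sketch in Python. The 100-minute baseline is hypothetical; only the 24% and 19% figures come from the study:

```python
baseline = 100.0  # hypothetical minutes to finish a task without AI

predicted = baseline * (1 - 0.24)  # 76.0 min: developers expected a 24% speedup
measured = baseline * (1 + 0.19)   # 119.0 min: the study measured a 19% slowdown

# Reality vs. expectation: AI-assisted work took roughly 1.57x as long
# as the developers predicted it would.
print(round(measured / predicted, 2))  # 1.57
```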
Notably, only about 56% of the participating developers had prior experience with Cursor, the primary AI tool used in the study. Although nearly all of them (94%) had used web-based large language models (LLMs) in their coding workflows, Cursor itself was new to many. To mitigate the learning curve, developers were trained on Cursor before the study began.
The findings of METR’s study raise important questions about the presumed benefits of AI coding tools in enhancing developer productivity. They suggest that the assumption of universal productivity gains from these tools may be overly optimistic, at least in the context of experienced developers working on complex codebases.
According to METR’s researchers, several factors could explain why AI tools slowed developers down rather than speeding them up. One significant cost is the time spent prompting the AI and waiting for its responses, which can exceed the time spent on actual coding. In addition, AI tools often struggle with the kind of large, complex codebases used in this study, limiting their effectiveness.
The study’s authors are cautious in interpreting these results and stop short of concluding that AI systems fail to improve productivity for most software developers. They acknowledge that other large-scale studies have found AI coding tools can speed up software engineering work, underscoring the complexity of the question.
The authors also note the rapid pace of AI progress, cautioning that these findings may not hold in the near future. METR has observed significant gains over the past year in AI systems’ ability to handle complex, long-horizon tasks, suggesting a trajectory of continued improvement.
This research adds to a growing body of evidence urging skepticism about the promised benefits of AI coding tools. Other studies have raised concerns such as AI-introduced bugs and security vulnerabilities, underscoring the need for a balanced assessment of these tools’ impact on software development workflows.