Forum: 4d2 dot org

Microsoft study claims AI is still struggling to debug software

From TechnologyDaily@1337:1/100 to All on Fri Apr 11 11:45:09 2025

Microsoft study claims AI is still struggling to debug software

Date:
Fri, 11 Apr 2025 10:30:00 +0000

Description:
AI is great for generating code, but it's still underperforming when it comes to simple debugging tasks.

FULL STORY ======================================================================

AI promises a huge revolution for developers, but is it just for code creation?

- Popular AI models from Anthropic and OpenAI aren't great at debugging
- Microsoft's researchers are open-sourcing their tools to facilitate research

Although generative AI is increasingly being integrated into programming workflows, new research from Microsoft reveals that large language models still aren't quite up to scratch when it comes to debugging.

The research suggests that even advanced models still struggle with debugging tasks that are pretty simple for experienced developers, highlighting the continued importance of human programmers.

AI does appear to have a solid use case, though, with Google now claiming
that around 25% of its new code is AI-generated. Meta has also noted the wide deployment of AI for coding.

AI is good for code creation, but not for debugging

The report explores how 11 Microsoft researchers tested nine AI models on SWE-bench Lite, a popular debugging benchmark. Claude 3.7 Sonnet offered the highest success rate at a far-from-perfect 48.4%. OpenAI's o1 and o3-mini posted lower success rates of 30.2% and 22.1% respectively.

"Even with debugging tools, our simple prompt-based agent rarely solves more than half of the SWE-bench Lite issues," the researchers wrote, blaming the suboptimal performance on a lack of data representing sequential decision-making behavior.

All hope is not lost, though. "We believe that training or fine-tuning LLMs
can enhance their interactive debugging abilities," they added. The
researchers intend to fine-tune an info-seeking model specialized in
gathering the necessary information to resolve bugs, but in the meantime,
they promise to open-source debug-gym to make it easier for others to conduct similar research.

Debug-gym is described as an environment that allows code-repairing agents to access tools for active information-seeking behavior.

However, for now, artificial intelligence might not be bringing as much value to developers' lives as AI companies suggest. "Most developers spend the majority of their time debugging code," the researchers wrote, indicating that even if they are benefitting from code generation, it might not be saving
them that much time.

======================================================================
Link to news story: https://www.techradar.com/pro/microsoft-study-claims-ai-is-still-struggling-to-debug-software

--- Mystic BBS v1.12 A47 (Linux/64)
* Origin: tqwNet Technology News (1337:1/100)
