• Microsoft study claims AI is still struggling to debug software

    From TechnologyDaily@1337:1/100 to All on Fri Apr 11 11:45:09 2025
    Microsoft study claims AI is still struggling to debug software

    Date:
    Fri, 11 Apr 2025 10:30:00 +0000

    Description:
    AI is great for generating code, but its still underperforming when it comes to simple debugging tasks.

    FULL STORY ======================================================================AI promises a huge revolution for developers, but is it just for code creation? Popular AI models from Anthropic and OpenAI arent great at debugging Microsofts researchers are open-sourcing their tools to facilitate research

    Although generative AI is increasingly being integrated into programming workflows, new research from Microsoft reveals that large language models still arent quite up to scratch when it comes to debugging.

    The research suggests that even advanced models still struggle with debugging tasks that are pretty simple for experienced developers, highlighting the continued importance of human programmers.

    AI does appear to have a solid use case, though, with Google now claiming
    that around 25% of new code is AI-generated. Meta has also noted the wide deployment of AI for coding. AI is good for code creation, but not for debugging

    The report explores how 11 Microsoft researchers tested nine AI models on SWE-bench Lite a popular debugging benchmark. Claude 3.7 Sonnet offered the highest success rate at a far-from-perfect 48.4%. OpenAIs o1 and o3-mini posted lower success rates of 30.2% and 22.1% respectively.

    Even with debugging tools, our simple prompt-based agent rarely solves more than half of the SWE-bench Lite issues, the researchers wrote, blaming the suboptimal performance on a lack of data representing sequential decision-making behavior.

    All hope is not lost, though. We believe that training or fine-tuning LLMs
    can enhance their interactive debugging abilities, they added. The
    researchers intend to fine-tune an info-seeking model specialized in
    gathering the necessary information to resolve bugs, but in the meantime,
    they promise to open-source debug-gym to make it easier for others to conduct similar research.

    Debug-gym is described as an environment that allows code-repairing agents to access tools for active information-seeking behavior.

    However, for now, artificial intelligence might not be bringing as much value to developers lives as AI companies suggest. Most developers spend the majority of their time debugging code, the researchers wrote, indicating that even if they are benefitting from code generation, it might not be saving
    them that much time. You might also like Enhance productivity with the best
    AI tools and best AI writers GitHub Copilot launches new AI tools, but also limits on its premium models Need an upgrade? Consider asking your boss for the best laptops for programming



    ======================================================================
    Link to news story: https://www.techradar.com/pro/microsoft-study-claims-ai-is-still-struggling-to -debug-software


    --- Mystic BBS v1.12 A47 (Linux/64)
    * Origin: tqwNet Technology News (1337:1/100)