Cognition AI's Devin, touted as the first AI software engineer, struggles to meet its claims in a recent evaluation. Despite boasting capabilities like code generation, bug fixing, and even ordering lunch, Devin often produces unusable solutions, gets stuck in technical dead-ends, or pursues impossible tasks. While the AI offers a polished user experience when it functions, its autonomous nature sometimes becomes a liability, leading to wasted time and effort.
A service described as "the first AI software engineer" appears to be rather bad at its job, based on a recent evaluation. The service was introduced in March 2024. The bot’s creator, an outfit called Cognition AI, has made claims such as "Devin can build and deploy apps end to end," and "can autonomously find and fix bugs in codebases."
The service uses Slack as its main interface for commands, which are sent to its computing environment, a Docker container that hosts a terminal, browser, code editor, and planner. The AI agent supports API integration with external services. This allows it, for example, to send email messages on a user's behalf via SendGrid. It relies on multiple underlying AI models, a set that has included OpenAI's GPT-4o and can be expected to evolve over time.
The researchers said that Devin provided a polished user experience that was impressive when it worked. "More concerning was our inability to predict which tasks would succeed. Even tasks similar to our early wins would fail in complex, time-consuming ways. The autonomous nature that seemed promising became a liability – Devin would spend days pursuing impossible solutions rather than recognizing fundamental blockers."