On Tuesday, OpenAI introduced a suite of new tools to help developers and enterprises build AI agents (automated systems that can independently complete tasks) using the company's own AI models and frameworks.
These tools ship as part of OpenAI’s newly launched Responses API, which lets businesses build custom AI agents that can search the web, scan company files, and navigate websites, much like OpenAI’s Operator product. The Responses API will effectively replace OpenAI’s Assistants API, which the company plans to discontinue in the first half of 2026.
Despite the growing excitement around AI agents in recent years, the tech industry has struggled to settle on a clear definition of what “AI agents” actually are, and the hype often outpaces their actual utility. A recent example is the Chinese startup Butterfly Effect, which went viral with its AI agent platform Manus, only for users to discover that it fell short of many of its promises.
In light of this, the stakes are high for OpenAI to successfully deliver on its agent technology.
“While it’s relatively easy to demonstrate an agent, scaling it is a much more challenging task, and getting people to use it consistently is even harder,” said Olivier Godement, OpenAI’s API product head, in an interview with TechCrunch.
Earlier this year, OpenAI introduced two AI agents in ChatGPT: Operator, which navigates websites on behalf of the user, and deep research, which compiles research reports. Although these tools showcased the potential of agentic technology, they fell short in terms of autonomy.
With the Responses API, OpenAI aims to provide developers with access to the components that power AI agents, enabling them to build their own applications that incorporate Operator- and deep research-style agentic capabilities. The company hopes that developers can create more autonomous applications using its agent technology.
Using the Responses API, developers can leverage the same AI models that power OpenAI’s ChatGPT Search web search tool, including GPT-4o search and GPT-4o mini search. These models can browse the web to find answers to questions, citing sources as they generate responses.
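As a rough sketch of what this looks like in practice, the request below enables the web search tool on a Responses API call. The endpoint, model name, and tool-type string are assumptions based on OpenAI's published documentation, so check the current API reference before relying on them:

```python
import json
import os
import urllib.request

# Assumed endpoint for the Responses API.
API_URL = "https://api.openai.com/v1/responses"

# Request body: the "web_search_preview" tool type is what (per OpenAI's
# docs) switches on the search-enabled models behind ChatGPT Search.
payload = {
    "model": "gpt-4o",
    "tools": [{"type": "web_search_preview"}],
    "input": "What was the most positive tech news this week?",
}

def send(body: dict) -> bytes:
    """POST the request body with a bearer token and return the raw reply."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# Only hit the network when a key is actually configured.
if os.environ.get("OPENAI_API_KEY"):
    print(send(payload))
```

The response, when search is triggered, includes the model's answer along with the inline source citations the article describes.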
OpenAI claims that GPT-4o search and GPT-4o mini search are highly accurate in terms of factual information. On the company’s SimpleQA benchmark, which assesses a model’s ability to answer short, fact-seeking questions, GPT-4o search scores 90%, while GPT-4o mini search scores 88% (with higher scores indicating better performance). In comparison, GPT-4.5, OpenAI’s larger and recently released model, scores 63% on the same benchmark.
The Responses API also includes a file search utility that can quickly scan company databases to retrieve information. OpenAI assures that it will not train models on these files. Additionally, developers using the Responses API can utilize OpenAI’s Computer-Using Agent (CUA) model, which powers Operator, to automate computer use tasks such as data entry and app workflows.
Enterprises have the option to run the CUA model locally on their own systems, according to OpenAI. The consumer version of the CUA available in Operator is limited to taking actions on the web.
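The file search and computer-use capabilities are exposed the same way, as tools declared on a Responses API request. The request shapes below are assumptions drawn from OpenAI's documentation (the vector-store ID is a hypothetical placeholder), so treat them as a sketch rather than a definitive reference:

```python
# Sketch of a file search request: the model retrieves answers from
# previously uploaded company files, which OpenAI says it will not train on.
file_search_request = {
    "model": "gpt-4o-mini",
    "tools": [{
        "type": "file_search",
        # Hypothetical ID of an uploaded vector store holding company files.
        "vector_store_ids": ["vs_example_id"],
    }],
    "input": "Summarize our Q3 onboarding policy.",
}

# Sketch of a computer-use request driving the CUA model that powers
# Operator. Model name, dimensions, and the "environment" field are assumed.
computer_use_request = {
    "model": "computer-use-preview",
    "tools": [{
        "type": "computer_use_preview",
        "display_width": 1024,
        "display_height": 768,
        "environment": "browser",  # the consumer Operator version is web-only
    }],
    "input": "Open the expenses dashboard and export last month's report.",
}
```

Enterprises running the CUA model locally, as the article notes, would point the same kind of request at their own systems rather than the web-only consumer version.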
To be clear, the Responses API will not solve all the technical issues currently plaguing AI agents.
While AI-powered search tools are more accurate than traditional AI models, they are not perfect and can still hallucinate. GPT-4o search, for instance, gets 10% of factual questions wrong. Moreover, AI search tools tend to struggle with short, navigational queries, and there have been concerns regarding the reliability of ChatGPT’s citations.
In a blog post provided to TechCrunch, OpenAI acknowledged that the CUA model is “not yet highly reliable for automating tasks on operating systems” and is prone to making “inadvertent” mistakes.
However, OpenAI emphasized that these are early iterations of their agent tools and that the company is continually working to improve them.
In conjunction with the Responses API, OpenAI is releasing an open-source toolkit called the Agents SDK, which provides developers with free tools to integrate models with their internal systems, implement safeguards, and monitor AI agent activities for debugging and optimization purposes. The Agents SDK is an extension of OpenAI’s Swarm, a framework for multi-agent orchestration released late last year.
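To illustrate the three jobs the article attributes to the Agents SDK (routing model output to tools, enforcing safeguards, and logging activity for debugging), here is a conceptual stand-in written in plain Python. This is not the SDK's real API, just a minimal sketch of the orchestration pattern such a toolkit manages:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    """Toy agent: dispatches tool calls, applies a guardrail, keeps a trace."""
    name: str
    tools: dict[str, Callable[[str], str]]
    guardrail: Callable[[str], bool]       # vetoes unsafe tool arguments
    trace: list = field(default_factory=list)  # activity log for debugging

    def act(self, tool: str, arg: str) -> str:
        if not self.guardrail(arg):
            self.trace.append(f"blocked: {tool}({arg!r})")
            return "refused by guardrail"
        self.trace.append(f"ran: {tool}({arg!r})")
        return self.tools[tool](arg)

# Hypothetical tool: a stub standing in for a real integration.
lookup = {"search": lambda q: f"results for {q}"}

agent = Agent(
    name="demo",
    tools=lookup,
    guardrail=lambda arg: "password" not in arg,  # trivial safeguard
)

print(agent.act("search", "OpenAI Responses API"))  # tool runs normally
print(agent.act("search", "user password dump"))    # guardrail blocks it
print(agent.trace)                                  # log of both attempts
```

The real SDK layers this pattern over OpenAI's models and multi-agent handoffs, extending the Swarm orchestration framework mentioned above.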
Godement expressed his hope that OpenAI can bridge the gap between AI agent demos and products this year, stating that “agents are the most impactful application of AI that will happen.” This sentiment echoes a statement made by OpenAI CEO Sam Altman in January, proclaiming that 2025 will be the year AI agents enter the workforce.
Whether or not 2025 indeed becomes the “year of the AI agent,” OpenAI’s latest releases demonstrate the company’s commitment to shifting from flashy agent demos to impactful tools.