OpenAI launches new macOS app for agentic coding

AI is already having a huge impact on the way software is written, with much of the heavy programming work now being done by swarms of agents and sub-agents. But as developers experiment with new interfaces and form factors for human-AI collaboration, it’s becoming difficult for even the most advanced AI labs to keep up.

The current trend is the development of agentic software – systems in which AI agents can work independently on coding tasks – epitomized by the Claude Code and Cowork apps. In the meantime, OpenAI has been gradually building out its Codex tool, which was launched as a command line tool last April and extended to a web interface a month later.

Now OpenAI is taking a big step toward catching up. On Monday, the company launched a new macOS app for Codex, incorporating many of the agentic practices that have become popular in the past year. The new app is designed to work with multiple agents in parallel and to integrate agent skills and other advanced workflows. The launch also comes less than two months after the launch of GPT-5.2-Codex, OpenAI’s most powerful coding model, which the company hopes will be enough to entice Claude Code users.

“If you want to do really advanced work on something complex, 5.2 is by far the strongest model,” CEO Sam Altman told reporters at a news conference. “It was harder to use, though, so if we put that level of modeling capabilities into a more flexible interface, we think it will make quite a difference.”

While Altman’s confidence in GPT-5.2 is understandable, coding benchmarks tell a more complicated story. GPT-5.2 does hold first place on TerminalBench (a test that measures how well AI handles command-line programming tasks), at least as of the time of writing. But agents using Gemini 3 and Claude Opus achieved roughly equivalent scores – lower, but within the benchmark’s margin of error. Results on SWE-bench, another coding benchmark that tests AI’s ability to fix real-world software bugs, are similar and show no clear advantage for GPT-5.2. However, agentic use cases are difficult to benchmark effectively, and state-of-the-art models can vary significantly in terms of user experience.

The Codex app also comes with a range of new features that OpenAI says will help it reach parity with or, in some cases, surpass the various Claude apps. The app lets automations run on a schedule in the background, with the results queued for review when the user returns. Users can also select different personalities for the agent – from pragmatic to empathetic – depending on their work style.

But for the company, the biggest selling point is the sheer speed of development made possible by AI. “You can use this from a clean sheet of paper, brand new, to create a very sophisticated piece of software in just a few hours,” Altman said. “As fast as I can type in new ideas, that’s the limit of what can be built.”
