Maybe AI agents can be lawyers after all

Last month I wrote about Mercor’s new benchmark, which measures how well AI agents handle professional tasks in fields such as law and business analysis. At the time, the scores were quite dismal, with every major lab scoring under 25%. So we concluded that lawyers were safe from AI displacement, at least for now.
But AI capabilities can change a lot in just a few weeks.
This week’s release of Anthropic’s Opus 4.6 shook up the rankings, with Anthropic’s new model scoring just under 30% in single-attempt tests and averaging 45% when given a few more attempts at the problem. Notably, the release included a number of new agentic features, including ‘agent swarms’, which may have helped with this type of multi-step problem-solving.
Regardless, the score is a huge leap forward from the previous state of the art, and a sign that progress in foundation models is not slowing down. Mercor CEO Brendan Foody, who was particularly impressed, said: “A jump from 18.4% to 29.8% in a few months is insane.”

Thirty percent is still a long way from 100%, so lawyers don’t have to worry about being replaced by machines next week. But they should be a lot less confident than they were last month.




