Anthropic says it solved the long-running AI agent problem with a new multi-session Claude SDK

Agent memory remains a problem that companies want to solve: the longer an agent runs, the more instructions and earlier conversation it forgets.

Anthropic believes it has solved this problem for its Claude Agent SDK by developing a two-part solution that allows an agent to work across separate context windows.

“The core challenge of long-term agents is that they must work in separate sessions, and each new session begins with no memory of what came before,” Anthropic wrote in a blog post. “Because context windows are limited and because most complex projects can’t be completed within a single window, agents need a way to bridge the gap between coding sessions.”

Anthropic engineers proposed a two-pronged approach for the Agent SDK: an initialization agent to set up the environment, and a coding agent to make incremental progress in each session and leave artifacts for the next.

The agent’s memory problem

Because agents are built on base models, they remain constrained by context windows, which, though constantly growing, are finite. For long-running agents this becomes a bigger problem: the agent can forget instructions and behave erratically partway through a task. Improving agent memory is therefore essential for consistent, business-safe performance.

Several methods have emerged over the past year, all attempting to bridge the gap between context windows and agent memory. LangChain’s LangMem SDK, Memobase, and OpenAI’s Swarm are examples of memory solutions on offer. Research on agent memory has also exploded recently, with frameworks such as Memp and Google’s Nested Learning paradigm offering new alternatives for improving memory.

Many of today’s memory frameworks are open source and can, in principle, adapt to the various large language models (LLMs) that power agents. Anthropic’s approach, by contrast, builds on its own Claude Agent SDK.

How it works

Anthropic determined that while the Claude Agent SDK had context-management capabilities and “should allow an agent to continue doing useful work for any length of time,” this was not sufficient. The company says in its blog post that even a model such as Opus 4.5 running the Claude Agent SDK “can’t fulfill building a production quality web app if it only gets a high-level prompt like ‘build a clone of claude.ai.’”

The failures manifested in two patterns, Anthropic said. First, the agent tried to do too much at once and ran out of context mid-task; it then had to guess what had happened and could not pass clear instructions to the next session. The second failure appeared later, after some features had already been built: the agent saw that progress had been made and simply declared the job done.

Anthropic’s researchers arrived at a solution: set up an initial environment that lays the groundwork for features, then prompt each agent to make incremental progress toward the goal while leaving a clean state at the end of each session.

This is where Anthropic’s two-part solution comes in. The initialization agent sets up the environment and records what has been done and which files have been added. The coding agent then prompts the model to make incremental progress and leave structured updates for the next session.
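The pattern described above can be sketched in plain Python. This is an illustrative mock, not the Claude Agent SDK’s actual API: the state-file name, the feature list, and both function names are invented for the example, and the "work" each session does is simulated.

```python
import json
from pathlib import Path

# Hypothetical artifact that bridges sessions; the real SDK's file layout may differ.
STATE_FILE = Path("agent_state.json")

def initialize_environment():
    """Initialization agent: one-time setup recording the goal,
    the planned features, and an empty progress log."""
    state = {
        "goal": "build a claude.ai-style web app",
        "features": ["auth", "chat ui", "history"],
        "completed": [],
        "notes": [],
    }
    STATE_FILE.write_text(json.dumps(state, indent=2))
    return state

def run_coding_session(state):
    """Coding agent: pick the next incomplete feature, make incremental
    progress, and leave a structured update for the next session."""
    remaining = [f for f in state["features"] if f not in state["completed"]]
    if not remaining:
        return state, "done"
    feature = remaining[0]
    # A real session would prompt the model to implement the feature here;
    # this sketch just marks it complete and logs a note.
    state["completed"].append(feature)
    state["notes"].append(f"implemented {feature}; see session log")
    STATE_FILE.write_text(json.dumps(state, indent=2))
    return state, feature

# Each loop iteration stands in for a fresh session with an empty context window.
state = initialize_environment()
while True:
    state, result = run_coding_session(state)
    if result == "done":
        break
```

The key design point is that every session reads its predecessor’s structured notes from disk rather than relying on anything held in the context window.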

“Inspiration for these practices came from looking at what effective software engineers do every day,” Anthropic said.

The researchers said they added testing tools to the coding agent, improving its ability to identify and fix bugs that weren’t obvious from the code alone.
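One simple way to picture such a testing tool is a wrapper that runs a verification command and hands the next session a compact, structured report instead of raw code. This is a sketch under assumptions, not Anthropic’s implementation: the function name and report shape are invented, and a trivial command stands in for a real test suite.

```python
import subprocess
import sys

def run_checks(cmd):
    """Run a verification command (e.g. the project's test suite) and
    return a structured result the next coding session can act on."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "passed": result.returncode == 0,
        # Keep only the tail of the output so the report stays small
        # relative to the limited context window.
        "output_tail": (result.stdout + result.stderr)[-2000:],
    }

# Example: a trivial check standing in for a real test run.
report = run_checks([sys.executable, "-c", "print('2 tests passed')"])
```

A failing exit code would flip `passed` to `False`, surfacing bugs that are not obvious from reading the code alone.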

Future research

Anthropic noted that its approach is “one possible set of solutions in a long-running agent harness.” However, this is just the early stage of what could become a broader area of research for many in the AI space.

The company says its experiments in extending agents’ long-term memory have not yet shown whether a single general-purpose coding agent or a multi-agent structure works best across different contexts.

The demo also focused on full-stack web-app development, so further experiments should test how well the results generalize to other tasks.

“It is likely that some or all of these lessons can be applied to the types of long-term agentic tasks needed in, for example, scientific research or financial modeling,” Anthropic said.
