
To scale agentic AI, Notion tore down its tech stack and started fresh

Many organizations would hesitate to tear down their tech stack and start from scratch. Not Notion. For version 3.0 of its productivity software, released in September, the company didn't hesitate to rebuild completely, recognizing that a rebuild was necessary to support agentic AI at enterprise scale. While traditional AI-powered workflows follow explicit, step-by-step instructions based on few-shot examples, AI agents powered by advanced reasoning models can reason about tool definitions, identify what tools they have at their disposal, and plan their next steps. "Rather than trying to tweak what we were building, we wanted to leverage the strengths of reasoning models," Sarah Sachs, Notion's head of AI modeling, told VentureBeat. "We rebuilt a new architecture because workflows are different from agent workflows."

Re-orchestrate so that models can work autonomously

Notion has been adopted by 94% of the Forbes AI 50 companies, has 100 million total users, and counts OpenAI, Cursor, Figma, Ramp and Vercel among its customers. In a rapidly evolving AI landscape, the company identified the need to move beyond simple, task-based workflows to goal-oriented reasoning systems that let agents autonomously select, orchestrate and execute tools in connected environments.

Reasoning models have very quickly become "much better" at learning to use tools and following chain-of-thought (CoT) instructions, Sachs noted. This allows them to be "much more independent" and make multiple decisions within a single agentic workflow. "We rebuilt our AI system to respond to that," she said.

From a technical perspective, this meant replacing rigid, prompt-based flows with a unified orchestration model, Sachs explained. The core model is supported by modular sub-agents that search Notion and the web, query and update databases, and edit content. Each agent uses tools contextually; for example, it can decide whether to search within Notion itself or in another connected platform such as Slack. The model performs successive searches until it finds the relevant information, and can then convert notes into proposals, draft follow-up messages, track tasks, and discover and update knowledge bases.

In Notion 2.0, the team focused on getting AI to perform specific tasks, which required them to "think exhaustively" about how to prompt the model, Sachs noted. With version 3.0, however, users can assign goals to agents, which can take action and carry out multiple tasks simultaneously. "We re-orchestrated it so that it's self-selective in terms of the tools, rather than just having a few-shot prompt that explicitly tells it how to go through all these different scenarios," Sachs explained. The goal is to make everything AI-ready so that "anything you can do, your Notion agent can do."
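The shift Sachs describes, from hard-coded few-shot workflows to an orchestrator whose model self-selects tools, can be sketched roughly as follows. This is an illustrative toy, not Notion's implementation: the tool names, the `Tool` registry, and the keyword-based `plan` stub (standing in for a reasoning model) are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical tool registry: each sub-agent is exposed to the
# orchestrator as a named tool it can choose to invoke.
@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]

def search_notion(query: str) -> str:
    return f"notion-results:{query}"

def search_slack(query: str) -> str:
    return f"slack-results:{query}"

def edit_content(instruction: str) -> str:
    return f"edited:{instruction}"

TOOLS = {
    t.name: t
    for t in [
        Tool("search_notion", "Search the connected Notion workspace", search_notion),
        Tool("search_slack", "Search connected Slack channels", search_slack),
        Tool("edit_content", "Create or update a Notion page", edit_content),
    ]
}

def plan(goal: str, observations: list) -> Optional[str]:
    """Stand-in for the reasoning model: given the goal and what has been
    observed so far, pick the next tool, or None when done. A real system
    would ask an LLM to choose based on the TOOLS descriptions rather than
    use this keyword heuristic."""
    if not observations:
        return "search_notion" if "notes" in goal else "search_slack"
    if len(observations) == 1:
        return "edit_content"
    return None  # goal satisfied

def run_agent(goal: str) -> list:
    """Loop until the planner decides the goal is met; the model, not a
    hard-coded workflow, chooses each step."""
    observations = []
    while (tool_name := plan(goal, observations)) is not None:
        observations.append(TOOLS[tool_name].run(goal))
    return observations
```

The key design point is that the sequence of tool calls is decided at run time by the planner, whereas a few-shot workflow would have the sequence baked into the prompt.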


Bifurcation to isolate hallucinations

Notion's "better, faster, cheaper" philosophy drives a continuous iteration cycle that balances latency and accuracy through fine-tuned vector embeddings and Elasticsearch optimization. Sachs' team uses a rigorous evaluation framework that combines deterministic testing, local optimization, human-annotated data, and LLM-as-judge scoring to surface discrepancies and inaccuracies. "By splitting the evaluation, we can identify where the problems are coming from, and that helps us isolate unnecessary hallucinations," Sachs explained. Simplifying the architecture itself also makes it easier to adapt as models and techniques evolve. "We optimize latency and parallel thinking as much as possible," leading to "much better accuracy," Sachs said. Models are grounded in data from the web and the connected Notion workspace. Ultimately, Sachs reported, the investment in rebuilding Notion's architecture has already paid off in capability and a faster pace of change. She added: "We are completely open to rebuilding it, when the next breakthrough happens, if that is necessary."
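Sachs' point about splitting the evaluation to localize hallucinations can be illustrated with a minimal sketch. Everything here is hypothetical: the two-stage split (a deterministic retrieval check, then a model-graded groundedness check) and all function names are assumptions, and the substring "judge" merely stands in for an LLM judge.

```python
def deterministic_retrieval_check(retrieved: list, gold_doc: str) -> bool:
    # Deterministic stage: did retrieval surface the document that
    # contains the answer? No model involved, fully reproducible.
    return gold_doc in retrieved

def judge_groundedness(answer: str, retrieved: list) -> bool:
    # Model-based stage: in production this would ask an LLM judge whether
    # every claim in the answer is supported by the retrieved text. A toy
    # token-containment check stands in for the judge here.
    context = " ".join(retrieved)
    return all(token in context for token in answer.split())

def classify_failure(answer: str, retrieved: list,
                     gold_doc: str, correct: bool) -> str:
    """Attribute a bad output to the stage that caused it, so retrieval
    bugs and generation hallucinations are debugged separately."""
    if correct:
        return "pass"
    if not deterministic_retrieval_check(retrieved, gold_doc):
        return "retrieval-failure"
    if not judge_groundedness(answer, retrieved):
        return "generation-hallucination"
    return "other"
```

The value of the split is diagnostic: a wrong answer with correct retrieval points at the generator, while a wrong answer with missing retrieval points at search, and the two get different fixes.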

Understanding contextual latency

When building and refining models, it's important to understand that latency is contextual: AI should deliver the most relevant information, not necessarily the most information at the expense of speed. "You'd be surprised at the different ways customers are willing to wait for things and not wait for things," Sachs said. It makes for an interesting experiment: how slow can you go before people abandon the model? In purely navigational search, for instance, users are not as patient; they want answers almost immediately. "When you ask, 'What is two plus two,' you don't want to wait for your agent to search all over Slack and JIRA," Sachs noted. But the more time an agent is given, the more thorough its reasoning can be. Notion, for example, can perform 20 minutes of autonomous work across hundreds of websites, files and other materials. In these cases, users are more willing to wait, Sachs explained; they let the model run in the background while they perform other tasks. "It's a product question," said Sachs. "How do we set user expectations based on the UI? How do we set user expectations around latency?"
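The distinction Sachs draws, near-instant answers for navigational queries versus minutes-long background agents for research goals, might look like this in skeleton form. The heuristic, the cue words, and the path names are invented for illustration; a production router would more likely use a small classifier or the model itself.

```python
def route(query: str) -> str:
    """Pick a latency budget for a query. Short queries with no research
    cue words get the fast path (sub-second, direct retrieval); anything
    that looks like a multi-step goal is handed to a background agent
    that may run for minutes while the user does other work."""
    research_cues = ("summarize", "research", "compare", "draft")
    is_short = len(query.split()) <= 6
    has_cue = any(cue in query.lower() for cue in research_cues)
    if is_short and not has_cue:
        return "fast-path"       # navigational: answer immediately
    return "background-agent"    # agentic: trade latency for thoroughness
```

The routing decision is as much a UI question as a modeling one: the background path only works if the interface tells users the agent is still running.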


Notion is its own biggest user

Notion understands the importance of using its own product; its employees are among its biggest power users. Sachs explained that teams maintain active sandboxes that generate training and evaluation data, alongside a "really active" user feedback loop. Users are not shy about saying what needs improvement or what features they would like to see. Sachs emphasized that when a user marks an interaction as unsatisfactory, they explicitly grant permission for a human annotator to analyze that interaction in a way that de-identifies it as much as possible. "We use our own tool as a company all day, every day, and that gives us very fast feedback loops," Sachs said. "We are really dogfooding our own product."

That said, it is their own product they are building, Sachs noted, so they understand they may have blinders on when it comes to quality and functionality. To balance this, Notion gives "highly AI-savvy" design partners early access to new capabilities in exchange for candid feedback, which Sachs said is just as important as internal prototyping. "It's all about experimenting in the open. I think you get much richer feedback," Sachs said. "Because at the end of the day, if we just look at how Notion uses Notion, we're not really giving our customers the best experience."

Just as importantly, ongoing internal testing lets teams track progress and ensure models don't regress, with accuracy and performance declining over time. "Everything you do remains true," Sachs explained. "You know your latency is within limits."


Many companies make the mistake of focusing too intently on retrospective evals, which makes it hard to understand how or where they are improving, Sachs noted. Notion treats some evaluations as a "litmus test" of development and forward-looking progress, and others as tools for observability and regression resistance. "I think a big mistake that a lot of companies make is they confuse the two," said Sachs. "We use them for both purposes; we think about them very differently."

Takeaways from Notion’s journey

For enterprises, Notion can serve as a blueprint for responsibly and dynamically operationalizing agentic AI in a connected enterprise workspace. Sachs' takeaways for other technology leaders:

  • Don’t be afraid to rebuild as fundamental capabilities change; Notion has completely redesigned the architecture to align with reasoning-based models.

  • Treat latency as contextual: optimize per use case, rather than universally.

  • Ground all results in reliable, well-managed business data to ensure accuracy and trust. She advised: "Be willing to make the tough decisions. Be willing to be on the cutting edge, so to speak, in what you develop, to build the best product you can provide for your customers."

