Booking.com’s agent strategy: Disciplined, modular and already delivering 2× accuracy

When many companies weren’t even thinking about agent behavior or infrastructure, Booking.com had already encountered them through its own conversational recommendation system. Those early experiments allowed the company to step back and avoid being swept up in the frenetic hype around AI agents. Instead, it takes a disciplined, layered, modular approach to model development: small, trip-specific models for low-cost, fast inference; large language models (LLMs) for reasoning and understanding; and domain-tailored evaluations built in-house when precision is critical. With this hybrid strategy, combined with selective collaboration with OpenAI, Booking.com has seen a doubling of accuracy in key data retrieval, ranking and customer interaction tasks. As Pranav Pathak, AI product development lead at Booking.com, put it to VentureBeat in a new podcast: “Do you build it very, very specialized and customized and then have an army of a hundred agents? Or do you keep it general enough and have five agents who are good at general tasks, but then you have to orchestrate a lot around it? That’s a balance that I think we’re still discovering, along with the rest of the industry.” Check out the new Past the Pilot podcast here, and read on for highlights.

From guessing to deep personalization without being ‘creepy’

Recommendation systems are at the core of Booking.com’s customer-facing platforms. Traditional recommendation tools, however, were less about recommending and more about guessing, Pathak admitted. That’s why he and his team pledged from the start to avoid generic tools: as he put it, pricing and recommendations should be based on the customer’s context.

Booking.com’s initial pre-generative-AI tooling for detecting intent and topics was a small language model, which Pathak described as “the scale and scope of BERT.” The model processed the customer’s input around their issue to determine whether it could be resolved through self-service or routed to a human agent. “We started with an architecture of ‘you need to call a tool if this is the intent you’re detecting and this is how you parsed the structure,’” Pathak explained. “That was very similar to the first few agentic architectures that came out in terms of reasoning and defining a tool call.”

His team has since built out that architecture with an LLM orchestrator that classifies queries, triggers retrieval-augmented generation (RAG), and calls APIs or smaller, specialized language models. “We were able to scale that system quite well because it was so close in architecture that, with a few tweaks, we now have a full agent stack,” said Pathak.

As a result, Booking.com is seeing a doubling of topic-detection accuracy, freeing up human agent bandwidth by 1.5 to 1.7 times. Ultimately, this will support more self-service, allowing human agents to focus on customers with uniquely specific issues for which the platform doesn’t have a dedicated tool flow — for example, a family unable to access their hotel room at 2 a.m. when the front desk is closed. That “is really starting to compound,” but also has a direct, long-lasting impact on customer retention, Pathak noted.
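The routing pattern described above — a cheap classifier decides the topic, then the orchestrator either runs a dedicated tool flow, falls back to RAG, or escalates to a human — can be sketched roughly as follows. This is an illustrative sketch only; all function, tool and flow names here are hypothetical, not Booking.com’s actual API.

```python
# Hypothetical sketch of an intent-routing orchestrator:
# classify the query, call a self-service tool flow if one exists,
# fall back to RAG, and escalate to a human agent as a last resort.

TOOL_FLOWS = {
    "cancel_booking": lambda query: f"Started cancellation flow for: {query}",
    "change_dates":   lambda query: f"Started date-change flow for: {query}",
}

def classify_intent(query: str) -> str:
    """Stand-in for a small BERT-scale topic/intent model."""
    lowered = query.lower()
    if "cancel" in lowered:
        return "cancel_booking"
    if "change" in lowered and "date" in lowered:
        return "change_dates"
    return "unknown"

def retrieve_and_generate(query: str):
    """Stand-in for a RAG call; returns None when retrieval finds nothing."""
    return None

def orchestrate(query: str) -> str:
    intent = classify_intent(query)
    if intent in TOOL_FLOWS:                # self-service: dedicated tool flow
        return TOOL_FLOWS[intent](query)
    answer = retrieve_and_generate(query)   # fallback: RAG over help content
    if answer is not None:
        return answer
    return "Routing to a human agent."      # no tool, no grounded answer
```

The key property — and the reason the article says the old stack scaled into a full agent stack “with a few tweaks” — is that the classifier, the tool registry and the RAG fallback are independent layers that can each be swapped out.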
“One of the things we’ve seen is that the better we are at customer service, the more loyal our customers are.”

Another recent rollout is personalized filtering. Booking.com has between 200 and 250 search filters on its website — an unrealistic number for any human to sift through, Pathak noted. That’s why his team introduced a free-text box where users can type to immediately receive customized filters. “That becomes such an important signal for personalization in terms of what you’re looking for, in your own words rather than in a clickstream,” said Pathak.

In turn, it teaches Booking.com what customers actually want. Take hot tubs: when filter personalization was first introduced, hot tubs were one of the most popular requests. That hadn’t even been a consideration before; there wasn’t even a filter. Now that filter is live. “I had no idea,” Pathak noted. “Honestly, I had never looked for a hot tub in my room.”

When it comes to personalization, however, there is a fine line; memory remains complicated, Pathak emphasized. While it’s important to have long-term memory and evolving conversations with customers — keeping track of information such as their usual budgets, favorite hotel star ratings or whether they require disabled access — this should be done on their terms and protect their privacy. Booking.com is extremely careful with memory and asks permission so as not to be “creepy” when collecting customer information. “Managing memory is much more difficult than actually building memory,” said Pathak. “The technology is there, we have the technical knowledge to build it. We want to make sure that we don’t launch a memory object that doesn’t respect customer consent, that doesn’t feel very natural.”
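The free-text-to-filter mapping described above can be sketched in miniature. A production system would presumably use an embedding or language model to match intent; plain keyword overlap keeps this example self-contained. The filter names and keyword lists are hypothetical, not Booking.com’s actual catalog.

```python
# Illustrative sketch: map free-text input to entries in a large filter
# catalog. Keyword overlap stands in for semantic matching here.

FILTER_CATALOG = {
    "hot_tub":        {"hot tub", "jacuzzi"},
    "pet_friendly":   {"dog", "cat", "pets allowed"},
    "free_breakfast": {"breakfast"},
}

def suggest_filters(free_text: str) -> list[str]:
    """Return catalog filters whose keywords appear in the user's text."""
    lowered = free_text.lower()
    return [name for name, keywords in FILTER_CATALOG.items()
            if any(kw in lowered for kw in keywords)]
```

For example, `suggest_filters("a room with a hot tub where my dog is welcome")` would surface both the hot-tub and pet-friendly filters — and, as the hot-tub anecdote shows, logging which free-text requests match *no* filter is exactly the signal that tells the platform a new filter is needed.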


Finding a balance between building and buying

As agents mature, Booking.com is addressing a central question facing the entire industry: how specialized should agents be? Rather than committing to a swarm of highly specialized agents or a handful of generic ones, the company strives for reversible decisions and avoids “one-way doors” that would lock its architecture into lengthy, costly paths. Pathak’s strategy is to generalize where possible, specialize where necessary, and keep agent design flexible to help ensure resilience.

Pathak and his team are “keenly aware” of use cases, evaluating where to build more general, reusable agents versus more task-specific ones. They strive to use the smallest possible model for every use case that still delivers the required accuracy and output quality; whatever can be generalized, is.

Latency is another important consideration. When factual accuracy and avoiding hallucinations are paramount, his team will use a larger, much slower model; but with search and recommendations, user expectations determine speed. (“No one’s patient,” Pathak noted.) “For example, we would never use something as heavy as GPT-5 for topic detection alone or for entity extraction,” he said.

Booking.com takes a similarly elastic approach to monitoring and evaluations: if it’s general monitoring that someone else is better at building, with horizontal capabilities, they will buy it. But where brand guidelines need to be enforced, they build their own evaluations. Ultimately, Booking.com has become “super anticipatory,” agile and flexible. “Right now, with everything happening with AI, we’re a little bit wary of walking through one-way doors,” Pathak said. “We want as many of our decisions as possible to be reversible. We don’t want to get stuck in a decision that we can’t reverse in two years.”
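The “smallest model that meets the bar” idea above amounts to a cost-ordered capability lookup, which can be sketched as follows. The tier names, costs and task labels are illustrative assumptions, not Booking.com’s actual model stack.

```python
# Hedged sketch of smallest-model-first routing: cheap models handle
# topic detection and entity extraction; heavier models are reserved
# for tasks where factual accuracy dominates over latency.

MODEL_TIERS = [
    # (model name, relative cost, tasks it is good enough for)
    ("small-intent-model", 1,  {"topic_detection", "entity_extraction"}),
    ("mid-generalist",     5,  {"search_ranking", "summarization"}),
    ("large-reasoner",     50, {"factual_answering", "complex_reasoning"}),
]

def pick_model(task: str) -> str:
    """Return the cheapest model whose capability set covers the task."""
    for model, _cost, tasks in sorted(MODEL_TIERS, key=lambda t: t[1]):
        if task in tasks:
            return model
    raise ValueError(f"no model tier registered for task {task!r}")
```

Keeping the tier table as data rather than hard-coded branches is one way to preserve the “reversible decision” property Pathak describes: swapping a model in or out is a one-line change.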


What other builders can learn from Booking.com’s AI journey

Booking.com’s AI journey can serve as an important blueprint for other companies. Looking back, Pathak acknowledged that they started with a “pretty complicated” tech stack. That puts them in a good place now, “but we probably could have started with something much simpler and seen how customers handled it.”

His advice: if you’re just starting out with LLMs or agents, off-the-shelf APIs will do just fine. “There is enough customization with APIs that you can have a lot of influence before you decide you want to do more.” On the other hand, if a use case requires customization that isn’t available through a standard API call, that makes a case for internal tools.

Still, he emphasized: don’t start with complicated things. Tackle the “simplest, most painful problem you can find and the simplest, most obvious solution to it.” Identify product-market fit and then explore the ecosystems, he advised — but don’t rip out old infrastructure just because a new use case requires something specific (like moving an entire cloud strategy from AWS to Azure just to use the OpenAI endpoint).

Ultimately, “Don’t lock yourself in too early,” Pathak noted. “Don’t make one-way decisions until you are sure this is the solution you want to use.”
