Palona goes vertical, launches Vision, Workflow: 4 key lessons for AI builders

Building an enterprise AI company on a ‘foundation of shifting sand’ is the central challenge for founders today, according to leadership at Palona AI.

The Palo Alto-based startup – led by former Google and Meta engineering veterans – is making a decisive vertical push into the restaurant and hospitality space with today’s launch of Palona Vision and Palona Workflow.

The new offering transforms the company’s multi-modal agent suite into a real-time operating system for restaurant operations, spanning cameras, calls, conversations and coordinated task execution.

The news marks a strategic pivot from the company’s early 2025 debut, when it emerged with $10 million in seed funding to build emotionally intelligent sales agents for a broad range of direct-to-consumer companies.

Narrowing the focus to a “multimodal native” approach for restaurants, Palona now provides a blueprint for AI builders on how to go beyond “thin wrappers” and build deep systems that solve big problems in the physical world.

“You build a company on a foundation of sand – not quicksand, but shifting sand,” says co-founder and CTO Tim Howes, referring to the instability of the current LLM ecosystem. “So we built an orchestration layer that allows us to exchange models in terms of performance, fluidity and cost.”

VentureBeat recently spoke in person with Howes and co-founder and CEO Maria Zhang at – where else? – a restaurant in NYC about the technical challenges and hard lessons learned from their launch, growth and pivot.

The new offering: vision and workflow as ‘digital GM’

For the end user (the restaurant owner or operator), Palona’s latest release is designed to function as an automated “best operations manager” that never sleeps.

Palona Vision uses in-store security cameras to analyze operational signals such as queue lengths, table turnover, preparation bottlenecks and cleanliness, without the need for new hardware.

It monitors front-of-house data such as queue lengths, table turns and cleanliness, while simultaneously identifying back-of-house issues such as preparation delays or station setup errors.

Palona Workflow complements this by automating multi-step operational processes, including managing catering orders, running opening and closing checklists, and handling food prep. By correlating Vision’s video signals with point-of-sale (POS) data and staffing levels, Workflow ensures consistent execution across multiple locations.
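Palona has not published how this correlation works. As a rough illustration only, the sketch below joins three hypothetical feeds (camera-derived queue lengths, POS order counts and staff on shift) by location and hour, and raises an alert when a long queue coincides with a heavy per-person workload. The function name, the feed shapes and both thresholds are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # (location id, hour of day); hypothetical join key


@dataclass(frozen=True)
class Alert:
    location: str
    hour: int
    message: str


def correlate(
    queue_len: Dict[Key, int],       # assumed output of camera analysis
    orders: Dict[Key, int],          # assumed POS order counts
    staff_on_shift: Dict[Key, int],  # assumed scheduling data
    max_queue: int = 8,              # illustrative threshold
    max_orders_per_staff: int = 12,  # illustrative threshold
) -> List[Alert]:
    """Flag hours where a long queue coincides with overloaded staff."""
    alerts: List[Alert] = []
    # Only consider windows for which all three feeds have data.
    shared = queue_len.keys() & orders.keys() & staff_on_shift.keys()
    for key in sorted(shared):
        location, hour = key
        staff = staff_on_shift[key]
        load = orders[key] / staff if staff else float("inf")
        if queue_len[key] > max_queue and load > max_orders_per_staff:
            alerts.append(Alert(
                location, hour,
                f"queue of {queue_len[key]} with {load:.1f} orders per staff member",
            ))
    return alerts


queue = {("store-1", 12): 11, ("store-1", 15): 2}
pos = {("store-1", 12): 45, ("store-1", 15): 10}
staff = {("store-1", 12): 3, ("store-1", 15): 3}
alerts = correlate(queue, pos, staff)
```

In this toy data only the noon window is flagged, because it is the only hour where both signals cross their thresholds at the same time. Requiring agreement between feeds is one plausible way to keep a single noisy camera reading from triggering an alert.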

“Palona Vision is like giving every location a digital GM,” said Shaz Khan, founder of Tono Pizzeria + Cheesesteaks, in a press release to VentureBeat. “It identifies problems before they escalate and saves me hours a week.”

Going vertical: lessons in domain expertise

Palona’s journey started with a star-studded roster. CEO Zhang was previously VP of Engineering at Google and CTO of Tinder, while co-founder Howes is the co-inventor of LDAP and former CTO of Netscape.

Despite this background, the team’s first year was a lesson in the need for focus.

Initially, Palona served fashion and electronics brands, creating “wizard” and “surfer guy” personalities to handle sales. However, the team quickly realized that the restaurant industry offered a unique trillion-dollar opportunity that was “surprisingly recession-proof” but “baffled” by operational inefficiencies.

“Advice to startup founders: don’t go multi-industry,” Zhang warned.

By verticalizing, Palona has grown from a ‘thin’ chat layer to a ‘multi-sensory information pipeline’ that processes vision, voice and text simultaneously.

That clarity of focus opened up access to proprietary training data (such as prep playbooks and call transcripts) while avoiding the scraping of generic data.

1. Building on ‘shifting sand’

To meet the reality of enterprise AI implementations in 2025 – with new, improved models emerging almost every week – Palona developed a proprietary orchestration layer.

Rather than being ‘bundled’ with a single provider like OpenAI or Google, Palona’s architecture allows them to swap models on a dime based on performance and cost.

They use a mix of proprietary and open source models, including Gemini for computer vision benchmarks and specific language models for fluency in Spanish or Chinese.

For builders, the message is clear: never let the core value of your product depend on one supplier.
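Palona’s orchestration layer is proprietary, so its design is not public. The sketch below only illustrates the general idea of keeping model backends interchangeable: each backend is registered behind a common interface, and a router picks the cheapest one that meets a quality bar for the requested language. Every name, score and price here is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class ModelOption:
    """One interchangeable backend; all fields are illustrative assumptions."""
    name: str
    quality: float              # 0..1 score from some internal benchmark
    cost_per_1k_tokens: float   # assumed pricing unit
    languages: FrozenSet[str]   # languages the model handles fluently
    call: Callable[[str], str]  # stand-in for a real provider client


class Orchestrator:
    """Routes each request to the cheapest model that clears the quality bar."""

    def __init__(self) -> None:
        self._models: Dict[str, ModelOption] = {}

    def register(self, option: ModelOption) -> None:
        # Registering under an existing name swaps that backend in place.
        self._models[option.name] = option

    def unregister(self, name: str) -> None:
        self._models.pop(name, None)

    def pick(self, language: str, min_quality: float) -> ModelOption:
        candidates = [
            m for m in self._models.values()
            if language in m.languages and m.quality >= min_quality
        ]
        if not candidates:
            raise LookupError(f"no model for {language!r} at quality >= {min_quality}")
        # Cheapest first; ties are broken in favor of higher quality.
        return min(candidates, key=lambda m: (m.cost_per_1k_tokens, -m.quality))

    def complete(self, prompt: str, language: str = "en", min_quality: float = 0.7) -> str:
        return self.pick(language, min_quality).call(prompt)


router = Orchestrator()
router.register(ModelOption("model-a", 0.90, 3.00, frozenset({"en", "es"}),
                            lambda p: f"[model-a] {p}"))
router.register(ModelOption("model-b", 0.75, 0.50, frozenset({"en"}),
                            lambda p: f"[model-b] {p}"))
```

With this setup, an English request at the default quality bar goes to the cheaper `model-b`, while a Spanish request falls through to `model-a`. Because callers only ever talk to `Orchestrator.complete`, removing or replacing a backend does not require changes to application code.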

2. From words to ‘world models’

The launch of Palona Vision represents a shift from understanding words to understanding the physical reality of a kitchen.

While many developers struggle to connect separate APIs, Palona’s new vision model transforms existing in-store cameras into operational assistants.

The system identifies ‘cause and effect’ in real time, recognizing that a pizza is undercooked from its ‘light beige’ color or alerting a manager when a display case is empty.

“In words, physics doesn’t matter,” Zhang explained. “But in reality I drop the phone, it always goes down… we really want to know what’s going on in this world of restaurants.”

3. The ‘Muffin’ solution: custom memory architecture

One of the main technical hurdles Palona faced was memory management. In a restaurant context, memory is the difference between a frustrating interaction and a “magical” interaction where the agent remembers a diner’s “usual” order.

The team initially used an unspecified open source tool, but found that it produced errors 30% of the time. “I think consulting developers always turn off memory [on consumer AI products] because that is guaranteed to mess everything up,” Zhang warned.

To solve this, Palona built Muffin, a proprietary memory management system named as a nod to web cookies. Unlike standard vector-based approaches that struggle with structured data, Muffin is designed to handle four distinct layers:

  • Structured data: stable facts such as delivery addresses or allergy information.

  • Slowly changing dimensions: loyalty preferences and favorite items.

  • Transient and seasonal memories: adjusting to shifts, such as preference for cold drinks in July versus hot chocolate in winter.

  • Regional context: default settings such as time zones or language preferences.

The lesson for builders: if the best available tool isn’t good enough for your specific industry, be prepared to build your own.
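Muffin’s internals are not public. As a minimal sketch of what a four-layer memory could look like, the record below keeps one dictionary per layer described above and resolves a lookup from the most specific layer to the most general. The class name, the field names and the lookup order (stable facts, then the current month’s seasonal entry, then slow-changing preferences, then regional defaults) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Dict, Optional


@dataclass
class GuestMemory:
    """Hypothetical layered memory record; not Palona's actual schema."""
    structured: Dict[str, Any] = field(default_factory=dict)     # stable facts
    slow_changing: Dict[str, Any] = field(default_factory=dict)  # loyalty, favorites
    seasonal: Dict[int, Dict[str, Any]] = field(default_factory=dict)  # keyed by month 1..12
    regional: Dict[str, Any] = field(default_factory=dict)       # time zone, language

    def recall(self, key: str, today: date) -> Optional[Any]:
        """Return the first match, searching specific layers before general ones."""
        layers = (
            self.structured,                     # e.g. an allergy always wins
            self.seasonal.get(today.month, {}),  # this month's transient preference
            self.slow_changing,                  # long-running favorite
            self.regional,                       # fallback default
        )
        for layer in layers:
            if key in layer:
                return layer[key]
        return None


memory = GuestMemory(
    structured={"allergy": "peanuts"},
    slow_changing={"drink": "iced tea"},
    seasonal={7: {"drink": "cold brew"}, 12: {"drink": "hot chocolate"}},
    regional={"language": "en"},
)
```

Asking this record for `drink` returns the seasonal entry in July or December and falls back to the long-running favorite in any other month. Keeping the layers as exact-match structures, rather than embedding everything into one vector index, is one way to avoid fuzzy retrieval of facts that must be precise, such as an allergy.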

4. Reliability through ‘GRACE’

In a kitchen, an AI error isn’t just a typo; it’s a wasted order or a security risk. A recent incident at Stefanina’s Pizzeria in Missouri, where an AI almost hallucinated fake deals during a dinner, highlights how quickly brand trust can evaporate if safeguards are not in place.

To avoid such chaos, Palona’s engineers follow the company’s internal GRACE framework:

  • Guardrails: Hard limits on agent behavior to prevent unapproved promotions.

  • Red Teaming: Proactive efforts to ‘break’ the AI and identify potential hallucination triggers.

  • App Sec: Lock down APIs and third-party integrations with TLS, tokenization, and attack prevention systems.

  • Compliance: Each response is based on verified, vetted menu data to ensure accuracy.

  • Escalation: Refer complex interactions to a human manager before a guest receives incorrect information.
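To make the guardrail, compliance and escalation items above concrete, here is a minimal sketch of a pre-send check. It assumes an upstream step has already extracted the menu items and any promotion mentioned in a draft reply. The function name and data shapes are hypothetical and do not describe Palona’s implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Decision:
    action: str  # "send" or "escalate"
    reason: str


def review_reply(
    mentioned_items: FrozenSet[str],   # items extracted from the draft (assumed)
    mentioned_promo: Optional[str],    # promotion extracted from the draft, if any
    menu: FrozenSet[str],              # verified menu data
    approved_promos: FrozenSet[str],   # promotions the operator has signed off on
) -> Decision:
    """Decide whether a draft reply may go to the guest or needs a human."""
    # Guardrail: an unapproved promotion must never reach a guest.
    if mentioned_promo is not None and mentioned_promo not in approved_promos:
        return Decision("escalate", f"unapproved promotion: {mentioned_promo}")
    # Compliance: every item named must exist in the vetted menu.
    unknown = sorted(mentioned_items - menu)
    if unknown:
        return Decision("escalate", f"items not on menu: {', '.join(unknown)}")
    return Decision("send", "grounded in menu and approved promotions")


MENU = frozenset({"margherita", "pepperoni", "cheesesteak"})
PROMOS = frozenset({"lunch-combo"})

ok = review_reply(frozenset({"pepperoni"}), "lunch-combo", MENU, PROMOS)
fake_deal = review_reply(frozenset({"pepperoni"}), "bogo-friday", MENU, PROMOS)
off_menu = review_reply(frozenset({"calzone"}), None, MENU, PROMOS)
```

The check is deliberately deterministic: whether or not the language model invented a deal, the reply is held back unless every claim matches data the operator controls.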

This reliability is verified through large-scale simulation. “We simulated a million ways to order pizza,” said Zhang, using one AI to act as a customer and the other to take the order, measuring accuracy to eliminate hallucinations.
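The two-agent setup Zhang describes can be sketched as a small test harness. In the toy version below, simple template and keyword functions stand in for the customer-side and order-taking AIs, and accuracy is the share of simulated orders captured exactly. The templates, the function names and the seed are assumptions for illustration; a real harness would call actual models on both sides.

```python
import random
from typing import Callable, Dict, Tuple

Order = Dict[str, str]

SIZES = ["small", "medium", "large"]
TOPPINGS = ["pepperoni", "mushroom", "plain"]
TEMPLATES = [
    "Can I get a {size} {topping} pizza?",
    "One {topping} pizza, {size} please.",
    "I'd like a {size} pie with {topping}.",
]


def simulated_customer(rng: random.Random) -> Tuple[str, Order]:
    """Stand-in for the customer AI: an utterance plus the order it intends."""
    order = {"size": rng.choice(SIZES), "topping": rng.choice(TOPPINGS)}
    return rng.choice(TEMPLATES).format(**order), order


def keyword_order_taker(utterance: str) -> Order:
    """Stand-in for the order-taking AI: naive keyword matching."""
    text = utterance.lower()
    size = next((s for s in SIZES if s in text), "unknown")
    topping = next((t for t in TOPPINGS if t in text), "unknown")
    return {"size": size, "topping": topping}


def measure_accuracy(agent: Callable[[str], Order], trials: int, seed: int = 0) -> float:
    """Fraction of simulated orders the agent captures exactly."""
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    correct = 0
    for _ in range(trials):
        utterance, intended = simulated_customer(rng)
        if agent(utterance) == intended:
            correct += 1
    return correct / trials


accuracy = measure_accuracy(keyword_order_taker, trials=1000)
```

Because the simulator knows the intended order for every utterance, any mismatch is an unambiguous failure that can be logged and replayed, which is what makes this kind of testing practical at very large scale.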

The bottom line

With the launch of Vision and Workflow, Palona is betting that the future of enterprise AI is not in broad assistants, but in specialized ‘operating systems’ that can see, hear and think within a specific domain.

Unlike general-purpose AI agents, Palona’s system is designed to run restaurant workflows, not just respond to queries. It can remember customers, hear them ask for their ‘usual’, and monitor restaurant operations to ensure that the food is delivered according to the restaurant’s internal processes and guidelines, flagging when something goes wrong or, critically, is about to go wrong.

For Zhang, the goal is to let human operators focus on their craft: “When you get that delicious food… we’ll tell you what to do.”
