Google’s Cloud AI lead on the three frontiers of model capability

As product VP at Google Cloud, Michael Gerstenhaber works primarily on Vertex, the company’s unified platform for deploying business AI. It gives him a good look at how companies are actually using AI models, and what still needs to be done to unleash the potential of agentic AI.
When I spoke with Michael, I was particularly struck by one idea I had not heard before. As he put it, AI models are simultaneously hitting three limits: raw intelligence, response time, and a third that has less to do with raw capability than with cost: whether a model can be deployed cheaply enough to run at massive, unpredictable scale. It’s a new way of thinking about the possibilities of models, and an extremely valuable one for anyone who wants to push frontier models in a new direction.
This interview has been edited for length and clarity.
Why don’t you start by describing your experience with AI so far, and what you do at Google?
I’ve been in AI for about two years now. I was at Anthropic for a year and a half, and I’ve been at Google for almost six months. I manage Vertex, Google’s developer platform. Most of our customers are engineers who build their own applications. They want access to agentic patterns. They want access to an agent platform. They want access to inference from the smartest models in the world. I give them that, but I don’t provide the applications themselves. Shopify, Thomson Reuters, and our other customers offer those in their own domains.
What drew you to Google?
I think Google is unique in the world because we have everything from the interface down to the infrastructure layer. We can build data centers. We can buy electricity and build power plants. We have our own chips. We have our own model. We control the inference layer. We control the agent layer. We have APIs for memory and for writing interleaved code. We also have an agent engine that ensures compliance and governance. And then we even have the chat interface, with Gemini Enterprise and Gemini chat for consumers, right? Part of the reason I came here is that I saw Google as a uniquely vertically integrated company, and that’s a strength for us.
It’s strange, because despite all the differences between companies, it feels like all three major labs really are close in terms of capabilities. Is it just a race for more intelligence, or is it more complicated than that?
I see three frontiers. Models like Gemini Pro are tuned for pure intelligence. Think about writing code. You just want the best code you can get; it doesn’t matter if it takes 45 minutes, because that code has to be maintained and put into production. You just want the best.
Then there is another frontier: latency. If I’m doing customer support and need to know how to apply a policy, I need intelligence to apply that policy. Am I allowed to handle this return? Can I upgrade this seat on a flight? But it doesn’t matter how right the answer is if it took 45 minutes to arrive. For those cases you want the most intelligent model within your latency budget, because extra intelligence no longer matters once that person gets bored and hangs up.
And then there’s the last bucket, where someone like Reddit or Meta wants to moderate the entire internet. They have big budgets, but they can’t take a business risk whose scale they can’t predict. They don’t know how many toxic messages there will be today or tomorrow. So they need to cap their spend: the most intelligent model they can afford at a unit cost that stays viable at effectively unbounded volume. For this bucket, cost becomes very important.
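The three buckets above amount to constrained optimization: maximize intelligence subject to a latency budget, a cost budget, or neither. A minimal sketch of that trade-off, using an entirely hypothetical model catalog (the names, scores, latencies, and prices below are illustrative assumptions, not real Gemini benchmarks or pricing):

```python
# Hypothetical catalog: (name, intelligence score, p95 latency in s, $ per 1K requests).
# All values are made up for illustration.
MODELS = [
    ("frontier-pro",   95, 40.0, 12.00),
    ("frontier-flash", 82,  2.5,  1.50),
    ("frontier-lite",  70,  0.8,  0.10),
]

def pick_model(max_latency_s=float("inf"), max_cost=float("inf")):
    """Return the name of the most intelligent model within budget, or None."""
    candidates = [m for m in MODELS
                  if m[2] <= max_latency_s and m[3] <= max_cost]
    if not candidates:
        return None
    # Maximize intelligence among models that fit both budgets.
    return max(candidates, key=lambda m: m[1])[0]

# Bucket 1 -- coding agent: no budget, take the smartest model.
print(pick_model())                    # frontier-pro
# Bucket 2 -- customer support: answers must land within a few seconds.
print(pick_model(max_latency_s=5.0))   # frontier-flash
# Bucket 3 -- internet-scale moderation: unit cost dominates.
print(pick_model(max_cost=0.50))       # frontier-lite
```

The same selection function serves all three buckets; only the binding constraint changes.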
One of the things I’m puzzling over is why it takes so long for agentic systems to catch on. It feels like the models are there and I’ve seen some great demos, but we’re not seeing the kind of big changes I expected a year ago. What do you think is holding it back?
This technology is really only two years old, and it still lacks a lot of infrastructure. We don’t yet have patterns to control what agents do, or patterns for authorizing an agent to access data. The patterns that exist take real work to bring into production, and production is always the true indication of what a technology is capable of. So two years is not long enough to see what this intelligence supports in production, and that’s where people are struggling.
I think adoption has been uniquely fast in software engineering because it fits well into the software development life cycle. We have a development environment where it’s safe to break things, and then we promote from the development environment to the test environment. Writing code at Google requires two people to review that code and confirm it’s good enough to ship under Google’s brand to our customers. We have a lot of those human processes that make deployment exceptionally low risk. But we need to reproduce those patterns in other places and for other professions.