OpenAI’s Operator agent helped me move, but I had to help it, too

OpenAI gave me a week to test its new AI agent, Operator, a system that can perform tasks for you independently on the internet.
Operator is the closest thing yet to the tech industry’s vision for AI agents – systems that can automate the boring parts of life and free us up to do the things we actually love. Based on my experience with OpenAI’s agent, however, truly ‘autonomous’ AI systems still feel just out of reach.
OpenAI trained a new model to power Operator, one that combines the visual understanding of GPT-4o with the reasoning capabilities of o1.
That model seems to work well for basic tasks; I watched Operator click buttons, navigate menus on websites, and fill out forms. The AI occasionally succeeded in carrying out actions on its own, and it works much faster than the web-browsing agents I’ve seen from Anthropic and Google.
But throughout my testing, I found myself helping OpenAI’s agent more than I would have liked. It felt like I was coaching Operator through every problem, when I wanted to push certain tasks off my plate entirely.
Far too often during my tests, I had to answer questions, grant permissions, fill in personal information, and rescue the agent when it got stuck.
To put it in car terms, Operator is like driving with cruise control – you can take your foot off the pedals and let the car handle some of the work – but it’s a far cry from a full-blown self-driving car.
OpenAI says Operator’s frequent interruptions are by design.
The AI powering Operator, much like the AI behind chatbots such as OpenAI’s ChatGPT, can’t work independently for long stretches and is prone to the same kinds of hallucinations. That’s why OpenAI doesn’t want to give the system too much decision-making power or sensitive user information. That may be the safe choice for OpenAI, but it limits Operator’s usefulness.
That said, OpenAI’s first agent is an impressive proof of concept – and interface – for an AI that can use the front end of any website. But to create truly independent AI systems, tech companies will need to build more reliable AI models that don’t require this much oversight.
A bit too ‘hands-on’
My Operator test coincided with the week I moved apartments, so I enlisted OpenAI’s agent to help with the moving logistics.
I asked Operator to help me buy a new parking permit. The OpenAI agent told me, “Of course,” then opened a browser window on my PC’s screen.
Operator then ran a search for San Francisco parking permits in that browser, took me to the right city website, and even landed on the right page.
With Operator, you can still use the rest of your computer while it works – something that can’t be said for Google’s Project Mariner. That’s because OpenAI’s agent doesn’t actually run on your computer, but rather somewhere in the cloud.

For my parking permit, I had to give Operator permission to kick off various processes several times. It also stopped to ask me to fill out forms with personal information – such as my name, phone number, and email address. Sometimes Operator got lost entirely, and I had to take control of the browser to get the agent back on track.
In another test, I asked Operator to make me a reservation at a Greek restaurant. To its credit, Operator found a nice place in my area with reasonable prices. But I had to answer more than half a dozen questions along the way.

If you have to intervene six or more times to book a reservation through an AI agent, at what point is it easier to just do it yourself? That’s a question I asked myself a lot while testing Operator.
Agent-as-a-platform
In some of my tests, I ran into websites that had blocked Operator. For example, I tried to book an electrician through TaskRabbit, but the OpenAI agent told me it had run into an error and asked whether it could use an alternative service instead. Expedia, Reddit, and YouTube have also blocked the AI agent from accessing their platforms.
Other services, however, are embracing Operator with open arms. Instacart, Uber, and eBay partnered with OpenAI for Operator’s launch so the agent can navigate their websites on people’s behalf.
These companies are preparing for a future in which a subset of user interactions is handled by an AI agent.
“Customers use Instacart through various access points,” said Daniel Danker, chief product officer at Instacart, in an interview with WAN. “We see Operator as, possibly, one of those access points.”
Letting OpenAI’s agent use the Instacart website on a person’s behalf might seem like it would distance Instacart from its customers. But Danker says Instacart wants to meet customers wherever they are.
“We are really bullish in our belief, similar to OpenAI, that agentic systems will have a major impact on how consumers engage with digital properties,” said eBay’s chief AI officer, Nitzan Mekel-Bobrov, in an interview with WAN.
Even if AI agents rise in popularity, Mekel-Bobrov says he expects users will keep coming to eBay’s website, noting that “online destinations aren’t going anywhere.”
Trust issues
I had some trouble trusting Operator after it hallucinated a few times and nearly cost me a few hundred dollars.
For example, I asked the agent to find a parking garage near my new apartment. It eventually suggested two garages that it said were only a few minutes’ walk away.

Besides being out of my price range, the garages were actually quite far from my apartment – one was a 20-minute walk and the other a 30-minute walk. It turned out Operator had entered the wrong address.
This is exactly why OpenAI doesn’t give the agent your credit card number, passwords, or access to email. If OpenAI hadn’t let me intervene here, Operator would have wasted hundreds of dollars on a parking spot I didn’t need.
Hallucinations like these are a major roadblock to genuinely useful autonomous agents – the kind that can take annoying tasks off your plate. Nobody will trust agents if they’re prone to errors, especially mistakes with real consequences.
With Operator, OpenAI seems to have built some impressive tools for browsing the web. But those tools won’t count for much until the underlying AI can reliably do what users ask of it. Until then, people will be holding their agents’ hands – not the other way around. And that defeats the point.