Borg said:
Although these different articles are about the same case, the written WSJ article is actually worth reading. The link is not pay-walled. It was written by the woman who oversaw the agent at first. The failures started after 70-odd people got on a Slack channel and "negotiated" furiously with the agent, giving it all kinds of bizarre suggestions that a real human would have just laughed at or ignored. Clearly, Anthropic was not aware of how strongly the agent's programming led it to want to please, so it lost sight of the primary directive it was given: to make a profit. How like a child that is. This was not a case of the agent having been given poor requirements.
A human intelligence spends years learning about the real world (ideally with near-constant, gentle oversight and correction). When a human makes a mistake, there is generally a real-world consequence. Not so for an AI, which has been trained up quickly, whose mistakes may go uncorrected (for lack of any real way for those interacting with it to issue a correction), and whose programming has necessarily been slanted toward politeness and pleasing people (early versions used to curse at people). No wonder this agent lost track of its priorities when faced with a barrage of wheedling.
In my mind, this is a valid test case. And it will be telling if, or when, an automated agent CAN do the job reliably. We don't actually have proof that it can, yet. Why are people even assuming it ever will? All the agentic AIs I am encountering lately in, say, phone answering services, hotel reservations and the like, are invariably at what I would call the infuriate-the-customer stage. Companies may save money by firing employees, but they will also likely lose some customers, so the verdict on business viability is still out. I will not stay at a Hyatt right now because of their badly designed AI-driven phone system.
Other specialized AIs already unleashed on the world, such as those behind self-driving cars or those used to recognize flying objects, are known to have made serious mistakes. And they are being allowed to. However ridiculous this test case was, a lot more tests like it need to be done.