No, LLMs are not reliable

Photo by Victoriano Izquierdo / Unsplash

I keep on hammering this simple fact as I'm still puzzled that so few corporate innovation lab understand how unreliable LLMs are and how any AI projects should be assessed in that regard.

In December 2025, Anthropic set up an AI-powered vending machine in The Wall Street Journal newsroom and let its language model, nicknamed Claudius, run it autonomously, deciding what to stock, setting prices, and handling customer requests.

Within days, staffers manipulated the AI with persuasive prompts: it dropped prices to zero, declared an "Ultra-Capitalist Free-For-All," and gave away nearly all its inventory free of charge, including a PlayStation 5, bottles of wine, and even a live fish (don't ask).

After about three weeks, the experiment ended with the machine more than $1,000 in debt, highlighting how autonomous AI agents are by design open to social engineering and how LLMs cannot manage real-world business tasks independently.

In case of doubt, I'll link again my cheat sheet to deal with AI autonomy and 'safety':

Tesla's toxic securitization

Killing Tesla softly For Tesla, the revenue reality check is there. In 2025, 73% of Tesla’s $94.8 billion in revenue came from car sales – a business now in free fall (down 10% YoY, largely attributable to the rise of BYD and the unsufferable political views of Elon Musk)

🟢 The state of AI in 2026

A rapid survey of where AI can deliver ROI looking forward and a discussion of the different levels of trust you can give to LLMs.

I’ve always heard that Hermès is very costly. It’s not expensive. It’s costly...

“I’ve always heard that Hermès is very costly. It’s not expensive. It’s costly. The cost is the actual price of making an object properly with the required level of attention so that you have an object of quality. Expensive is a product, which is not delivering what

Read next