No, LLMs are not reliable

No, LLMs are not reliable
Photo by Victoriano Izquierdo / Unsplash

I keep on hammering this simple fact as I'm still puzzled that so few corporate innovation lab understand how unreliable LLMs are and how any AI projects should be assessed in that regard.

In December 2025, Anthropic set up an AI-powered vending machine in The Wall Street Journal newsroom and let its language model, nicknamed Claudius, run it autonomously, deciding what to stock, setting prices, and handling customer requests.

This AI Vending Machine Was Tricked Into Giving Away Everything
Anthropic installed an AI-powered vending machine in the WSJ office. The LLM, named Claudius, was responsible for autonomously purchasing inventory from whole

Within days, staffers manipulated the AI with persuasive prompts: it dropped prices to zero, declared an "Ultra-Capitalist Free-For-All," and gave away nearly all its inventory free of charge, including a PlayStation 5, bottles of wine, and even a live fish (don't ask).

After about three weeks, the experiment ended with the machine more than $1,000 in debt, highlighting how autonomous AI agents are by design open to social engineering and how LLMs cannot manage real-world business tasks independently.

In case of doubt, I'll link again my cheat sheet to deal with AI autonomy and 'safety':

🔳 The 5 perimeters of “AI”
Instead of projecting the future of AI and speculating on how fast the technological roadmap will advance (or stall), it’s far more effective to focus on understanding the specific perimeters in which AI can reliably operate today.