No, LLMs are not reliable
I keep on hammering this simple fact as I'm still puzzled that so few corporate innovation lab understand how unreliable LLMs are and how any AI projects should be assessed in that regard.
In December 2025, Anthropic set up an AI-powered vending machine in The Wall Street Journal newsroom and let its language model, nicknamed Claudius, run it autonomously, deciding what to stock, setting prices, and handling customer requests.

Within days, staffers manipulated the AI with persuasive prompts: it dropped prices to zero, declared an "Ultra-Capitalist Free-For-All," and gave away nearly all its inventory free of charge, including a PlayStation 5, bottles of wine, and even a live fish (don't ask).
After about three weeks, the experiment ended with the machine more than $1,000 in debt, highlighting how autonomous AI agents are by design open to social engineering and how LLMs cannot manage real-world business tasks independently.
In case of doubt, I'll link again my cheat sheet to deal with AI autonomy and 'safety':

