www.axios.com/2025/06/09/ai-llm-hallucination-reason

Scariest AI reality: Companies don't fully understand their models

Jim VandeHei, Mike Allen 6/9/2025
Illustration: Brendan Lynch/Axios

The wildest, scariest, indisputable truth about AI's large language models is that the companies building them don't know exactly why or how they work.

Why it matters: With the companies pouring hundreds of billions of dollars into willing superhuman intelligence into existence as fast as possible, and Washington doing nothing to slow or police them, it seems worth dissecting this Great Unknown.

Two years ago, Axios managing editor for tech Scott Rosenberg wrote a story, "AI's scariest mystery," saying it's common knowledge among AI developers that they can't always explain or predict their systems' behavior. And that's more true than ever.

The House, despite knowing so little about AI, tucked language into President Trump's "Big, Beautiful Bill" that would prohibit states and localities from enacting any AI regulations for 10 years. The Senate is considering limitations on the provision.

The big picture: Our purpose with this column isn't to be alarmist or "doomers." It's to clinically explain why the inner workings of superhuman intelligence models are a black box, even to the technology's creators. We'll also show, in their own words, how CEOs and founders of the largest AI companies all agree it's a black box.

LLMs — including OpenAI's ChatGPT, Anthropic's Claude and Google's Gemini — aren't traditional software systems following clear, human-written instructions, like Microsoft Word. Word does precisely what it's engineered to do.

We asked ChatGPT to explain this (and a human at OpenAI confirmed its accuracy): "We can observe what an LLM outputs, but the process by which it decides on a response is largely opaque. As OpenAI's researchers bluntly put it, 'we have not yet developed human-understandable explanations for why the model generates particular outputs.'"

Anthropic — which just released Claude 4, the latest version of its LLM, with great fanfare — admitted it was unsure why Claude, when given access to fictional emails during safety testing, threatened to blackmail an engineer over a supposed extramarital affair. This was part of responsible safety testing — but Anthropic can't fully explain the irresponsible action.

OpenAI's Sam Altman and others toss around the tame word "interpretability" to describe the challenge. "We certainly have not solved interpretability," Altman told a summit in Geneva last year. What Altman and others mean is they can't interpret the why: Why are LLMs doing what they're doing?

Elon Musk has warned for years that AI presents a civilizational risk. In other words, he literally thinks it could destroy humanity, and has said as much. Yet Musk is pouring billions into his own LLM called Grok.

Reality check: Apple published a paper last week, "The Illusion of Thinking," concluding that even the most advanced AI reasoning models don't really "think," and can fail when stress-tested.

But a new report by AI researchers, including former OpenAI employees, called "AI 2027," explains how the Great Unknown could, in theory, turn catastrophic in less than two years. The report is long and often too technical for casual readers to fully grasp. It's wholly speculative, though built on current data about how fast the models are improving. It's being widely read inside the AI companies.

The safe-landing theory: Google's Sundar Pichai — and really all of the big AI company CEOs — argue that humans will learn to better understand how these machines work and find clever, if yet unknown, ways to control them and "improve lives." The companies all have big research and safety teams, and a huge incentive to tame the technologies if they're ever to realize their full value.

Go deeper: "Behind the Curtain: Your AI survival kit."