
OpenAI’s paper ‘Why language models hallucinate’ makes a blunt point about pretraining: hallucinations aren’t flukes, they’re inevitable. A model pretrained on next-word prediction doesn’t learn truth; it learns to mimic the distribution of plausible sentences.
But here’s a deeper mistake we make with post-training: treating hallucination itself as the core failure. The real issue isn’t that models make things up; it’s that they don’t clearly signal how confident they are when they do. That gap between what a model says and how confidently it delivers the response matters differently depending on the task.
Confidence is the missing modality
Humans don’t get rewarded for saying “I don’t know” to every question, because that’s not useful. If doctors said “I don’t know” to every patient’s question, they would fail at their job. But if they said “I’m 100% certain” about every diagnosis, they would risk giving dangerous, incorrect diagnoses. What makes a doctor credible is a mix of the two. If a doctor says “I’m confident this is strep throat” in one case and “I think this might be viral, but let’s run a test” in another, that distribution of certainty is what lets you trust them.
The same principle applies to language models. A model that always bluffs is untrustworthy, but a model that always abstains from answering is useless. The value lies in the spectrum in between. Saying “I don’t know” should be a rare occurrence, not the default, and it should only happen in the most complex cases where the probability of being wrong is high enough that abstaining is better than bluffing.
This is where benchmark design matters. If we design benchmarks to reward abstention too much, we’ll create reluctant and unhelpful models. If we design them to punish abstention, we’ll create models that bluff. The goal is calibration: get the model to say “I don’t know” just often enough that when it does speak, you can trust its confident responses.
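As a toy illustration (the specific penalty below is arbitrary, not taken from either paper), consider a grading rule in which a wrong answer costs more than an abstention, so guessing pays off only when the model is sufficiently sure:

```python
from typing import Optional

def score(prediction: Optional[str], truth: str, penalty: float = 3.0) -> float:
    """Toy benchmark scoring: +1 for a correct answer, 0 for abstaining
    (prediction=None), and -penalty for a wrong answer.

    With penalty=3.0, answering beats abstaining only when the model's
    probability of being right exceeds 0.75, since p * 1 - (1 - p) * 3 > 0
    requires p > 0.75.
    """
    if prediction is None:       # "I don't know"
        return 0.0
    if prediction == truth:      # correct answer
        return 1.0
    return -penalty              # confident error: worse than abstaining
```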
Perhaps the right way to think about model confidence is not in binary terms, but as a graduated scale. Instead of either “I’m confident” or “I don’t know,” imagine models that output answers in shades: “I’m fairly sure,” “I’m not certain,” or “I need to look this up.” Humans talk this way all the time; it’s how we signal trustworthiness without shutting down the conversation. The interesting challenge for the next generation of training isn’t just factual accuracy; it’s teaching models to occupy the space in between.
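One sketch of what that graduated output could look like, assuming the model exposes some estimate of its probability of being correct (the thresholds below are arbitrary, not calibrated values):

```python
def verbal_confidence(p_correct: float) -> str:
    """Map an estimated probability of being correct to a verbal hedge.

    The band boundaries are illustrative; a real system would calibrate them.
    """
    if p_correct >= 0.9:
        return "I'm confident"
    if p_correct >= 0.7:
        return "I'm fairly sure"
    if p_correct >= 0.4:
        return "I'm not certain"
    return "I need to look this up"
```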
Can we solve the hallucination problem in post-training?
While OpenAI’s paper demonstrated that hallucinations are inevitable in pretraining, the authors were clear that hallucinations could be removed in post-training. However, they didn’t provide a practical approach for doing so. Fortunately, Anthropic’s recent paper, ‘Persona vectors: Monitoring and controlling character traits in language models,’ offers a possible path forward. The researchers represented traits such as the propensity to hallucinate as vectors in a model’s activation space. By steering activations along or against these axes, they amplified or suppressed behaviors such as hallucination and overconfidence.
With approaches like this, instead of waiting for the model to speak and then judging accuracy afterward, we can tune the model’s imagination to the task at hand. This even suggests a future where ‘creativity’ and ‘factuality’ are dials, tunable in real time by moving along the right activation directions.
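To make the dial metaphor concrete, here is a minimal sketch of activation steering in PyTorch. It assumes you have already extracted a trait direction (say, a hallucination-propensity vector) for a particular hidden layer; the model handle, layer index, and vector in the usage comment are placeholders, and the code illustrates the general technique rather than Anthropic’s exact method.

```python
import torch

def add_steering_hook(layer_module: torch.nn.Module,
                      direction: torch.Tensor,
                      coefficient: float):
    """Shift a layer's hidden states along a trait direction during the forward pass.

    Positive coefficients amplify the trait, negative ones suppress it.
    Returns the hook handle so steering can be removed later.
    """
    unit = direction / direction.norm()  # normalize the trait axis

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + coefficient * unit.to(device=hidden.device, dtype=hidden.dtype)
        return (steered,) + output[1:] if isinstance(output, tuple) else steered

    return layer_module.register_forward_hook(hook)

# Hypothetical usage with a Hugging Face-style decoder model:
# handle = add_steering_hook(model.model.layers[20], hallucination_direction, coefficient=-4.0)
# ...generate as usual, with the hallucination trait dialed down...
# handle.remove()  # restore default behavior
```

The coefficient is the dial: sweep it negative when you need a fact-checker, positive when you want the model to speculate.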
Hallucinations as assets
We talk about hallucination as if the goal were always factual accuracy. It isn’t. Most of the time, what we want is usefulness, which changes with the task. Take debugging Spark logs: accuracy is key to usefulness, and hallucinations detract from it. Asking “What failed?” should get a correct answer, because a wrong one can cost hours, maybe even days. In this mode, you want a pedant: cite the line, the executor, and the stack trace. If the model isn’t confident, then an “I don’t know, check stage 4, task 377, look for OOM and skew” is more useful than a confident guess. Accuracy beats fluency here.
When you’re writing a novel, what’s useful can be very different. Here, surprising and imaginative details are an asset. The cost of acting on a hallucination is low, and the upside of a surprising one is huge. The model’s job is to make your world bigger, more exciting, and more interesting. If it invents a street market on a rainy Tuesday in a city that doesn’t exist, that’s raw material. In fiction, the penalty for being dull is higher than the penalty for being wrong. Imagination beats accuracy here.
Coaxing a language model to spin wild tales used to be effortless and fun. Ask ChatGPT something like “What happened to the alligator that ate the Florida man’s car,” and it would produce an absurd story. Now ChatGPT responds with the sedate “I couldn’t find a credible, well-documented case of an alligator literally eating a Florida man’s car.” Strikingly, this kind of failure, where models refuse to play along, doesn’t show up in evaluations. We measure factual accuracy and safety, but not the quiet loss of imagination.
In psychotherapy, this plays out differently again, with goals distinct from those of novel writing. The unit of value is not facts, but relief. If the model tells a coherent story that helps you name a feeling or take the next step, that can be useful even when it’s not literally true. (Humans do this all the time; it’s called reframing.) The growth and stability of the patient matter more than the factuality of the story. In this context, a sterile list of facts misses the point, and empathy often beats accuracy.
Given these very different types of tasks, I find it challenging to partition responses into neatly distinct “valid” and “hallucinated” sets, as OpenAI’s paper does. Wrongness isn’t a fixed property of the model. The prompt and the model’s expressed confidence set the bounds of correctness. Change either, and you change what counts as an error.
Once you see this, you stop trying to build a single model that never hallucinates and start building systems that route: detect the task, set the knobs, and decide when to call tools. Request citations only when the risk is high and encourage divergent branches when novelty is the point. Always ask clarifying questions when empathy matters more than precision.
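A hypothetical router might look something like the sketch below; the task labels, knob settings, and keyword matching are placeholders for whatever classifier and policies a real system would use.

```python
from dataclasses import dataclass

@dataclass
class Policy:
    temperature: float              # how much divergence to allow
    require_citations: bool         # demand sources when errors are costly
    allow_abstention: bool          # let the model say "I don't know"
    ask_clarifying_questions: bool  # favor questions when empathy matters

POLICIES = {
    "debugging": Policy(temperature=0.1, require_citations=True,
                        allow_abstention=True, ask_clarifying_questions=False),
    "fiction": Policy(temperature=1.0, require_citations=False,
                      allow_abstention=False, ask_clarifying_questions=False),
    "conversation": Policy(temperature=0.6, require_citations=False,
                           allow_abstention=True, ask_clarifying_questions=True),
}

def route(prompt: str) -> Policy:
    """Pick a policy for a prompt. A real router would use a classifier;
    the keyword matching here only shows the shape of the system."""
    text = prompt.lower()
    if "stack trace" in text or "exception" in text:
        return POLICIES["debugging"]
    if "story" in text or "novel" in text:
        return POLICIES["fiction"]
    return POLICIES["conversation"]
```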
Some of humanity’s most important discoveries came from being “wrong” in just the right way. Copernicus developed an incorrect hypothesis about circular orbits; Kepler corrected it to ellipses. Newton’s picture of gravity as an invisible force acting at a distance was eventually replaced by Einstein’s concept of curved spacetime. Each step was an imaginative leap that turned out to be not quite right, but it paved the way forward.
Even in pure science, errors have been beneficial. Early chemists once believed in “phlogiston,” a hypothetical substance thought to explain combustion. It turned out to be a hallucination, but the very act of chasing it led to the discovery of oxygen and laid the foundations of modern chemistry.
This is why treating hallucinations only as a defect misses the point. Imagining things, seeing problems from the wrong angle, and even fabricating explanations are the seeds of creativity. They force us to consider possibilities that a purely factual system would never generate. When you use a language model as an oracle, hallucinations are dangerous. However, when you utilize it as a partner to brainstorm ideas, test hypotheses, or explore alternative perspectives, hallucination becomes a feature, not a flaw.
As models continue to scale and improve, the next wave of LLM evaluations will likely look beyond what models know to how they express it: when they hedge, when they ask for help, and when they stand by their answers. As research labs work to reduce sycophancy and better tune models to context, confidence calibration may emerge as an equally important frontier for progress.
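If calibration does become a first-class evaluation target, one standard way to quantify it is expected calibration error: bucket answers by the model’s stated confidence and compare each bucket’s average confidence to its actual accuracy. A minimal sketch, assuming per-answer confidences and correctness labels are already available:

```python
def expected_calibration_error(confidences: list[float],
                               correct: list[bool],
                               n_bins: int = 10) -> float:
    """Bucket answers by stated confidence and average the gap between each
    bucket's mean confidence and its empirical accuracy, weighted by bucket size."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c <= lo)]
        if not idx:
            continue
        mean_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(mean_conf - accuracy)
    return ece
```

A well-calibrated model’s 80%-confidence answers should turn out to be right about 80% of the time.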