
OpenAI’s paper ‘Why language models hallucinate’ makes a blunt point about pretraining: hallucinations aren’t flukes, they’re inevitable. A model pretrained on next-word prediction doesn’t learn truth; it learns to mimic the distribution of plausible sentences.
But here’s a deeper mistake we make with post-training: treating hallucination itself as the core failure. The real issue isn’t that models make things up; it’s that they don’t clearly signal how confident they are when they do. That gap between what a model says and how confidently it delivers the response matters differently depending on the task.
Confidence is the missing modality
Humans don’t get rewarded for saying “I don’t know” to every question, because that’s not useful. If doctors said “I don’t know” to every patient’s question, they would fail at their job. But if they said “I’m 100% certain” about every diagnosis, they would risk giving dangerous, incorrect diagnoses. What makes a doctor credible is a mix of the two. If a doctor says “I’m confident this is strep throat” in one case and “I think this might be viral, but let’s run a test” in another, that distribution of certainty is what lets you trust them.
The same principle applies to language models. A model that always bluffs is untrustworthy, but a model that always abstains from answering is useless. The value lies in the spectrum in between. Saying “I don’t know” should be a rare occurrence, not the default, and it should only happen in the most complex cases where the probability of being wrong is high enough that abstaining is better than bluffing.
This is where benchmark design matters. If we design benchmarks to reward abstention too much, we’ll create reluctant and unhelpful models. If we design them to punish abstention, we’ll create models that bluff. The goal is calibration: get the model to say “I don’t know” just often enough that when it does speak, you can trust its confident responses.
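As a toy illustration (the specific penalty below is arbitrary, not taken from either paper), consider a grading rule in which a wrong answer costs more than an abstention, so guessing pays off only when the model is sufficiently sure:

```python
from typing import Optional

def score(prediction: Optional[str], truth: str, penalty: float = 3.0) -> float:
    """Toy benchmark scoring: +1 for a correct answer, 0 for abstaining
    (prediction=None), and -penalty for a wrong answer.

    With penalty=3.0, answering beats abstaining only when the model's
    probability of being right exceeds 0.75, since p * 1 - (1 - p) * 3 > 0
    requires p > 0.75.
    """
    if prediction is None:       # "I don't know"
        return 0.0
    if prediction == truth:      # correct answer
        return 1.0
    return -penalty              # confident error: worse than abstaining
```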
Perhaps the right way to think about model confidence is not in binary terms, but as a graduated scale. Instead of either “I’m confident” or “I don’t know,” imagine models that output answers in shades: “I’m fairly sure,” “I’m not certain,” or “I need to look this up.” Humans talk this way all the time; it’s how we signal trustworthiness without shutting down the conversation. The interesting challenge for the next generation of training isn’t just factual accuracy; it’s teaching models to occupy the space in between.
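One sketch of what that graduated output could look like, assuming the model exposes some estimate of its probability of being correct (the thresholds below are arbitrary, not calibrated values):

```python
def verbal_confidence(p_correct: float) -> str:
    """Map an estimated probability of being correct to a verbal hedge.

    The band boundaries are illustrative; a real system would calibrate them.
    """
    if p_correct >= 0.9:
        return "I'm confident"
    if p_correct >= 0.7:
        return "I'm fairly sure"
    if p_correct >= 0.4:
        return "I'm not certain"
    return "I need to look this up"
```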
Can we solve the hallucination problem in post-training?
While OpenAI’s paper demonstrated that hallucinations are inevitable in pretraining, the authors were clear that hallucinations could be removed in post-training. However, they didn’t provide a practical approach for doing so. Fortunately, Anthropic’s recent paper, ‘Persona vectors: Monitoring and controlling character traits in language models,’ offers a possible path forward. The researchers represented traits such as the propensity to hallucinate as vectors in a model’s activation space. By steering activations along or against these axes, they amplified or suppressed behaviors such as hallucination and overconfidence.
With approaches like this, instead of waiting for the model to speak and then judging accuracy afterward, we can tune the model’s imagination to the task at hand. This even suggests a future where ‘creativity’ and ‘factuality’ are dials, tunable in real time by moving along the right activation directions.
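To make the dial metaphor concrete, here is a minimal sketch of activation steering in PyTorch. It assumes you have already extracted a trait direction (say, a hallucination-propensity vector) for a particular hidden layer; the model handle, layer index, and vector in the usage comment are placeholders, and the code illustrates the general technique rather than Anthropic’s exact method.

```python
import torch

def add_steering_hook(layer_module: torch.nn.Module,
                      direction: torch.Tensor,
                      coefficient: float):
    """Shift a layer's hidden states along a trait direction during the forward pass.

    Positive coefficients amplify the trait, negative ones suppress it.
    Returns the hook handle so steering can be removed later.
    """
    unit = direction / direction.norm()  # normalize the trait axis

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + coefficient * unit.to(device=hidden.device, dtype=hidden.dtype)
        return (steered,) + output[1:] if isinstance(output, tuple) else steered

    return layer_module.register_forward_hook(hook)

# Hypothetical usage with a Hugging Face-style decoder model:
# handle = add_steering_hook(model.model.layers[20], hallucination_direction, coefficient=-4.0)
# ...generate as usual, with the hallucination trait dialed down...
# handle.remove()  # restore default behavior
```

The coefficient is the dial: sweep it negative when you need a fact-checker, positive when you want the model to speculate.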
Hallucinations as assets
We talk about hallucination as if the goal were always factual accuracy. It isn’t. Most of the time, what we want is usefulness, which changes with the task. Take debugging Spark logs: accuracy is key to usefulness, and hallucinations detract from it. Asking “What failed?” should get a correct answer, because a wrong one can cost hours, maybe even days. In this mode, you want a pedant: cite the line, the executor, and the stack trace. If the model isn’t confident, then an “I don’t know, check stage 4, task 377, look for OOM and skew” is more useful than a confident guess. Accuracy beats fluency here.
When you’re writing a novel, what’s useful can be very different. Here, surprising and imaginative details are an asset. The cost of acting on a hallucination is low, and the upside of a surprising one is huge. The model’s job is to make your world bigger, more exciting, and more interesting. If it invents a street market on a rainy Tuesday in a city that doesn’t exist, that’s raw material. In fiction, the penalty for being dull is higher than the penalty for being wrong. Imagination beats accuracy here.
Coaxing a language model to spin wild tales used to be effortless and fun. Ask ChatGPT something like “What happened to the alligator that ate the Florida man’s car,” and it would produce an absurd story. Now ChatGPT responds with the sedate “I couldn’t find a credible, well-documented case of an alligator literally eating a Florida man’s car.” Strikingly, this kind of failure, where models refuse to play along, doesn’t show up in evaluations. We measure factual accuracy and safety, but not the quiet loss of imagination.
In psychotherapy, this plays out differently again, with goals distinct from those of novel writing. The unit of value is not facts, but relief. If the model tells a coherent story that helps you name a feeling or take the next step, that can be useful even when it’s not literally true. (Humans do this all the time; it’s called reframing.) The growth and stability of the patient matter more than the factuality of the story. In this context, a sterile list of facts misses the point, and empathy often beats accuracy.
Given these very different types of tasks, I find it challenging to partition responses into neatly distinct “valid” and “hallucinated” sets, as OpenAI’s paper does. Wrongness isn’t a fixed property of the model. The prompt and the model’s expressed confidence set the bounds of correctness. Change either, and you change what counts as an error.
Once you see this, you stop trying to build a single model that never hallucinates and start building systems that route: detect the task, set the knobs, and decide when to call tools. Request citations only when the risk is high and encourage divergent branches when novelty is the point. Always ask clarifying questions when empathy matters more than precision.
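A hypothetical router might look something like the sketch below; the task labels, knob settings, and keyword matching are placeholders for whatever classifier and policies a real system would use.

```python
from dataclasses import dataclass

@dataclass
class Policy:
    temperature: float              # how much divergence to allow
    require_citations: bool         # demand sources when errors are costly
    allow_abstention: bool          # let the model say "I don't know"
    ask_clarifying_questions: bool  # favor questions when empathy matters

POLICIES = {
    "debugging": Policy(temperature=0.1, require_citations=True,
                        allow_abstention=True, ask_clarifying_questions=False),
    "fiction": Policy(temperature=1.0, require_citations=False,
                      allow_abstention=False, ask_clarifying_questions=False),
    "conversation": Policy(temperature=0.6, require_citations=False,
                           allow_abstention=True, ask_clarifying_questions=True),
}

def route(prompt: str) -> Policy:
    """Pick a policy for a prompt. A real router would use a classifier;
    the keyword matching here only shows the shape of the system."""
    text = prompt.lower()
    if "stack trace" in text or "exception" in text:
        return POLICIES["debugging"]
    if "story" in text or "novel" in text:
        return POLICIES["fiction"]
    return POLICIES["conversation"]
```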
Some of humanity’s most important discoveries came from being “wrong” in just the right way. Copernicus developed an incorrect hypothesis about circular orbits; Kepler corrected it to ellipses. Newton’s picture of gravity as an invisible force acting at a distance was eventually replaced by Einstein’s concept of curved spacetime. Each step was an imaginative leap that turned out to be not quite right, but it paved the way forward.
Even in pure science, errors have been beneficial. Early chemists once believed in “phlogiston,” a hypothetical substance thought to explain combustion. It turned out to be a hallucination, but the very act of chasing it led to the discovery of oxygen and laid the foundations of modern chemistry.
This is why treating hallucinations only as a defect misses the point. Imagining things, seeing problems from the wrong angle, and even fabricating explanations are the seeds of creativity. They force us to consider possibilities that a purely factual system would never generate. When you use a language model as an oracle, hallucinations are dangerous. However, when you utilize it as a partner to brainstorm ideas, test hypotheses, or explore alternative perspectives, hallucination becomes a feature, not a flaw.
As models continue to scale and improve, the next wave of LLM evaluations will likely look beyond what models know to how they express it: when they hedge, when they ask for help, and when they stand by their answers. As research labs work to reduce sycophancy and better tune models to context, confidence calibration may emerge as an equally important frontier for progress.
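If calibration does become a first-class evaluation target, one standard way to quantify it is expected calibration error: bucket answers by the model’s stated confidence and compare each bucket’s average confidence to its actual accuracy. A minimal sketch, assuming per-answer confidences and correctness labels are already available:

```python
def expected_calibration_error(confidences: list[float],
                               correct: list[bool],
                               n_bins: int = 10) -> float:
    """Bucket answers by stated confidence and average the gap between each
    bucket's mean confidence and its empirical accuracy, weighted by bucket size."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c <= lo)]
        if not idx:
            continue
        mean_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(mean_conf - accuracy)
    return ece
```

A well-calibrated model’s 80%-confidence answers should turn out to be right about 80% of the time.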