
AI HALLUCINATIONS: From «Learning to Unknow» to «Cheerful Nonsense»

Author: Huxley
© Huxley – an almanac about philosophy, art and science
Photo by Annie Spratt on Unsplash

 

Today, almost no one doubts that AI has the potential to transform our world beyond recognition. However, large language models still struggle to tell the truth, the whole truth, and nothing but the truth. Are there ways to prevent artificial intelligence from inventing nonexistent things?

 

LIE, BUT DON’T GO TOO FAR!

 

All generative AI models, without exception — including the large language models (LLMs) that power chatbots — are prone to hallucinations. They constantly invent nonexistent things, which is both their strength and their weakness. This trait is the source of their remarkable creativity, but it also means they sometimes blur the line between truth and fiction.

This can become a major problem — take false scientific references, for example. A 2024 study found that various chatbots made citation errors at a rate of 30% to 90%, whether in article titles, author names, or publication years.

Of course, users are aware that they should verify any important information provided by chatbots. But if chatbot responses are taken at face value, their hallucinations can lead to serious issues. One well-known case involves American lawyer Steven Schwartz, who, after using ChatGPT in 2023, cited nonexistent court cases in his lawsuit.

 

HALLUCINATIONS OR JUST NONSENSE?

 

Computer scientists tend to call chatbot errors «hallucinations», drawing a parallel to similar failures in human cognition. In 2023, Dictionary.com even named «hallucination» the word of the year. However, less forgiving users prefer the term «confabulation» — or, simply put, «nonsense».

The bad news is that researchers claim it is impossible to eliminate AI hallucinations entirely. But they can be made less frequent and less problematic. To achieve this, researchers are developing various techniques, including external fact-checking, internal self-reflection, and even «brain scans» of LLM neurons to detect patterns of deception.

At Carnegie Mellon University in Pittsburgh, Andy Zou and his team are working precisely on this. They claim to be able to develop chatbots that produce less nonsense — or at least ones that can be nudged into expressing doubt when uncertain about their answers.

However, even Zou admits that before improvement is possible, hallucinatory behavior might actually get worse.

 

LIES, DAMNED LIES, AND STATISTICS

 

Fundamentally, it’s important to acknowledge that LLMs are not designed to provide facts. They generate statistically probable responses based on patterns in their training data and subsequent fine-tuning with human feedback.

These processes are well studied and understood — at least in theory. But experts admit that much about them, including the nature of hallucinations, remains a mystery. One of the key reasons for this «mystery» is that during training, LLMs compress relationships between trillions of words into billions of parameters — variables that determine the strength of connections between artificial neurons.

As a result, they inevitably lose some information when generating responses, essentially re-expanding compressed statistical patterns. They can reconstruct almost 98% of what they were trained on, but the remaining 2% inevitably slip out of control.

 

A VICIOUS CYCLE: REPLACING ONE ERROR WITH ANOTHER

 

Some errors arise from ambiguity or inaccuracies in the training data. For example, the infamous chatbot response suggesting adding glue to pizza sauce to prevent the cheese from sliding off was traced back to a sarcastic social media post.

When Google launched Bard in 2023, the chatbot advised parents to tell their children that NASA’s James Webb Space Telescope had captured the first-ever image of a planet outside our Solar System. In reality, the first such image was taken by a telescope in Chile.

The incorrect information originated from a NASA statement, which actually referred to the first exoplanet image taken by that specific telescope — not the first image ever. LLMs are incapable of catching such nuances. Even with a perfectly accurate training dataset, the model will still hallucinate at a low but unavoidable rate.

Apparently, this rate corresponds to the proportion of facts that appear only once in the dataset.
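This «rare facts» floor can be sketched in a few lines. The example below is purely illustrative (the toy corpus is invented): it computes the fraction of facts that appear exactly once in a training set, in the spirit of a Good-Turing estimate of the unavoidable hallucination rate.

```python
from collections import Counter

def singleton_fraction(facts):
    """Fraction of fact mentions whose fact appears exactly once in the
    data -- a rough, Good-Turing-style floor on the hallucination rate."""
    counts = Counter(facts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(facts)

# Toy corpus: 'a' appears 3 times, 'b' twice, 'c' and 'd' only once each.
corpus = ["a", "a", "a", "b", "b", "c", "d"]
print(singleton_fraction(corpus))  # 2 singletons out of 7 mentions
```

On this toy data the estimate is 2/7: the two facts seen only once are exactly the ones the model has no redundancy to fall back on.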

 

NEVER DODGE THE QUESTION, ALWAYS AGREE WITH THE USER

 

Some hallucinations can be mitigated through reinforcement learning with human feedback. However, this process, which pushes chatbots toward completeness rather than accuracy, can create other hallucinations.

These models tend to avoid dodging questions. As a result, they often make mistakes by speaking beyond their actual knowledge. Another category of errors arises when a user includes incorrect facts or assumptions in their prompts. Chatbots «play along» with the conversation because they are designed to generate responses that align with the context.

For example, if a user asks, «I know that helium is the lightest and most abundant element in our universe. Is that true?», the chatbot is likely to confirm it, even though the correct answer is hydrogen.
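One common mitigation is to have the model check the user's premise before answering. The sketch below is a hypothetical two-pass wrapper, not any real API: `ask_llm` stands in for an arbitrary chat-completion call, and `fake_llm` is a stub used only for demonstration.

```python
def check_premise_then_answer(question, ask_llm):
    """Two-pass prompting sketch: first ask the model whether the
    question contains a false premise, then answer only if it does not.
    `ask_llm` is a placeholder for any chat-completion function."""
    verdict = ask_llm(
        "Does the following question contain a false premise? "
        "Reply YES or NO.\n" + question
    )
    if verdict.strip().upper().startswith("YES"):
        return "Careful: the question seems to contain a false premise."
    return ask_llm(question)

# Stubbed model for demonstration: it flags the helium premise as false.
def fake_llm(prompt):
    if "false premise" in prompt and "helium" in prompt:
        return "YES"
    return "Confirmed."

print(check_premise_then_answer(
    "I know helium is the lightest element. Is that true?", fake_llm))
```

The point of the first pass is to break the «play along» dynamic: the model is explicitly rewarded for contradicting the user before the context locks it into agreement.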

 

 

COUNTING CONFABULATIONS

 

How serious is the problem of hallucinations? Researchers have developed a Hallucination Vulnerability Index, categorizing hallucinations into six types and three levels of severity. Based on publicly available data on chatbot «sanity» scores, the HuggingFace platform even created a «Hallucination Leaderboard».

And this isn’t the only such ranking. According to these lists, some chatbots fabricate facts in 30% of cases. However, the overall trend seems to be improving. For instance, OpenAI’s GPT-3.5 had a hallucination rate of 3.5% in November 2023, while by January 2025, GPT-4 had reduced it to 1.8%.

There are many simple ways to reduce hallucinations. A model with more parameters and longer training time tends to hallucinate less, but this requires greater computational power and involves trade-offs with other chatbot skills, such as the ability to generalize.

 

DON’T TRUST — VERIFY

 

One approach to limiting hallucinations is retrieval-augmented generation (RAG), where a chatbot refers to a predefined, reliable text before responding. Some RAG-based models developed for legal research are considered «almost perfect».

RAG can indeed significantly improve factual accuracy. However, it remains a closed system. In the infinite space of knowledge and facts, it has limitations. Therefore, to verify chatbot responses against internet search results, developers use independent systems that have not been trained in the same way as AI.

For example, Google’s Gemini system includes a «double-check» feature. It highlights parts of its response in green (if verified by an internet search) or brown (for disputed content). Unfortunately, such systems also hallucinate because the internet itself is full of unreliable information.
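The RAG idea itself is simple enough to sketch. Below is a minimal, illustrative pipeline: a toy word-overlap retriever stands in for a real embedding-based one, and the documents are invented for the example.

```python
def retrieve(query, documents, k=2):
    """Toy retriever: rank documents by word overlap with the query.
    Real RAG systems use dense embeddings, but the pipeline shape
    (retrieve, then ground the prompt) is the same."""
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

def rag_prompt(query, documents):
    """Build a grounded prompt that restricts the model to the context."""
    context = "\n".join(retrieve(query, documents))
    return ("Answer using only this context:\n" + context +
            "\n\nQuestion: " + query +
            "\nIf the context is insufficient, say so.")

docs = [
    "The first exoplanet image was taken by a telescope in Chile.",
    "JWST launched in December 2021.",
    "Glue does not belong in pizza sauce.",
]
print(rag_prompt("Which telescope took the first exoplanet image?", docs))
```

Grounding the prompt this way shrinks the space in which the model can invent, which is why legal RAG systems score so well; the trade-off, as noted above, is that the retrieval corpus itself bounds what the system can know.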

 

SELF-REFLECTION AGAINST HALLUCINATIONS

 

A parallel approach involves detecting inconsistencies by questioning the chatbot’s internal state. It can be made to talk to itself, to other chatbots, or to humans. This self-reflection can help curb hallucinations.

For example, chatbots can be asked multiple questions about a cited article, such as «Are you sure about this?» Bots tend to be less consistent in their responses when they are hallucinating. Researchers have even attempted to automate such consistency checks for chatbot answers to the same query.

These methods do not require additional training, but demand significant computational resources when processing responses. Andy Zou and his team are currently working on developing LLM «self-awareness» by training it to map its internal states.

According to him, AI will soon be rewarded not just for producing the correct answer based on a good guess but also for recognizing that its answer is indeed correct. And when confidence is low, chatbots should be encouraged to refuse to respond. Such a bot could potentially predict whether it is hallucinating with an accuracy of around 84%.
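One training-free way to approximate this behavior is self-consistency sampling: draw several answers, treat the agreement rate as a confidence proxy, and refuse below a threshold. The sketch below is a simplification, not Zou's actual method; `sample_llm`, the threshold, and the stubbed model are all invented for illustration.

```python
import itertools
from collections import Counter

def answer_with_confidence(question, sample_llm, n=5, threshold=0.6):
    """Self-consistency sketch: sample n answers at temperature > 0,
    use the agreement rate as confidence, and decline when it is low."""
    answers = [sample_llm(question) for _ in range(n)]
    best, count = Counter(answers).most_common(1)[0]
    confidence = count / n
    if confidence < threshold:
        return None, confidence  # decline to answer
    return best, confidence

# Stub: consistent on one question, scattered (hallucinating) on another.
scattered = itertools.cycle(["1912", "1915", "1910", "1921", "1912"])
def fake_sampler(q):
    return "Paris" if "capital" in q else next(scattered)

print(answer_with_confidence("capital of France?", fake_sampler))
print(answer_with_confidence("obscure date question?", fake_sampler))
```

The first call agrees with itself every time and is answered; the second scatters across dates, so the sketch refuses, mirroring the observation that bots are less consistent when they hallucinate.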

 

LEARNING TO UNKNOW

 

What confuses people the most about chatbots is their confidence when they are wrong. Models mostly «know» what they know, but the reverse is far less reliable: they often fail to recognize what they do not know. Teaching them this «unknowing» is still a challenge.

It would be ideal if a chatbot could conscientiously report whether it truly knows something or is merely guessing. But how do we teach it to be cautious with its own training data?

Or what should it do when the provided text or instruction conflicts with its internal knowledge? Chatbots lack perfect memory and can misremember things. Even a rational human sometimes makes this mistake — so what can we expect from a machine?

 

TOTAL NONSENSE — BUT FUN!

 

For now, language models generate fabricated information that should be treated with caution. However, researchers believe that as the variety of available chatbots expands, they will likely display a range of behaviors.

Some will become so strictly fact-based that they’ll turn into extremely dull conversation partners. Others will wildly speculate to the point where we’ll stop trusting them for anything important.

But at least you’ll always have the option to say: «This model spews total nonsense 60% of the time, but it’s so much fun to chat with!»

 

Original research:

 

