What are AI hallucinations?

Elaine Hamill, SAS Insights Editor

The global market for generative artificial intelligence (GenAI) is expected to grow tremendously in the next five years. How can we balance the great promise of AI with the need for safety and responsibility, especially in an election year? Let’s start by understanding AI hallucinations, including why it’s becoming increasingly difficult to tell fact from AI-generated fiction.

Understanding AI hallucinations

In an ideal world, generative AI tools such as Google’s Gemini and OpenAI’s ChatGPT would appropriately address every prompt a user submits. They’d provide accurate answers every time.

But in the real world, GenAI sometimes gets it wrong. GenAI occasionally even makes things up. AI hallucinations happen when the large language models (LLMs) that underpin AI chatbots generate nonsensical or false information in response to user prompts.

With more than 5.3 billion people worldwide using the internet, the LLMs that power generative AI are trained on an enormous and largely uncurated pool of online data. That pool can include the billions of videos, photos, emails, social media posts and more that humans create every single day.

Trained on this data-rich mishmash, generative AI can detect information, patterns or objects that don’t exist. These misperceptions produce AI hallucinations – data, content or results – that are false, illogical or otherwise inaccurate.

It sounds strange, but it’s true. AI can and does perceive things that aren’t real. And AI hallucinations run the gamut from comically fake to ever so slightly off base. Some AI hallucinations can even sound or appear convincingly correct or accurate to the untrained eye or ear.

Ripple effects of AI hallucinations

The consequences of AI hallucinations can be significant and wide-ranging. This is especially true when it comes to the rapid spread of misinformation.

Hundreds of millions of people have flocked to generative AI tools since ChatGPT’s public launch in November 2022. As of April 2024, more than 180 million people were using ChatGPT.

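Unfortunately, AI hallucinations are multiplying just as quickly: the more people use these tools, the more hallucinated answers end up in circulation. How often do AI hallucinations occur? Check out Vectara’s Hallucination Leaderboard, which measures how often LLMs introduce unsupported information when summarizing documents. In April 2024, it showed GPT-4 Turbo as the least prone to hallucinating, with a 2.5% hallucination rate.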
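To make a figure like 2.5% concrete: a hallucination rate is the share of evaluated responses that a checker flags as unsupported. The sketch below assumes each response has already been judged (by a person or an automated evaluator); the judgments are made up to reproduce the 2.5% figure, and the code says nothing about how Vectara’s own evaluation works.

```python
def hallucination_rate(judgments: list[bool]) -> float:
    """Return the fraction of responses judged to contain a hallucination.

    Each entry is True if a reviewer or automated checker flagged that
    response as hallucinated, False otherwise."""
    if not judgments:
        raise ValueError("need at least one judged response")
    return sum(judgments) / len(judgments)

# Hypothetical evaluation run: 1,000 responses, 25 of them flagged.
judged = [True] * 25 + [False] * 975
rate = hallucination_rate(judged)
print(f"{rate:.1%}")  # 2.5%
```

Small rates add up at scale: if the same rate held in everyday use, a tool answering one million prompts would produce about 25,000 hallucinated answers.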

It’s worth noting that ChatGPT now includes a disclaimer just below the open text field: “ChatGPT can make mistakes. Consider checking important information.” And Google’s Gemini recommends that users double-check its responses.

Causes of AI hallucinations

There’s no single or definitive cause of AI hallucinations. Anything from insufficient data to flawed or biased data to the many possible quirks of user prompts can contribute to inaccurate results.

Input bias is a leading cause of AI hallucinations. Not surprisingly, an AI model trained on a biased or flawed data set is likely to hallucinate patterns or characteristics that mirror those biases and flaws. Are some AI models deliberately manipulated to generate skewed results? Absolutely; deliberately feeding a model tainted training data is known as data poisoning. Tech companies are catching on and are trying to stay ahead of these nefarious tricksters. But stopping input bias before it starts will remain an ongoing battle.

Overfitting and underfitting also play a part in making models hallucinate. Overfitting happens when a model is too complex – when it learns the details and noise in the training data a bit too well. Overfitted models perform very well on their training data, but they can’t generalize the way they should, so they perform poorly on new data. They’re simply too tailored to their training data to offer reliable predictions based on any other data.

Underfitting is the opposite – it happens when a model is too simple. Models that aren’t sufficiently complex can’t detect the details, patterns and relationships in their training data. As a result, an underfitted model performs poorly on both the training data and any new data it encounters.
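Overfitting and underfitting are easiest to see on a toy problem far simpler than an LLM. This sketch assumes data generated from a straight line (y = 2x + 1) plus random noise, and fits three hypothetical models of different complexity: one that memorizes every training point, one that ignores the input entirely, and an ordinary least-squares line.

```python
import random

def make_data(n, rng):
    """Noisy samples of an assumed true relationship, y = 2x + 1."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, 1)) for x in xs]

def mse(model, data):
    """Mean squared error of a model's predictions on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

def fit_memorizer(train):
    """Too complex: return the y of the nearest training x, noise included."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

def fit_mean(train):
    """Too simple: ignore x and always predict the average training y."""
    avg = sum(y for _, y in train) / len(train)
    return lambda x: avg

def fit_line(train):
    """About right: ordinary least-squares line y = a*x + b."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxy = sum((x - mx) * (y - my) for x, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b

rng = random.Random(0)
train, test = make_data(30, rng), make_data(30, rng)

results = {}
for name, fit in [("overfit (memorizer)", fit_memorizer),
                  ("underfit (mean)", fit_mean),
                  ("balanced (line)", fit_line)]:
    model = fit(train)
    results[name] = (mse(model, train), mse(model, test))
    print(f"{name:22s} train MSE={results[name][0]:6.2f}  test MSE={results[name][1]:6.2f}")
```

The memorizer scores a perfect zero on its training data yet does worse on new data, while the mean model does poorly on both; the line sits between the two extremes.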

Forgive the analogy, but picture an LLM operating like a trash compactor. Just about anything can and does get fed into it, and it’s all compressed repeatedly to make room for more. In the process, the details and finer points of what’s fed into the model are lost.

Thanks to plotlines from popular movies and TV shows, LLMs may appear capable of thought and reason. But they’re not, of course. They generate answers by repeatedly predicting a likely next word (more precisely, a token) based on patterns in the data they were trained on. That’s why these models can’t provide answers that are fully rooted in reality 100% of the time (at least not yet, anyway).
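Here is a drastically simplified sketch of “predicting likely answers.” It uses a bigram model that only knows which word most often follows another in its training text. Real LLMs use neural networks over tokens and much longer context, so this is an analogy, not how they work internally. It still shows how frequency-driven prediction can produce a fluent but false sentence from entirely true training text.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus: str) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the training text."""
    words = corpus.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows, start: str, length: int) -> str:
    """Greedily append the most frequent next word; there is no notion of truth."""
    out = [start]
    for _ in range(length):
        options = follows.get(out[-1])
        if not options:
            break
        out.append(options.most_common(1)[0][0])
    return " ".join(out)

# Every sentence in this hypothetical training text is true.
corpus = ("paris is the capital of france . "
          "the louvre is in paris , the capital of france . "
          "rome is the capital of italy .")
follows = train_bigrams(corpus)
print(generate(follows, "rome", 6))  # rome is the capital of france .
```

Because “of france” appears more often than “of italy,” the model confidently completes a false statement about Rome.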

Finding the optimal level of complexity for any given LLM involves adjusting the number of features, the amount of training the model undergoes and the number of training examples it receives.
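A common way to find that balance is to try several complexity settings and keep the one that does best on held-out data the model never trained on. The sketch below swaps in k-nearest-neighbor regression as a stand-in for a real model, because its single knob, k, maps neatly onto complexity. Small k means a very flexible model, and k equal to the training-set size always predicts the average. The data (a noisy sine curve) and the candidate k values are illustrative assumptions, not a recipe for tuning an LLM.

```python
import math
import random

def knn_predict(train, x, k):
    """Average the y values of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def validation_mse(train, val, k):
    """Error on held-out data for a given complexity setting k."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in val) / len(val)

def choose_k(train, val, candidates):
    """Pick the setting with the lowest error on data not used for training."""
    scores = {k: validation_mse(train, val, k) for k in candidates}
    return min(scores, key=scores.get), scores

rng = random.Random(1)
points = [(x, math.sin(x) + rng.gauss(0, 0.3))
          for x in (rng.uniform(0, 6) for _ in range(80))]
train, val = points[:60], points[60:]

best_k, scores = choose_k(train, val, candidates=[1, 3, 5, 10, 20, 60])
for k, err in scores.items():
    print(f"k={k:2d}  validation MSE={err:.3f}")
print("chosen k:", best_k)
```

Judging each setting on the validation split, rather than on the training data, is what keeps the search from simply rewarding the most overfitted option.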

Prompt engineering also can affect the output of generative AI. Prompts that are detailed and precise are much more likely to generate accurate answers or results than vague, contradictory or unclear prompts. Giving GenAI specific guidelines or parameters within prompts likewise can help produce better results. So can adding context to queries and even assigning the AI tool a specific role or perspective to help refine its focus.
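The prompt-writing advice above can be captured in a small template. `build_prompt` is a hypothetical helper, not part of any vendor’s API; it only assembles text that you would then send to whichever GenAI tool you use. The example role, context placeholder and guidelines are illustrative.

```python
from typing import Optional

def build_prompt(question: str,
                 role: Optional[str] = None,
                 context: Optional[str] = None,
                 guidelines: Optional[list[str]] = None) -> str:
    """Assemble a prompt from an optional role, context and guidelines."""
    parts = []
    if role:
        parts.append(f"You are {role}.")  # assign a role or perspective
    if context:
        parts.append(f"Context:\n{context}")  # ground the answer in supplied text
    if guidelines:
        parts.append("Guidelines:\n" + "\n".join(f"- {g}" for g in guidelines))
    parts.append(f"Question: {question}")  # the actual request goes last
    return "\n\n".join(parts)

vague = build_prompt("Tell me about our loan rules.")
specific = build_prompt(
    "Which applicants in the policy excerpt qualify for a small-business loan?",
    role="a credit analyst at a retail bank",
    context="<paste the relevant policy excerpt here>",
    guidelines=[
        "Answer only from the context provided.",
        "If the context does not contain the answer, say \"I don't know.\"",
        "Quote the policy section that supports each point.",
    ],
)
print(specific)
```

Explicitly allowing “I don’t know” gives the model an alternative to inventing an answer. It tends to reduce made-up responses but does not guarantee accuracy.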

Impact of AI hallucinations

AI hallucinations have clear real-world implications. Consider these examples from a cross-section of industries and applications:

Because 2024 is an election year, the spread of misinformation by way of AI hallucinations is a hot topic worldwide. Wondering what leading AI developers are doing to safeguard accuracy and transparency? Here’s how OpenAI is approaching worldwide elections in 2024.
In financial services, GenAI can help save money, improve decisions, mitigate risks and enhance customer satisfaction. On the flip side, AI could approve unsuitable applicants for credit or a loan, exposing the institution to financial losses. While we may one day see AI-powered investment advisory services, for now, humans are still very much in the GenAI loop. For example, see how Japan’s Daiwa Securities uses AI to better engage customers and grow its business.
Insurers appreciate the increased speed, efficiency and precision AI offers. But using AI also presents potential ethical and legal issues, such as bias or discrimination in models and algorithms. Find out how AI is reshaping the future of the insurance industry.
Promising applications abound for GenAI in health care and life sciences. But it also can open the door to the rapid-fire proliferation of inaccurate diagnoses, fraudulent medical records and even fake images, including X-rays. Learn more about generative AI in the hands of health care fraudsters.

Addressing and mitigating AI hallucinations

Developers of leading AI systems understand that even a slight hallucination rate is unacceptable. What can be done to improve accuracy and prevent AI hallucinations?

Success depends heavily on making sure AI models are trained on large, diverse, balanced and high-quality data sets. This helps models minimize bias and generate fairer, more accurate results. Reducing noise is also important, as incomplete data or anomalies in the data can contribute to AI hallucinations.
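As a small-scale illustration of reducing noise, the sketch below cleans a handful of hypothetical loan-applicant records. It drops incomplete rows and exact duplicates, then removes extreme values using the modified z-score (based on the median and median absolute deviation), which a single wild value can’t easily mask. The 3.5 cutoff is a commonly used rule of thumb, not a universal setting. Real pipelines that prepare LLM training data run at vastly larger scale and use additional techniques, such as near-duplicate detection.

```python
import statistics

def clean(records, required, value_key, limit=3.5):
    """Remove incomplete rows, exact duplicates and extreme outliers."""
    # 1. Drop records with a missing or empty required field.
    complete = [r for r in records
                if all(r.get(k) not in (None, "") for k in required)]
    # 2. Drop exact duplicates, keeping the first occurrence.
    seen, unique = set(), []
    for r in complete:
        fingerprint = tuple(sorted(r.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(r)
    # 3. Drop values far from the median relative to the typical spread.
    values = [r[value_key] for r in unique]
    if not values:
        return unique
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return unique  # no spread to measure against; keep everything
    return [r for r in unique
            if 0.6745 * abs(r[value_key] - med) / mad <= limit]

records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": 41},
    {"id": 2, "age": 41},    # exact duplicate
    {"id": 3, "age": None},  # incomplete
    {"id": 4, "age": 29},
    {"id": 5, "age": 38},
    {"id": 6, "age": 420},   # likely a data-entry error
    {"id": 7, "age": 45},
]
cleaned = clean(records, required=["id", "age"], value_key="age")
print([r["id"] for r in cleaned])  # [1, 2, 4, 5, 7]
```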

Reinforcement learning helps counteract AI hallucinations too. In reinforcement learning from human feedback (RLHF), people rate or compare a model’s responses, and those judgments are used to train the model to make better decisions and produce more accurate results. So, if a chatbot asks you to rate the quality of its response or recommendation, do it! Your feedback can help strengthen the model, and it’s one of the best ways humans can help improve the quality of GenAI output.
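RLHF, as commonly described, first trains a reward model on human preference comparisons and then fine-tunes the LLM to score well under that reward. The sketch below covers only the first idea, at toy scale. It turns hypothetical pairwise ratings of three canned answers into scores using the Bradley-Terry model, which scores each answer so that the gap between two scores predicts how often one is preferred. There is no neural network and no fine-tuning step here.

```python
import math

def fit_rewards(comparisons, items, steps=2000, lr=0.1, l2=0.01):
    """Learn one reward score per item from (preferred, rejected) pairs.

    Bradley-Terry model: P(a preferred over b) = sigmoid(r[a] - r[b]),
    fitted by gradient ascent on the log-likelihood with a small L2
    penalty that keeps the scores finite."""
    r = {i: 0.0 for i in items}
    for _ in range(steps):
        grad = {i: -l2 * r[i] for i in items}
        for win, lose in comparisons:
            p_win = 1 / (1 + math.exp(-(r[win] - r[lose])))
            grad[win] += 1 - p_win    # push the preferred answer up
            grad[lose] -= 1 - p_win   # and the rejected answer down
        for i in items:
            r[i] += lr * grad[i]
    return r

# Hypothetical ratings: users compared pairs of candidate answers.
items = ["cites a source", "confident but unsourced", "invents a citation"]
A, B, C = items
comparisons = ([(A, B)] * 3 + [(B, A)] +
               [(A, C)] * 4 +
               [(B, C)] * 3 + [(C, B)])
rewards = fit_rewards(comparisons, items)
ranking = sorted(rewards, key=rewards.get, reverse=True)
print(ranking)
```

In a real system, scores like these would come from a learned reward model that can judge new responses, and that signal would then steer further training.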

Safeguards against AI hallucinations

The problems AI hallucinations cause are very real, and they can have serious real-world consequences. LLMs may make the AI world go ’round, but they must be used responsibly to limit the harm that hallucinations can do.

As the threat of spreading misinformation looms, efforts to keep GenAI on track and prevent AI hallucinations are ramping up. From Nvidia’s NeMo Guardrails to Guardrails AI and others, tools that validate and verify what AI models produce are becoming big business.
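Guardrail tools differ in what they check, and the sketch below does not use the API of NeMo Guardrails, Guardrails AI or any other product. It shows one simple kind of output check, assuming the answer is supposed to be grounded in a known source text: withhold the answer if it contains figures the source doesn’t. The function names and the fallback message are hypothetical.

```python
import re

# Matches integers and decimals such as 180, 2.5 or 5.3, with an optional %.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*%?")

def unsupported_numbers(answer: str, source: str) -> list[str]:
    """Return numbers that appear in the answer but nowhere in the source."""
    allowed = set(NUMBER.findall(source))
    return [n for n in NUMBER.findall(answer) if n not in allowed]

def guard(answer: str, source: str) -> str:
    """Pass the answer through, or withhold it if it cites unsupported figures."""
    missing = unsupported_numbers(answer, source)
    if missing:
        return ("I couldn't verify this answer against the source "
                f"(unsupported figures: {', '.join(missing)}).")
    return answer

source = "As of April 2024, more than 180 million people were using ChatGPT."
good = "More than 180 million people were using ChatGPT as of April 2024."
bad = "More than 250 million people were using ChatGPT as of April 2024."
print(guard(good, source))
print(guard(bad, source))
```

A check this narrow catches only invented numbers, not reworded falsehoods. That limitation is why production guardrails layer several kinds of validation, and why human review still matters.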

Over time, GenAI models will absorb more data and continually refine their output. Until all AI-generated content is fact-checked, accurate and reliable, however, humans still play a huge role in making sure GenAI lives up to its promise.
