Anoop: Those are all indeed critical parts of the training process, and people may wonder why a model still makes mistakes even after all these mitigations are built into development. Let me try to break that down: we call these mistakes hallucinations. A hallucination is a response from an LLM that may be coherent and presented confidently but is not factually accurate. Among other reasons, hallucinations can occur when a response is not grounded in the model's training data or in real-world information. Hallucinations can be reduced, but they are very difficult to eliminate altogether. As Stacey and I discussed, generative models do not retrieve information; they predict which words will come next based on user inputs. For this reason, there is no full guarantee that an LLM's prediction will contain factual information, nor that its outputs to a given prompt will remain stable over time.
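To make that prediction mechanism concrete, here is a minimal sketch in Python. The vocabulary and probabilities are invented for illustration and merely stand in for the learned distribution a real LLM computes from the context; the point is only that the model samples from a probability distribution over next tokens, so its output is neither guaranteed to be factual nor guaranteed to be identical across runs.

```python
# Minimal sketch of next-token prediction with sampling.
# The candidates and probabilities below are invented for illustration;
# real LLMs work over tens of thousands of tokens with learned probabilities.
import random

def predict_next_token(context: str) -> str:
    # A toy "model": a fixed distribution over candidate continuations,
    # standing in for the distribution a real LLM would compute from `context`.
    candidates = ["Paris", "Lyon", "France", "Europe"]
    probabilities = [0.70, 0.15, 0.10, 0.05]
    # Sampling means the same prompt can yield different outputs on different
    # runs, and the most probable token is not guaranteed to be factually correct.
    return random.choices(candidates, weights=probabilities, k=1)[0]

prompt = "The capital of France is"
print(prompt, predict_next_token(prompt))
```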
Let me illustrate with an example. If you ask an LLM-based interface to give information about a person who is not well known, it might reply that the person has a degree in a field they never studied, from a university they never attended. This can occur largely because the model is predicting an output about something it does not have enough training data to learn from. When there is limited or no information about the person, it is more likely that the model will hallucinate in its response.
This is why users may see disclaimers when engaging with LLMs, alerting them to the risk of relying on these systems' outputs without verifying their underlying accuracy. This issue has improved as LLMs have learned to draw on current information and other information sources; AI Overviews is a good example of this. While it's true that LLMs do not retrieve information, AI Overviews, a newer Search offering, offers a preview of a topic or query based on a variety of sources. These AI-generated snapshots show links to resources that support the information in the snapshot and let people explore the topic further. This allows people to dig deeper and discover a diverse range of content from publishers, creators, retailers, businesses, and more, and use the information they find to advance their tasks. Google's systems automatically determine which links appear.
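As a rough sketch of what grounding an answer in retrieved sources can look like, the Python below builds a prompt that includes retrieved snippets along with their URLs, so the generated summary can point back to them. The retrieve_documents helper, the canned snippets, and the prompt format are assumptions made purely for illustration; this is not a description of how AI Overviews is implemented.

```python
# Rough sketch of grounding a model's answer in retrieved sources.
# retrieve_documents() and the snippet format are hypothetical placeholders.

def retrieve_documents(query: str) -> list[dict]:
    # In a real system this would query a search index;
    # here we return canned snippets purely for illustration.
    return [
        {"url": "https://example.com/guide", "snippet": "Step-by-step overview of the topic."},
        {"url": "https://example.com/faq", "snippet": "Common questions and answers."},
    ]

def build_grounded_prompt(query: str, documents: list[dict]) -> str:
    # Including the retrieved snippets and their URLs in the prompt lets the
    # model base its summary on those sources and surface the supporting links.
    sources = "\n".join(
        f"[{i + 1}] {doc['url']}: {doc['snippet']}" for i, doc in enumerate(documents)
    )
    return (
        "Answer the question using only the sources below, citing them by number.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )

query = "How do I get started?"
print(build_grounded_prompt(query, retrieve_documents(query)))
```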
Stacey: To directly answer the question of whether hallucinations can be prevented entirely: hallucinations can be reduced in an LLM, but inaccuracies cannot be prevented completely, since responses are created via a prediction mechanism. During fine-tuning, models can be optimized to recognize correct patterns in their training data, which reduces the number of factual mistakes. Another technique for reducing hallucinations is to connect LLMs to other systems that can provide verified information for the response.
For example, if a user requests a mathematical calculation from an LLM that is connected to a program, the LLM can pass part of the request to that program to perform the task. The LLM then returns the program’s response to the user in its answer.
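A minimal Python sketch of that hand-off is below. The calculator function plays the role of the connected program, and the routing is simplified; real systems typically have the model emit a structured tool call that the surrounding application executes. The idea is the same either way: the arithmetic is computed deterministically rather than predicted, and the verified result is folded back into the reply.

```python
# Minimal sketch of an LLM handing a calculation off to an external program.
# The reply template and direct call below are assumptions made for
# illustration; production systems usually route a structured "tool call"
# emitted by the model to code like calculator() and return its output.
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expression: str) -> float:
    """Trusted external program: evaluates arithmetic deterministically."""
    def ev(node):
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval").body)

def respond(expression: str) -> str:
    # The model delegates the arithmetic instead of predicting the digits,
    # then folds the program's verified result back into its answer.
    result = calculator(expression)
    return f"{expression} equals {result}."

print(respond("1234 * 5678"))
```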
Anoop: With questions about hallucinations often come questions about attribution of sources in an output. To reiterate, generative AI models are designed to generate original outputs based on their underlying prediction mechanisms. For example, when it runs, a generative image model creates a new, unique image based on concepts it has picked up across its training data. This makes it difficult for generative models to attribute specific parts of their responses to single sources, though citations are now part of training data and can often be linked to LLM responses (for example, in products like AI Overviews). A good analogy might be an artist studying multiple other artists' styles and then creating their own.
Stacey: Google plans to continue innovating at the forefront of AI. As new and existing products take advantage of this technology, we aim to keep making improvements and to do so safely. For example, with AI Overviews we are bringing together our advanced generative AI capabilities with our best-in-class Search systems to help you find answers to increasingly complex questions. Rather than breaking your question into multiple searches, you can ask your most complex questions, with all the nuances and caveats you have in mind, all in one go.