Anoop: Those are all indeed critical parts of the training process, and people may wonder why a model still makes mistakes even after all these mitigations are built into development. Let me try to break that down: we call these mistakes hallucinations. A hallucination is a response from an LLM that may be coherent and presented confidently but is not factually accurate. Among other reasons, hallucinations can occur when a response is not grounded in the model's training data or in real-world information. Hallucinations can be reduced, but they are very difficult to eliminate altogether. As Stacey and I discussed, generative models do not retrieve information; they predict which words will come next based on user inputs. For this reason, there is no full guarantee that an LLM's prediction will contain factual information, nor that its outputs to a given prompt will remain stable over time.
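To make that prediction mechanism concrete, here is a minimal sketch in Python. The vocabulary and probabilities are invented for illustration and merely stand in for the learned distribution a real LLM computes from the context; the point is only that the model samples from a probability distribution over next tokens, so its output is neither guaranteed to be factual nor guaranteed to be identical across runs.

```python
# Minimal sketch of next-token prediction with sampling.
# The candidates and probabilities below are invented for illustration;
# real LLMs work over tens of thousands of tokens with learned probabilities.
import random

def predict_next_token(context: str) -> str:
    # A toy "model": a fixed distribution over candidate continuations,
    # standing in for the distribution a real LLM would compute from `context`.
    candidates = ["Paris", "Lyon", "France", "Europe"]
    probabilities = [0.70, 0.15, 0.10, 0.05]
    # Sampling means the same prompt can yield different outputs on different
    # runs, and the most probable token is not guaranteed to be factually correct.
    return random.choices(candidates, weights=probabilities, k=1)[0]

prompt = "The capital of France is"
print(prompt, predict_next_token(prompt))
```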
Let me illustrate with an example. If you ask an LLM-based interface to give information about a person who is not well known, it might reply that the person has a degree in a field they never studied, from a university they never attended. This can occur largely because the model is predicting an output about something it does not have enough training data to learn from. When there is limited or no information about the person, it is more likely that the model will hallucinate in its response.
This is why users may see disclaimers when engaging with LLMs, alerting them to the risk of relying on these systems' outputs without verifying their underlying accuracy. This issue has improved as LLMs have learned to draw on current information and other information sources; AI Overviews is a good example of this. While it's true that LLMs do not retrieve information, AI Overviews, a newer Search offering, offers a preview of a topic or query based on a variety of sources. These AI-generated snapshots show links to resources that support the information in the snapshot and let people explore the topic further. This allows people to dig deeper and discover a diverse range of content from publishers, creators, retailers, businesses, and more, and use the information they find to advance their tasks. Google's systems automatically determine which links appear.
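As a rough sketch of what grounding an answer in retrieved sources can look like, the Python below builds a prompt that includes retrieved snippets along with their URLs, so the generated summary can point back to them. The retrieve_documents helper, the canned snippets, and the prompt format are assumptions made purely for illustration; this is not a description of how AI Overviews is implemented.

```python
# Rough sketch of grounding a model's answer in retrieved sources.
# retrieve_documents() and the snippet format are hypothetical placeholders.

def retrieve_documents(query: str) -> list[dict]:
    # In a real system this would query a search index;
    # here we return canned snippets purely for illustration.
    return [
        {"url": "https://example.com/guide", "snippet": "Step-by-step overview of the topic."},
        {"url": "https://example.com/faq", "snippet": "Common questions and answers."},
    ]

def build_grounded_prompt(query: str, documents: list[dict]) -> str:
    # Including the retrieved snippets and their URLs in the prompt lets the
    # model base its summary on those sources and surface the supporting links.
    sources = "\n".join(
        f"[{i + 1}] {doc['url']}: {doc['snippet']}" for i, doc in enumerate(documents)
    )
    return (
        "Answer the question using only the sources below, citing them by number.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )

query = "How do I get started?"
print(build_grounded_prompt(query, retrieve_documents(query)))
```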
Stacey: To directly answer the question of whether hallucinations can be prevented entirely: hallucinations can be reduced in an LLM, but inaccuracies cannot be prevented completely, since responses are created via a prediction mechanism. During fine-tuning, models can be optimized to recognize correct patterns in their training data, which reduces the number of factual mistakes. Another technique for reducing hallucinations is to connect LLMs to other systems that can provide verified information for the response.
For example, if a user requests a mathematical calculation from an LLM that is connected to a program, the LLM can pass part of the request to that program to perform the task. The LLM then returns the program’s response to the user in its answer.
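A minimal Python sketch of that hand-off is below. The calculator function plays the role of the connected program, and the routing is simplified; real systems typically have the model emit a structured tool call that the surrounding application executes. The idea is the same either way: the arithmetic is computed deterministically rather than predicted, and the verified result is folded back into the reply.

```python
# Minimal sketch of an LLM handing a calculation off to an external program.
# The reply template and direct call below are assumptions made for
# illustration; production systems usually route a structured "tool call"
# emitted by the model to code like calculator() and return its output.
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expression: str) -> float:
    """Trusted external program: evaluates arithmetic deterministically."""
    def ev(node):
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval").body)

def respond(expression: str) -> str:
    # The model delegates the arithmetic instead of predicting the digits,
    # then folds the program's verified result back into its answer.
    result = calculator(expression)
    return f"{expression} equals {result}."

print(respond("1234 * 5678"))
```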
Anoop: With questions about hallucinations often come questions about attribution of sources in an output. To reiterate, generative AI models are designed to generate original outputs based on their underlying prediction mechanisms. For example, when it runs, a generative image model creates a new, unique image based on concepts it has picked up across its training data. This makes it difficult for generative models to attribute specific parts of their responses to single sources, though citations are now part of training data and can often be linked to LLM responses (for example, in products like AI Overviews). A good analogy might be an artist studying multiple other artists' styles and then creating their own.
Stacey: Google plans to continue innovating at the forefront of AI. As new and existing products take advantage of this technology, we aim to keep making improvements and to do so safely. For example, with AI Overviews we are bringing together our advanced generative AI capabilities with our best-in-class Search systems to help you find answers to increasingly complex questions. Rather than breaking your question into multiple searches, you can ask your most complex questions, with all the nuances and caveats you have in mind, all in one go.