Generative AI Chatbots

By our AI Review Team .
Last updated August 8, 2024

Powerful tools with an increasingly wide range of capabilities remain risky for kids and teens.

Overall Risk

Moderate

Learn more

AI Type

Multi-Use

Learn more

Generative AI chatbots are tools that analyze natural language and generate responses in a conversational format, similar to how people write and speak. While some chatbots are limited to text inputs and outputs, newer generative AI models are increasingly "multimodal." This means they can accept different types of inputs, such as text, speech, and images, and generate outputs in those same formats.

These chatbots are able to generate responses to a wide range of prompts or questions. Multimodal chatbots can also do things like respond to speech and create realistic images and art.

Generative AI is an emerging field of artificial intelligence, and is defined by the ability of an AI system to create ("generate") content that is complex and coherent and original. This is what makes generative AI chatbots different from other chatbots, like ones you may have experienced in customer support, which may instead be providing predetermined, contextually relevant responses. Importantly, generative AI chatbots cannot think, feel, reason using judgment, or problem-solve, and do not have an inherent sense of right, wrong, or truth.

The different "modes" (text, image, speech, etc.) use different types of technology.

Text. All generative AI chatbots are powered by large language models (LLMs). LLMs are sophisticated computer programs that are designed to generate human-like text. Essentially, when a human user inputs a prompt or question, an LLM quickly analyzes patterns from its training data to guess which words are most likely to come next. While this is an oversimplification, you can think of an LLM like a giant auto-complete system—they are simply predicting the words that will most likely come next. For example, when a user inputs "It was a dark and stormy," an LLM is very likely to generate the word "night" but not "algebra."
Images. Image generators are capable of generating high-quality images with fine details and realistic textures. They use a particular type of generative AI called "diffusion models." Diffusion is a natural phenomenon you've likely experienced before. A good example of diffusion happens if you drop some food coloring into a glass of water. No matter where that food coloring starts, eventually it will spread throughout the entire glass and color the water in a uniform way. In the case of computer pixels, random motion of those pixels will always lead to "TV static." That is the image equivalent of food coloring creating a uniform color in a glass of water. A machine-learning diffusion model works by, oddly enough, destroying its training data by successively adding "TV static," and then reversing this to generate something new.
Speech. Generative AI chatbots can understand speech with a technology called speech recognition. Speech recognition works by analyzing audio, breaking it down into individual sounds, digitizing those sounds into a computer-readable format, and using an algorithm to predict the most suitable words, which are then transcribed into text. Speech recognition is not the same thing as voice recognition, which is a biometric technology used to identify an individual's voice.

Generative AI is best for creativity and fiction. These tools can do things like generate ideas for many kinds of activities and initiatives, write poetry, draft emails, and help revise material to new specifications. They can respond to a user in a way that feels like a conversation, or come up with an outline for an essay on the history of television. Because every response that a generative AI chatbot gives is newly created content, they perform best with fiction, not facts.
Chatbots can really help with analysis and summarization. If you have reliable data to give to a generative AI chatbot, they can excel at providing analyses (with the right prompts) and summarization. This can be a great way to help make difficult concepts more understandable and extract insights from information.

The hype can be misleading, and dangerous. Generative AI chatbots can feel like magic, but they aren't. It is important to question their capabilities—not just as we assess individual responses, but when we're told about what they can do. When we’re told generative AI chatbots “feel like magic” by those who create them, we expect them to do all kinds of things amazingly well. But this creates unreasonable expectations and unearned trust. And this can become dangerous when they are used for high-stakes tasks and professions. See some examples in our AI Principles assessment below for Be Effective.
Generative AI chatbots are designed to predict words–they can and do get things wrong. So why are they right so often? This is because they have been trained on a massive amount of data, so that "auto-complete" has a lot of accurate information commonly found on the internet to work with. Unfortunately, inaccuracies can be hard to detect, as responses can sound correct, even if they aren't. Any seemingly factual output needs to be checked—and this absolutely goes for any links, references, or citations too.
Attempts to limit chatbots from generating harmful content vary across providers, and none are foolproof. Knowing why this is the case can help you determine which chatbot to use and how best to use it. This starts with their training data. These systems require a huge amount of data. Any text, images, speech, and other content that can be scraped from the internet could be included in these systems. While most developers filter out clearly inappropriate content before training their models, the internet also includes a vast range of racist and sexist writing, conspiracy theories, misinformation and disinformation, toxic language, insults, and stereotypes about other people that does not get filtered out. As it predicts words, a generative AI chatbot can repeat this language unless a company stops it from doing so. Importantly, these attempts to limit objectionable material are like Band-Aids: They don't address the root causes, they don't change the underlying training data, and they can only limit harmful content that's already known. We don't know what they don't cover until it surfaces, and there are no standard requirements for what they do cover. And like bandages, they aren't comprehensive and are easily breakable. Even as many chatbots improve and do a better job of addressing obvious harmful stereotypes and clear misinformation, we continue to see them generate harmful content in more subtle ways that are both difficult for their creators to combat and dangerous to impressionable minds.
False information can pave the path to misinformation and disinformation. Chatbots can generate or enable false information in a few ways: from "hallucinations"—an informal term used to describe the false content or claims that are often output by generative AI tools; by reproducing misinformation and disinformation; and by reinforcing unfair biases. As these AI systems grow, it may become increasingly difficult to separate fact from fiction. Notably, LLMs also have a tendency to respond with a user's preferred answer—a phenomenon known as "sycophancy." This has the ability to create echo chambers of information. Combined, these forces carry an even greater risk of both presenting a skewed version of the world and of reinforcing harmful stereotypes and untruths.
Generative AI's need for energy is enormous, growing, and impacting climate change. In 2024 a researcher estimated that the energy used by a single ChatGPT query would power a lightbulb for 20 minutes. In comparison, this is 10 times as much energy as a Google standard search query. Generating an image may take as much energy as fully charging your smartphone. The energy that powers generative AI chatbots comes from data centers, which collectively use more energy than most countries. Goldman Sachs has estimated that the carbon dioxide emissions of data centers may more than double between 2022 and 2030. And AI's environmental impact extends beyond energy use and emissions. There are no standards for how this impact is measured, and no requirements for companies to disclose it. Without a change in course, these impacts will worsen.

The models that power these chatbots are all assessed using the same "tests," but the ways in which people use them are far more varied. These tests are sometimes treated as if they're the ultimate measure of an AI system's abilities, but relying too much on these benchmarks can make system creators focus on improving scores rather than solving real-world problems. This focus can also hide problems, like an AI system not working well for certain groups of people. As generative AI shows up in more places—like AI summaries in search results, for example—the limits to how these systems are assessed become both more apparent and more problematic.

Common Sense AI Principles Assessment

The benefits and risks, assessed with our AI Principles - that is, what AI should do.

they aren't</a>. It is important to question their capabilities—not just as we assess individual responses, but when we're told about what they can do. Some examples of the hype: They're great at coding! <a class="link" href=https://www.commonsensemedia.org/ai-ratings/"https://arxiv.org/pdf/2211.03622">Not <a class="link" href=https://www.commonsensemedia.org/ai-ratings/"https://arxiv.org/pdf/2304.10778">exactly. They can do math! <a class="link" href=https://www.commonsensemedia.org/ai-ratings/"https://www.nytimes.com/2024/07/23/technology/ai-chatbots-chatgpt-math.html?smid=url-share%22>Not reliably</a>. They can summarize anything! <a class="link" href=https://www.commonsensemedia.org/ai-ratings/"https://arxiv.org/pdf/2404.01261">Not particularly well for books</a>. As these tools are integrated into our daily lives, and are adopted in high-stakes professions, the impact of this hype can be harmful. Some examples: <ul> <li style="line-height:1.5;margin-bottom:5px;">Some claims shouldn't be taken at face value. As an example, OpenAI has said that <a class="link" href=https://www.commonsensemedia.org/ai-ratings/"https://openai.com/index/gpt-4/">GPT-4 scored in the 90th percentile on the bar exam</a>. In fact, when assessed in its entirety, the truth is <a class="link" href=https://www.commonsensemedia.org/ai-ratings/"https://link.springer.com/article/10.1007/s10506-024-09396-9">it scored below the 69th percentile</a>, and worse on the essay components. This matters because lawyers are increasingly using generative AI chatbots, but <a class="link" href=https://www.commonsensemedia.org/ai-ratings/"https://hai.stanford.edu/news/hallucinating-law-legal-mistakes-large-language-models-are-pervasive">an evaluation of their effectiveness</a> found that "legal hallucinations are pervasive and disturbing: hallucination rates range from 69% to 88% in response to specific legal queries."</li> <li style="line-height:1.5;margin-bottom:5px;">Doctors are increasingly using generative AI chatbots to <a class="link" href=https://www.commonsensemedia.org/ai-ratings/"https://pubmed.ncbi.nlm.nih.gov/38133908/">guide both treatment and medical education</a>, and everyday users are turning to them—<a class="link" href=https://www.commonsensemedia.org/ai-ratings/"https://www.today.com/health/mom-chatgpt-diagnosis-pain-rcna101843">sometimes successfully</a>—for their own health. But this can be incredibly dangerous, as the LLMs that power these chatbots have been found to do a <a class="link" href=https://www.commonsensemedia.org/ai-ratings/"https://hai.stanford.edu/news/generating-medical-errors-genai-and-erroneous-medical-references">poor job</a> of both citing medical references and of providing references that actually support the generated claims.</li> </ul> </li> <li style="line-height:1.5;margin-bottom:5px;">Chatbot responses often sound correct, even if they aren't. This makes it very easy for users to be overconfident in their responses. Any seemingly factual output needs to be checked—and this absolutely goes for any links, references, or citations too.</li> <li style="line-height:1.5;margin-bottom:5px;">Do chatbots save time, or shift where it should be spent? Generative AI chatbots are often touted, and used, as a way to save time. But time saved actually needs to shift to assessing and verifying their output. Awareness of this need is low, leading to more belief in and reliance on generated responses than is warranted.</li> <li style="line-height:1.5;margin-bottom:5px;">Adding real-time search is a double-edged sword. Some chatbots incorporate real-time search engine results in their output. While this can enhance transparency and credibility, it is far too easy to assume these references are trustworthy and accurate. This, in turn, cements overreliance on a chatbot's responses, and may reduce the user's opportunities for critical thinking.</li> <li style="line-height:1.5;margin-bottom:5px;"><a class="link" href=https://www.commonsensemedia.org/ai-ratings/"https://www.washingtonpost.com/technology/2023/05/18/texas-professor-threatened-fail-class-chatgpt-cheating/">"AI detectors" are extremely unreliable</a>. They can miss when something has been generated by AI, and can also be wrong and flag content as AI-generated when it was not. If students are then wrongly accused of cheating, they are often left without any way to prove they did not cheat. This is a risk for any text-based generative AI product.</li> </ul> ">

quantity of training data does not guarantee its diversity</a>.</li> <li style="line-height:1.5;margin-bottom:5px;">The answer to the question "are chatbots biased?" is yes. But it's much harder to determine all of the circumstances that lead to unfair bias being generated, the ways in which it appears, and the groups it affects.</li> <li style="line-height:1.5;margin-bottom:5px;">Chatbots can generate harmful content. This can appear in the form of repeated reinforcement of harmful stereotypes and unfair biases. Separately, it can also have a <a class="link" href=https://www.commonsensemedia.org/ai-ratings/"https://www.washingtonpost.com/technology/2023/04/05/chatgpt-lies/">huge impact on individual people</a>.</li> <li style="line-height:1.5;margin-bottom:5px;">Image generators continue to demonstrate a tendency toward objectification and sexualization. This is especially the case with inappropriate sexualized representations of women and girls, even with prompts seeking images of women professionals. Numerous studies have shown that greater exposure to images that promote the objectification of women <a class="link" href=https://www.commonsensemedia.org/ai-ratings/"https://www.teenvogue.com/story/standard-issues-white-supremacy-capitalism-influence-beauty">adversely affects the mental and physical health</a> of girls and women.</li> <li style="line-height:1.5;margin-bottom:5px;">Generative AI chatbots can be used to teach teens about unfair bias and responsible use of technology by having them assess responses for harmful content.</li> </ul> ">

recent findings</a> have shown that the large language models (LLMs) that power generative AI chatbots "can accurately infer an alarming amount of personal information about users—including their race, location, occupation, and more—from conversations that appear innocuous."</li> </ul> This review is distinct from Common Sense's privacy <a class="link" href=https://www.commonsensemedia.org/ai-ratings/"https://privacy.commonsense.org/resource/evaluation-process">evaluations and <a class="link" href=https://www.commonsensemedia.org/ai-ratings/"https://privacy.commonsense.org/resource/privacy-ratings">ratings, which evaluate privacy policies to help parents and educators make sense of the complex policies and terms related to popular tools used in homes and classrooms across the country. ">

fail to check generated responses</a> before sharing them as fact. In situations like this, or when, for example, a <a class="link" href=https://www.commonsensemedia.org/ai-ratings/"https://themarkup.org/news/2024/03/29/nycs-ai-chatbot-tells-businesses-to-break-the-law">chatbot tells businesses to break the law</a>, or responses to common medical questions include a <a class="link" href=https://www.commonsensemedia.org/ai-ratings/"https://fortune.com/well/2023/10/20/chatgpt-google-bard-ai-chatbots-medical-racism-black-patients-health-care/amp/">range of misconceptions and falsehoods about Black patients</a>, even if users never share the misleading information, the response alone facilitates a dangerous activity, as the product generated content that has an unfair impact on people.</li> <li style="line-height:1.5;margin-bottom:5px;">Who is accountable when things do go wrong? Many companies prohibit certain uses of a generative AI chatbot in their terms of service, acceptable use policies, generative AI guidelines, and community guidelines. But in situations like the ones described above, the user did nothing to cause this. It is unclear how a generative AI chatbot creator would be held accountable to its own terms in these situations.</li> <li style="line-height:1.5;margin-bottom:5px;">There are no consistent rules about transparency requirements for generative AI, and much of the transparency and explainability reporting is highly technical in nature.</li> <li style="line-height:1.5;margin-bottom:5px;">We remain concerned about the speed at which the tech industry is releasing generative AI products to the public, given the technology's fundamentally experimental nature.</li> </ul> ">

Put People First

Moderate risk

When users rely on a generative AI chatbot's responses as inherently truthful or authoritative, this can have the effect of reducing human agency and oversight. The impact of this can be very harmful, depending on what the topic and responses are.

At times, a chatbot's responses can include expressions of humility and/or uncertainty. While this can help generally flag for users that responses aren't always correct, unfortunately, these expressions aren't an indication of a response's accuracy.

When users rely on ChatGPT's responses, this can have the effect of reducing human agency and oversight. The impact of this can be very harmful, depending on what the topic and responses are.

Be Effective

High risk

The hype can be misleading, and dangerous. Generative AI chatbots can feel like magic, but they aren't. It is important to question their capabilities—not just as we assess individual responses, but when we're told about what they can do. Some examples of the hype: They're great at coding! Not exactly. They can do math! Not reliably. They can summarize anything! Not particularly well for books. As these tools are integrated into our daily lives, and are adopted in high-stakes professions, the impact of this hype can be harmful. Some examples:

Some claims shouldn't be taken at face value. As an example, OpenAI has said that GPT-4 scored in the 90th percentile on the bar exam. In fact, when assessed in its entirety, the truth is it scored below the 69th percentile, and worse on the essay components. This matters because lawyers are increasingly using generative AI chatbots, but an evaluation of their effectiveness found that "legal hallucinations are pervasive and disturbing: hallucination rates range from 69% to 88% in response to specific legal queries."

Doctors are increasingly using generative AI chatbots to guide both treatment and medical education, and everyday users are turning to them—sometimes successfully—for their own health. But this can be incredibly dangerous, as the LLMs that power these chatbots have been found to do a poor job of both citing medical references and of providing references that actually support the generated claims.

Chatbot responses often sound correct, even if they aren't. This makes it very easy for users to be overconfident in their responses. Any seemingly factual output needs to be checked—and this absolutely goes for any links, references, or citations too.

Do chatbots save time, or shift where it should be spent? Generative AI chatbots are often touted, and used, as a way to save time. But time saved actually needs to shift to assessing and verifying their output. Awareness of this need is low, leading to more belief in and reliance on generated responses than is warranted.

Adding real-time search is a double-edged sword. Some chatbots incorporate real-time search engine results in their output. While this can enhance transparency and credibility, it is far too easy to assume these references are trustworthy and accurate. This, in turn, cements overreliance on a chatbot's responses, and may reduce the user's opportunities for critical thinking.

"AI detectors" are extremely unreliable. They can miss when something has been generated by AI, and can also be wrong and flag content as AI-generated when it was not. If students are then wrongly accused of cheating, they are often left without any way to prove they did not cheat. This is a risk for any text-based generative AI product.

Prioritize Fairness

High risk

Unfair bias exists in their training data. These systems are built on data that inherently represents the internet-connected population, which as a whole overrepresents people in wealthier nations, as well as views from people who are wealthier, younger, and male. In other words, the quantity of training data does not guarantee its diversity.

The answer to the question "are chatbots biased?" is yes. But it's much harder to determine all of the circumstances that lead to unfair bias being generated, the ways in which it appears, and the groups it affects.

Chatbots can generate harmful content. This can appear in the form of repeated reinforcement of harmful stereotypes and unfair biases. Separately, it can also have a huge impact on individual people.

Image generators continue to demonstrate a tendency toward objectification and sexualization. This is especially the case with inappropriate sexualized representations of women and girls, even with prompts seeking images of women professionals. Numerous studies have shown that greater exposure to images that promote the objectification of women adversely affects the mental and physical health of girls and women.

Generative AI chatbots can be used to teach teens about unfair bias and responsible use of technology by having them assess responses for harmful content.

Help People Connect

Moderate risk

Generative AI chatbots can help people connect indirectly, but that depends on how they are used. They can, for example, help groups brainstorm, create conversation starters, co-create stories, or become a part of any collaborative group project.

While chatbots could benefit teen users by providing a safe place to ask questions, encouraging teens to treat the chatbot like a real friend could reduce their interactions with actual people.

It can become easy to rely on using chatbots, potentially creating dependence on the tools.

Impressionable users could develop a parasocial relationship with a chatbot, believing it to be a genuine companion. Unfortunately, some chatbots confidently claim they are actual companions, making this an even greater risk.

Be Trustworthy

Moderate risk

These systems are built on significant amounts of peer-reviewed machine learning research, and many creators engage in internal adversarial testing, often called "red teaming," which is the intentional stress testing of a model to probe it for errors, fairness issues, and potential harm.

Even when companies work to limit generative AI chatbots from perpetuating misinformation and disinformation, these attempts don't always succeed.

Chatbots can generate or enable false information in a few ways: from "hallucinations"—an informal term used to describe the false content or claims that are often output by generative AI tools; by reproducing misinformation and disinformation; and by reinforcing unfair biases. Importantly, different types of uses lead to different types of risk. For example, if kids use chatbots for health advice, false information might have extremely harmful consequences.

Use Data Responsibly

Moderate risk

By default, many generative AI chatbots use the prompts you input and the conversations you have with them to further train their models. In other words, anything you input to the chatbot—including personal information—may become part of its training data.

The default use of conversation data is especially worrying for kids and teens who are using these tools, even if they are not supposed to.

Many chatbots allow you to stop them from using your data to train their models. Even with this turned off, we always recommend that you do not include sensitive personal information about yourself or others in your prompts.

Even when no personal information is shared, recent findings have shown that the large language models (LLMs) that power generative AI chatbots "can accurately infer an alarming amount of personal information about users—including their race, location, occupation, and more—from conversations that appear innocuous."

This review is distinct from Common Sense's privacy evaluations and ratings, which evaluate privacy policies to help parents and educators make sense of the complex policies and terms related to popular tools used in homes and classrooms across the country.

Keep Kids & Teens Safe

Moderate risk

Because generative AI chatbots cannot think, feel, or use judgment, kids and teens may feel that it's easier to ask them for ideas about how to navigate challenging situations or handle difficult conversations, compared to communication with a human being.

Importantly, different types of uses lead to different types of risk. For example, if kids use chatbots for health advice, false information might have extremely harmful consequences.

While chatbots could benefit teen users by providing a safe place to ask questions, they cannot and should not be a replacement for therapists or expert input into mental health challenges. This is especially risky given that we know teens increasingly use chatbots in this way.

Unless a generative AI chatbot has been designed for kids and teens, any protections they experience will be the general protections for adult users. While this covers a lot of the most objectionable material, it does not mean that these tools are equally safe for kids and teens.

Be Transparent & Accountable

Moderate risk

Many generative AI chatbots have feedback mechanisms for responses, which can be used to flag whether a response is harmful, unsafe, untrue, or not helpful. That said, the industry hasn't settled on a standard here, and some chatbots lack many or all of these response options, or hide them behind additional clicks.

Many of the harms described in this review can happen by simply engaging with a chatbot. In other words, for harm to occur, it does not require malicious use—some harms occur when users simply fail to check generated responses before sharing them as fact. In situations like this, or when, for example, a chatbot tells businesses to break the law, or responses to common medical questions include a range of misconceptions and falsehoods about Black patients, even if users never share the misleading information, the response alone facilitates a dangerous activity, as the product generated content that has an unfair impact on people.

Who is accountable when things do go wrong? Many companies prohibit certain uses of a generative AI chatbot in their terms of service, acceptable use policies, generative AI guidelines, and community guidelines. But in situations like the ones described above, the user did nothing to cause this. It is unclear how a generative AI chatbot creator would be held accountable to its own terms in these situations.

There are no consistent rules about transparency requirements for generative AI, and much of the transparency and explainability reporting is highly technical in nature.

We remain concerned about the speed at which the tech industry is releasing generative AI products to the public, given the technology's fundamentally experimental nature.

Additional Resources

Video+

Guide to ChatGPT for parents and caregivers

For Families

Helping kids navigate the world of artificial intelligence

Education

Free Educator resources to explore and use ChatGPT and AI

See All AI Reviews

See Next Review

Or browse by category:

50 Modern Movies All Kids Should Watch Before They're 12

Common Sense Selections for Movies

Best Kids' Shows on Disney+

Best Kids' TV Shows on Netflix

8 Tips for Getting Kids Hooked on Books

50 Books All Kids Should Read Before They're 12

Nintendo Switch Games for Family Fun

Common Sense Selections for Games

Parents' Guide to Podcasts

Common Sense Selections for Podcasts

Social Networking for Teens

Gun-Free Action Game Apps

Reviews for AI Apps and Tools

Parents' Ultimate Guide to YouTube Kids

YouTube Kids Channels for Gamers

How to Help Kids Build Character Strengths with Quality Media

Multicultural Books

YouTube Channels with Diverse Representations

Podcasts with Diverse Characters and Stories

Generative AI Chatbots

Overall Risk

AI Type

What is it?

How it works

Where it's best

The biggest risks

Limits to use

Common Sense AI Principles Assessment

Additional Resources

Guide to ChatGPT for parents and caregivers

Helping kids navigate the world of artificial intelligence

Free Educator resources to explore and use ChatGPT and AI