DALL-E
By our AI Review Team
Last updated August 11, 2024
DALL-E turns text into vivid visuals—despite some protections, users should be cautious
What is it?
DALL-E is a generative AI product created by OpenAI. It can create realistic images and art from a text-based description that can include artistic concepts, attributes, and styles. DALL-E's full suite of image editing tools offers users a sophisticated range of options: extending generated images beyond the original frame (outpainting), making authentic modifications to existing user-uploaded or AI-generated pictures, and incorporating or eliminating components while considering shadows, reflections, and textures (inpainting). Once users achieve the generated image they want, they can download and use it.
How it works
DALL-E is a form of generative AI, an emerging field of artificial intelligence. Generative AI is defined by an AI system's ability to create ("generate") content that is complex, coherent, and original. For example, a generative AI model can create sophisticated writing or images.
DALL-E uses a particular type of generative AI called "diffusion models," named for the physical process of diffusion that inspires how they generate new content. Diffusion is a natural phenomenon you've likely experienced before. A good example happens when you drop food coloring into a glass of water: no matter where the food coloring starts, it eventually spreads throughout the entire glass and colors the water uniformly. The image equivalent is "TV static": if you randomly jostle a picture's pixels for long enough, any image dissolves into uniform noise, just as the food coloring dissolves into uniform color. A machine-learning diffusion model works by, oddly enough, destroying its training data by successively adding this "TV static," and then learning to reverse that process in order to generate something new. Diffusion models are capable of generating high-quality images with fine details and realistic textures.
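To make the "TV static" idea concrete, here is a minimal sketch of the forward, data-destroying half of the process, in which an image is repeatedly mixed with small amounts of Gaussian noise until nothing of the original remains. The function name, step count, and noise schedule are illustrative assumptions, not details of DALL-E's actual implementation.

```python
import numpy as np

def forward_diffusion(image: np.ndarray, steps: int = 1000, beta: float = 0.02) -> np.ndarray:
    """Repeatedly mix Gaussian noise into an image until it resembles 'TV static'."""
    noisy = image.astype(np.float64)
    for _ in range(steps):
        noise = np.random.normal(0.0, 1.0, size=noisy.shape)
        # Each step slightly shrinks the remaining signal and blends in fresh noise,
        # the pixel-level analogue of food coloring spreading through water.
        noisy = np.sqrt(1.0 - beta) * noisy + np.sqrt(beta) * noise
    return noisy  # after enough steps, essentially pure noise

# Example: a 64x64 grayscale "image" reduced to static
static = forward_diffusion(np.random.rand(64, 64))
```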
DALL-E combines a diffusion model with a text-to-image model. A text-to-image model is a machine learning algorithm that uses natural language processing (NLP), a field of AI that allows computers to understand and process human language. DALL-E takes in a natural language input and produces an image that attempts to match the description.
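Generation then runs that process in reverse under the guidance of the text: the prompt is encoded into numbers, and a trained model repeatedly removes a little of the estimated noise from pure static until an image emerges. The sketch below is a simplified illustration of that loop; text_encoder, denoiser, and the update rule are hypothetical stand-ins rather than DALL-E's actual components.

```python
import numpy as np

# Stand-in components: a real system uses trained neural networks for both roles.
def text_encoder(prompt: str) -> np.ndarray:
    """Hypothetical encoder: maps a prompt to a fixed-size numeric embedding."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.normal(size=128)

def denoiser(image: np.ndarray, step: int, condition: np.ndarray) -> np.ndarray:
    """Hypothetical denoiser: a trained model would predict the noise present at this
    step, steered by the text embedding. This placeholder simply predicts zero noise."""
    return np.zeros_like(image)

def generate_image(prompt: str, steps: int = 50, shape=(64, 64, 3)) -> np.ndarray:
    """Heavily simplified text-conditioned reverse diffusion: start from static, denoise step by step."""
    condition = text_encoder(prompt)          # the prompt becomes a numeric embedding
    image = np.random.normal(size=shape)      # begin with pure "TV static"
    for t in reversed(range(steps)):
        predicted_noise = denoiser(image, t, condition)  # estimate the noise at step t
        image = image - predicted_noise / (t + 1)        # strip away a portion of that noise
    return np.clip(image, -1.0, 1.0)

sample = generate_image("a watercolor painting of a lighthouse at dawn")
```

In a real system, the denoiser is a large neural network trained on vast sets of image-caption pairs, which is exactly where the training-data biases discussed below enter the picture.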
Where it's best
- DALL-E has the potential to enable creativity and artistic expression, and allow for visualization of new ideas.
- OpenAI has made a number of efforts to reduce DALL-E's ability to generate harmful content. These include filtering the pre-training data to reduce the quantity of graphic sexual and violent content, as well as images of some hate symbols; assessing user inputs (text-to-image prompts, inpainting prompts, and uploaded images) and refusing to generate content for inputs that would violate the company's content policy; instituting rate limits; and enforcement via monitoring and human review. In contrast with our review of Stable Diffusion, these efforts have been noticeably effective. Importantly, DALL-E 2's system card references only inpainting, not outpainting or variations, when describing these efforts. We don't know whether outpainting and variation prompts are also assessed.
The biggest risks
- DALL-E's "view" of the world can shape impressionable minds, and with little accountability. OpenAI states that "use of DALL-E 2 has the potential to harm individuals and groups by reinforcing stereotypes, erasing or denigrating them, providing them with disparately low quality performance, or by subjecting them to indignity. These behaviors reflect biases present in DALL-E 2 training data and the way in which the model is trained." An example of this comes from the company's realization that the explicit content filter applied to DALL-E's pre-training data actually introduced a net new bias. Essentially, the filter—which was designed to reduce the quantity of pre-training data containing nudity, sexual content, hate, violence, and harm—reduced the frequency of the keyword "woman" by 14%. In contrast, the explicit content filter reduced the frequency of the keyword "man" by only 6%. In other words, OpenAI's attempts to remove explicit material removed enough content representing women that the resulting data set significantly overrepresented content representing men. This offers perspective on how many images on the internet contain explicit sexual content of women. OpenAI also notes that DALL-E's default behavior generates images that overrepresent White skin tones and "Western concepts generally." These propensities towards harm are frighteningly powerful in combination. What happens to our children when they are exposed to the worldview of a biased algorithm repeatedly and over time? What view of the world will they assume is "correct," and how will this inform their interactions with real people and society? Who is accountable for allowing this to happen?
- Inappropriate sexualized representations of women and girls harm all users. DALL-E continues to demonstrate a tendency toward objectification and sexualization. This is especially the case with inappropriate sexualized representations of women and girls, even with prompts seeking images of women professionals. This perpetuates harmful stereotypes, unfair bias, unrealistic ideals of women's beauty and "sexiness," and incorrect beliefs around intimacy for humans of all genders. Numerous studies have shown that greater exposure to images that promote the objectification of women adversely affects the mental and physical health of girls and women.
- DALL-E easily reinforces harmful stereotypes. Even when instructed to do otherwise, DALL-E is susceptible to generating outputs that perpetuate harmful stereotypes, especially regarding race and gender. Our own testing confirmed this, as well as the ease with which these outputs can be generated. Some examples of what we found include:
- DALL-E reflected and amplified statistical gender stereotypes for occupations (e.g., only female flight attendants, housekeepers, and stay-at-home parents, vs. male software developers). OpenAI has attempted to address these known challenges. While this technique appears to have worked for some well-tested occupations, especially in generating more variety in skin tones, we found highly gendered results for occupations such as product managers (all male) vs. product marketers (all female), principals (all male) vs. teachers (all female), bankers (all male) vs. bank tellers (all female), and managers (all male) vs. human-resources professionals (all female).
- When asked to pair non-White ethnicities with wealth, DALL-E struggled to do so in a photorealistic manner. Instead, it generated cartoons, severely degraded images, and images associated with poverty.
- DALL-E's advanced inpainting features present new risks. While innovative and useful in many contexts, the high degree of freedom to alter images means they can be used to perpetuate harms and falsehoods. In OpenAI's words, images that have been changed to, for example, modify, add, or remove clothing or add additional people to an image in compromising ways "could then be used to either directly harass or bully an individual, or to blackmail or exploit them." These features can also be used to create images that intentionally mislead and misinform others. For example, disinformation campaigns can remove objects or people from images or create images that stage false events. Notably, inpainting prompts are also subject to OpenAI's efforts to limit DALL-E's ability to generate harmful content.
- Tools like DALL-E pave the path to misinformation and disinformation. As with all generative AI tools, DALL-E can easily generate or enable false and harmful content, both by reinforcing unfair biases, and by generating images that intentionally mislead or misinform others. Because OpenAI's attempts to limit these are brittle, and images can be further manipulated with generative AI via in- and outpainting, false and harmful visual content can be generated at an alarming speed. We have already seen this in action. OpenAI notes that as image generation matures, it "leaves fewer traces and indicators that outputs are AI-generated, making it easier to mistake generated images for authentic ones and vice versa." In other words, as these AI systems grow, it may become increasingly difficult to separate fact from fiction. This "Liar's Dividend" could erode trust to the point where democracy or civic institutions are unable to function.
Limits to use
- DALL-E's terms of service do not allow its use by children under age 13.
- Teens age 13–17 are required to have parental permission to use DALL-E.
- We did not receive participatory disclosures from OpenAI for DALL-E. This assessment is based on publicly available information, our own testing, and our review process.
- The model has difficulty representing concepts outside its training data, leading to inconsistent performance for individuals who seek to prompt DALL-E to produce non-Western-dominant ideas, objects, and concepts.
- Currently, there are no reliable deepfake detection tools, or tools capable of determining whether images were generated by DALL-E. While every image DALL-E generates currently includes an identifying signature in the lower right corner, that signature can easily be cropped out.
- At the time of this review, DALL-E supports only English-language prompts.
Common Sense AI Principles Assessment
The benefits and risks, assessed against our AI Principles (that is, what AI should do).