This article is an installment of Future Explored, a weekly guide to world-changing technology. You can get stories like this one straight to your inbox every Thursday morning by subscribing here.
A powerful text-to-image AI developed by OpenAI — an AI research lab co-founded by Elon Musk and then-Y Combinator president Sam Altman — has just moved into beta testing.
Those with access to the tool, DALL-E 2, can now describe an image with text and the AI will generate several options. Want an image of an astronaut riding a horse in the style of Andy Warhol? Or a bowl of soup that looks like a monster knitted out of wool? DALL-E 2 can create it.
The beta launch of DALL-E 2 is the latest big move in the growing text-to-image AI space. Here’s more about the industry-leading system, its competitors, and the impact this tech could have on the world of art.
“It feels more like you’re collaborating with a living, breathing thing than just using Photoshop as a tool.”
Karen X. Cheng
Introducing DALL-E 2
OpenAI announced the creation of DALL-E, an AI art tool that could generate realistic images based on a short text prompt, in January 2021. While it wasn’t the first text-to-image AI, it was remarkably capable for its time.
DALL-E 2 followed in April 2022 — this new AI could create images with four times the resolution of DALL-E, leading to even more realistic-looking results.
It could also generate variations of an existing image from different angles or in different art styles, and users could replace just part of an existing image with something generated by the AI thanks to the “inpainting” feature.
Images generated when the two systems were given the prompt “a painting of a fox sitting in a field at sunrise in the style of Claude Monet.” Credit: OpenAI
Initially, OpenAI restricted access to DALL-E 2 to “a limited number of trusted users” so that it could learn more about the tech’s abilities and limitations. Many of those with early access were enthralled by the system, posting image after image on their social media accounts.
“I just get myself lost for hours and hours doing it,” Karen X. Cheng, a video director, told Bloomberg. “It feels more like you’re collaborating with a living, breathing thing than just using Photoshop as a tool.”
OpenAI, Karen X. Cheng, and editors from Cosmopolitan magazine collaborated to make this magazine cover using DALL-E 2. Credit: Cosmopolitan
On July 20, OpenAI launched DALL-E 2 in beta, with the intention of giving one million people on its waiting list access to the AI in the coming weeks.
Each person included in the beta gets 50 free credits up front. With one credit, they can give DALL-E 2 a text prompt and receive four images in return. Alternatively, they can give the system an existing image and receive three variations or edits to it.
After that, they’ll receive 15 free credits monthly, but if they want more, they can buy credits at a cost of $15 for a set of 115. These images can be commercialized, too, meaning users have the right to use them as book illustrations, album covers, t-shirt designs, and… newsletter cover photos.
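Taken together, those figures work out to only a few cents per image. A quick back-of-the-envelope calculation, using only the numbers reported above, makes the effective price concrete:

```python
# Cost per image under DALL-E 2's beta pricing, using the figures
# reported above ($15 for 115 credits, four images per text prompt).
# This is a reader's estimate, not an official OpenAI rate card.
PACK_PRICE_USD = 15.00
PACK_CREDITS = 115
IMAGES_PER_CREDIT = 4

cost_per_credit = PACK_PRICE_USD / PACK_CREDITS
cost_per_image = cost_per_credit / IMAGES_PER_CREDIT

print(f"${cost_per_credit:.3f} per credit")  # ≈ $0.130
print(f"${cost_per_image:.3f} per image")    # ≈ $0.033
```

In other words, a purchased credit buys four images for roughly three cents apiece, before counting the free monthly allotment.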
Limitations and challenges
DALL-E 2 is far from perfect. The AI has trouble incorporating text into an image or generating groups of photorealistic human faces. Anything that requires scientific knowledge to produce (e.g., an anatomically correct human skeleton) is beyond its skill set, too.
“DALL-E doesn’t know what science is,” OpenAI researcher Aditya Ramesh told IEEE Spectrum. “It just knows how to read a caption and draw an illustration, so it tries to make up something that’s visually similar without understanding the meaning.”
One of the images DALL-E 2 produced when asked to generate “an illustration of the solar system, drawn to scale.” Credit: IEEE Spectrum / OpenAI
DALL-E 2’s testing period also revealed a problem with the system that’s annoyingly common amongst AIs: racial and gender bias.
To train the AI, OpenAI had fed it 650 million images from the internet and their captions. Because these data are often biased, DALL-E 2 was, too: ask it to draw a CEO or a nurse, and you’d likely get a white man and a white woman, respectively.
On July 18, OpenAI announced that it had updated the system so that, when given a prompt that didn’t specify a particular gender or race, the AI’s output would “more accurately reflect the diversity of the world’s population.”
Examples of DALL-E 2’s response to the “CEO” prompt before OpenAI’s update. Credit: OpenAI
OpenAI claims the fix worked — it says users were 12 times more likely to report DALL-E 2’s images “included people of diverse backgrounds” after its implementation — but it’s not clear the fix actually addressed, or can address, the underlying problem of biased training data.
“The way this rumored implementation works is it adds either male or female or Black, Asian or Caucasian to the prompt randomly,” Max Woolf, a data scientist at BuzzFeed who was granted early access, told NBC News.
“The only way to really fix it is to retrain the entire model on the biased data, and that would not be short term,” he added.
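The approach Woolf describes amounts to prompt augmentation: when a prompt doesn’t name a gender or race, a demographic term is randomly added before the prompt reaches the model. A minimal sketch of that idea follows; the term lists, detection logic, and wiring here are illustrative assumptions, not OpenAI’s actual implementation:

```python
import random

# Sketch of the prompt-augmentation approach Woolf describes: if a
# prompt doesn't already specify gender or race, randomly append a
# demographic term before sending it to the model. The term lists and
# detection logic are assumptions, not OpenAI's implementation.
GENDER_TERMS = ["male", "female"]
RACE_TERMS = ["Black", "Asian", "Caucasian"]

def augment_prompt(prompt: str) -> str:
    words = set(prompt.lower().split())
    demographic_terms = GENDER_TERMS + RACE_TERMS
    if any(term.lower() in words for term in demographic_terms):
        return prompt  # user already specified a demographic; leave it
    term = random.choice(demographic_terms)
    return f"{prompt}, {term}"

print(augment_prompt("a portrait of a CEO"))
print(augment_prompt("a female CEO"))  # unchanged
```

As Woolf notes, this kind of post-hoc tweak reshapes the model’s output without touching the biased training data underneath, which is why he argues it isn’t a real fix.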
Examples of DALL-E 2’s response to the “CEO” prompt after OpenAI’s update. Credit: OpenAI
Aside from working to address the ongoing issue of bias in training data and output, OpenAI will also need to be diligent about preventing people from using the tool nefariously, such as to generate deepfakes or images that spread misinformation.
To address this, it preemptively filtered out violent and graphic images from the datasets used to train DALL-E 2 and prohibits users from asking the AI to generate images that depict famous people, violence, political content, and more.
OpenAI says it plans to use a mix of human moderators and automated systems to identify any of that prohibited content, but staying ahead of the issue will no doubt be a major challenge as the tech’s user base grows.
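The simplest automated piece of such a moderation pipeline is a filter over incoming prompts. The sketch below shows a naive keyword version of that idea; the blocklist is an illustrative assumption, and production systems pair human reviewers with trained classifiers far more robust than string matching:

```python
# Naive prompt filter illustrating the automated side of content
# moderation. The blocklist is an illustrative assumption; real systems
# use trained classifiers plus human review, not simple word matching.
BLOCKED_TOPICS = {"violence", "deepfake", "political"}

def is_allowed(prompt: str) -> bool:
    """Return True if no blocked topic word appears in the prompt."""
    tokens = set(prompt.lower().split())
    return tokens.isdisjoint(BLOCKED_TOPICS)

print(is_allowed("a fox in a field at sunrise"))  # True
print(is_allowed("a deepfake of a politician"))   # False
```

A filter this crude is easy to evade with synonyms or misspellings, which is exactly why keeping ahead of prohibited content becomes harder as the user base grows.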
Images DALL-E 2 generated in response to the prompt “military protest” before (left) and after (right) OpenAI filtered its training dataset. Credit: OpenAI
DALL-E 2 alternatives
If you aren’t on DALL-E’s waiting list (or simply don’t want to wait), there are other text-to-image AIs that you can use right now — though the results might not be quite as impressive.
Craiyon: Initially named “DALL-E Mini,” Craiyon is a free-to-use, open-source alternative to OpenAI’s tool, and like DALL-E 2, users are prohibited from using it to create certain types of content (e.g., anything depicting child abuse or support for a terrorist organization).
However, unlike with DALL-E 2, free commercial licenses are available only to users with less than $1 million in annual revenue who aren’t making money from blockchain transactions (i.e., selling NFTs of the AI-generated art).
Craiyon’s response to the prompt “a dog wearing a t-shirt for a metal band.” Credit: Craiyon
Midjourney: Discord-based text-to-image AI Midjourney is currently in open beta, meaning anyone can sign up and generate 25 free images. To generate more, they’ll need to purchase a monthly subscription, which starts at 200 images for $10 and extends to unlimited images for $30.
“Unlimited” might not quite mean unlimited, though — Midjourney says it reserves the right to limit the number of images customers generate “to prevent quality decay or interruptions to other customers.”
Midjourney has fewer restrictions on the type of content users can create than other text-to-image AI tools, but only those with paid plans can commercialize the images. Companies earning more than $1 million in annual revenue also have to pay $600 for a year-long corporate membership in order to use the images.
Midjourney’s response to the prompt “a dog wearing a t-shirt for a metal band.” Credit: Midjourney
NightCafe Creator: Before even the first DALL-E, there was the free text-to-image AI NightCafe Creator.
The tool is easy to customize — after entering a text prompt, you can choose amongst different styles and algorithms to achieve unique results — and images can be commercialized however users want, even as NFTs.
Users get 5 free credits daily, with each credit equaling one AI-generated image, but they can choose to purchase more for as little as $0.047 per credit. They can also earn credits through actions such as liking other users’ images.
NightCafe’s response to the prompt “a dog wearing a t-shirt for a metal band.” Credit: NightCafe
Between the AI-generated lines
The era of AI-generated images appears to be just getting started — in addition to the above, Google and Meta are each developing systems that promise to rival DALL-E 2’s place as the dominant text-to-image AI.
The surge of interest in this type of tech — and users’ ability to commercialize its output — raises questions about AI replacing human artists, many of whom already have a difficult time earning enough money from their art to support themselves.
“If 10,000 people have access to that same model, will I still be able to make something that then somebody will want to buy? … Why buy my art when you can just find something probably rather similar?” Mario Klingemann, a German artist, told Vox.
While it’s true that text-to-image AIs might displace some artists — those who create and sell stock illustrations may be in trouble, for example — others, like Cheng, see the tech as a new tool with which to create.
“The invention of AI art doesn’t mean the death of artists. But artists will have to evolve,” she wrote on Instagram. “The ones who do well will be the ones who find new creative ways to make AI work for them, rather than against them.”
We’d love to hear from you! If you have a comment about this article or if you have a tip for a future Freethink story, please email us at [email protected].