OpenAI’s latest AI image generator pushes boundaries in terms of precision and faithfulness to prompts.
On Wednesday, OpenAI announced DALL-E 3, the latest iteration of its AI image-synthesis model, which features seamless integration with ChatGPT. DALL-E 3 renders images from intricate descriptions and can handle in-image text generation, a task that challenged previous models. Currently in a research preview, it is set to become available to ChatGPT Plus and Enterprise users in early October.
Like its predecessor, DALL-E 3 is a text-to-image generator that creates original images from textual prompts. OpenAI has not released detailed technical specifications for DALL-E 3, but like its forerunners, the model was likely trained on millions of images created by human artists and photographers, including licensed content from platforms such as Shutterstock. It probably builds on similar principles, refined with new training techniques and longer computational training runs.
The samples OpenAI showcased on its promotional blog suggest that DALL-E 3 represents a significant leap in image-synthesis capability, particularly in faithfully adhering to prompts and rendering objects with minimal distortions. Compared to DALL-E 2, it handles details such as hands more cleanly and produces engaging images without requiring intricate prompt engineering.
By comparison, Midjourney, a competing AI image-synthesis model, produces photorealistic details but typically requires elaborate prompt adjustments to control the output effectively.
One notable feature of DALL-E 3 is its improved handling of text within images, surpassing its predecessor’s capabilities. For example, a prompt involving an avocado sitting in a therapist’s chair and uttering, “I feel so empty inside” with a pit-sized hole in its center, produced a cartoon avocado perfectly encapsulating the character’s quote in a speech bubble.
OpenAI emphasizes that DALL-E 3 has been “built natively” on ChatGPT, forming an integrated feature of ChatGPT Plus. This integration enables conversational enhancements to images and potentially introduces novel capabilities, allowing ChatGPT to generate images based on the ongoing conversation’s context. Microsoft’s Bing Chat AI assistant, also based on OpenAI technology, has been generating images in conversations since March.
The original DALL-E was introduced in January 2021, followed by a significantly enhanced sequel in April 2022 that marked a new era in AI-generated imagery. Recent DALL-E models employ a technique known as diffusion, which iteratively refines random noise into a recognizable image, guided by what the model learned during training and by the user's prompt.
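To make the diffusion idea concrete, here is a deliberately toy sketch, not OpenAI's actual architecture: a forward process gradually corrupts data with noise, and generation runs the process in reverse, nudging noise back toward a clean result step by step. The `toy_denoiser` below is a hypothetical stand-in for the learned neural network; a real model predicts the noise to remove based on the noisy image and the text prompt.

```python
import math
import random

random.seed(0)

STEPS = 50
BETAS = [0.02] * STEPS  # noise added per forward step (toy schedule)

def forward_step(x, beta):
    """Forward process: mix the signal with fresh Gaussian noise."""
    return [math.sqrt(1 - beta) * v + math.sqrt(beta) * random.gauss(0, 1)
            for v in x]

def toy_denoiser(x, target):
    """Hypothetical stand-in for the trained network: nudge each value
    toward the clean target. A real diffusion model instead *predicts*
    the noise from the noisy input and the prompt embedding."""
    return [v + 0.2 * (t - v) for v, t in zip(x, target)]

clean = [1.0, -1.0, 0.5, 0.0]   # pretend four-pixel "image"

noisy = clean
for beta in BETAS:               # forward: data -> noise
    noisy = forward_step(noisy, beta)

sample = noisy
for _ in range(STEPS):           # reverse: noise -> data
    sample = toy_denoiser(sample, clean)

err = max(abs(s - c) for s, c in zip(sample, clean))
print(f"max reconstruction error: {err:.4f}")
```

Each reverse step removes a little of the remaining corruption, which is why diffusion samplers run for many iterations rather than producing an image in one shot.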
However, AI image-generation technology has generated controversy since its mainstream introduction. Concerns include fears of replacing artists, copyright infringement issues related to training data, and debates over copyright protection for AI-generated art.
OpenAI addresses some of these concerns by making DALL-E 3 decline requests for images in the style of living artists and allowing creators to opt out of using their images for training future models. Nonetheless, some artists argue for an opt-in approach.
Under current US copyright policy, purely AI-generated artwork does not receive copyright protection, meaning DALL-E 3-generated images fall into the public domain. OpenAI, however, grants users full rights to use and reproduce images created with DALL-E 3.
Regarding safety, OpenAI has implemented filters to limit the generation of violent, sexual, or hateful content. The system also declines requests to generate images of public figures by name. OpenAI has engaged with experts known as “red teamers” to identify and mitigate potential risks and biases. They are also experimenting with a “provenance classifier” to determine if an image was generated by DALL-E 3.
As of now, DALL-E 3 is in closed testing and is expected to reach ChatGPT Plus and Enterprise customers through ChatGPT in October, with availability via the API and in Labs later in the fall.