OpenAI has introduced DALL-E 3, the latest iteration of its AI image-synthesis model. DALL-E 3 sets itself apart through its integration with ChatGPT and its ability to render images that closely follow intricate textual descriptions, including text inside the image. These advances address challenges that earlier models struggled with. Currently in research preview, DALL-E 3 is set to become available to ChatGPT Plus and Enterprise customers in early October.
Much like its predecessor, DALL-E 3 is a text-to-image generator that creates novel visuals from textual prompts. While technical specifics about DALL-E 3 remain undisclosed, it's reasonable to assume that the model follows in the footsteps of earlier versions, which were trained on millions of images created by human artists and photographers, some of them licensed from platforms like Shutterstock. DALL-E 3 likely also incorporates new training techniques and a longer, more compute-intensive training run to enhance its capabilities.
Judging by the samples OpenAI showcased in its promotional blog, DALL-E 3 appears to be a notably advanced image-synthesis model that follows prompts more faithfully and generates objects with fewer deformations than existing models. OpenAI claims that DALL-E 3 surpasses its predecessor, DALL-E 2, at rendering fine details such as hands, and that it produces captivating images without requiring any "hack" or prompt manipulation.
In contrast, Midjourney, a competing AI image-synthesis model from another vendor, excels in rendering photorealistic details but demands intricate prompt adjustments to exert control over image outputs.
DALL-E 3 also handles text within images well, a feat its predecessor struggled with, although competitors like Stable Diffusion XL and DeepFloyd are also making strides in this area. For instance, a prompt describing an avocado in a therapist's chair saying, "I feel so empty inside," with a pit-sized hole in its center, resulted in a cartoon avocado with the quote neatly enclosed in a speech bubble.
Notably, OpenAI emphasizes that DALL-E 3 has been "natively built" into ChatGPT and will be available as a feature of ChatGPT Plus. This integration lets users treat ChatGPT as a brainstorming partner when creating images. It also enables ChatGPT to generate images based on the context of an ongoing conversation, potentially leading to new capabilities. Microsoft's Bing Chat AI assistant, which is also built on OpenAI technology, has been generating images in conversations since March.
DALL-E made its debut in January 2021, followed by its significantly enhanced sequel in April 2022, marking a pivotal moment in AI-generated imagery. OpenAI's DALL-E models rely on latent diffusion techniques to transform noise into recognizable images based on their training data and the guidance of a prompt, an approach also used by Stable Diffusion, which was released in August 2022.
The introduction of AI image-generation technology to the mainstream has sparked intense controversy. Artists have protested its potential to replace them or unethically mimic their styles. Copyright-infringement lawsuits have emerged over the use of scraped images as training data without consulting copyright holders. Furthermore, rulings from the US Copyright Office and US district courts have added complexity to the debate.
Acknowledging these concerns, OpenAI emphasizes that DALL-E 3 is designed to decline requests for images mimicking the style of living artists. Additionally, OpenAI offers a form allowing creators to opt out of having their images used for training future models. However, it remains to be seen whether these measures will appease artists who advocate for a strict opt-in approach for AI training, rather than inclusion in default image datasets.