Text-to-Image
Definition
AI systems that generate images from natural language descriptions, typically using diffusion models or transformer-based architectures.
Text-to-image generation reached mainstream awareness in 2022 with DALL-E 2, Midjourney, and Stable Diffusion. These systems take a text prompt (e.g., "a cat wearing a space suit on Mars, digital art") and generate a corresponding image. Most modern systems use a diffusion model guided by a text encoder such as CLIP: the encoder turns the prompt into embeddings, and the model iteratively denoises random noise into an image consistent with those embeddings. Key capabilities include photorealistic image generation, artistic style transfer, inpainting (editing parts of an image), outpainting (extending an image beyond its original borders), and image-to-image translation. DALL-E 3 and Midjourney v6 can produce images of near-photographic quality. The technology has transformed graphic design, advertising, concept art, and other creative workflows, while raising important questions about artist copyright, deepfakes, and the authenticity of visual media.
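The generation loop described above can be illustrated with a toy sketch. This is not how any named product is implemented: `encode_prompt`, `predict_noise`, and `generate` are hypothetical names, the hash-based "encoder" is only a stand-in for a learned text encoder such as CLIP, and the "denoiser" is a hand-written rule rather than a trained network. It shows only the overall shape of the process: encode the prompt, start from random noise, and repeatedly remove predicted noise under the prompt's guidance.

```python
import hashlib
import random


def encode_prompt(prompt, dim=8):
    """Toy stand-in for a text encoder: map a prompt to a fixed vector in [-1, 1].

    Assumption: a real system would use a learned encoder, not a hash.
    """
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    return [b / 255.0 * 2.0 - 1.0 for b in digest[:dim]]


def predict_noise(x, cond):
    """Toy stand-in for a trained denoiser.

    Here the 'noise' is simply whatever separates the sample from the
    conditioning vector; a real model learns this prediction from data.
    """
    return [xi - ci for xi, ci in zip(x, cond)]


def generate(prompt, steps=50, step_size=0.1, seed=0):
    """Run the toy denoising loop and return (sample, conditioning vector)."""
    rng = random.Random(seed)
    cond = encode_prompt(prompt)
    # Start from pure Gaussian noise, one value per "pixel".
    x = [rng.gauss(0.0, 1.0) for _ in cond]
    for _ in range(steps):
        eps = predict_noise(x, cond)
        # Remove a fraction of the predicted noise at each step.
        x = [xi - step_size * ei for xi, ei in zip(x, eps)]
    return x, cond


if __name__ == "__main__":
    sample, cond = generate("a cat wearing a space suit on Mars, digital art")
    gap = max(abs(a - b) for a, b in zip(sample, cond))
    print(f"largest remaining gap to the prompt vector: {gap:.4f}")
```

In a real system the denoiser is a trained neural network and the final array is an image (or a compact representation that is decoded into one). The fixed seed makes the toy loop reproducible, which mirrors how a seed combined with a prompt pins down the output in tools that expose one.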
Related Terms
Diffusion Model
A generative model that learns to create data by gradually denoising a random noise signal, reversin...
Generative AI
AI systems that can create new content such as text, images, audio, video, and code, rather than sim...
Text-to-Video
AI systems that generate video content from text descriptions, extending image generation into the t...